B-lineage acute lymphoblastic leukemia (B-ALL) is the most common type of childhood cancer and mostly derives from immature B cells that carry the cell surface antigen CD19 (ref. 1). Most children with B-ALL can be cured through first-line treatment comprising combinations of cytotoxic agents. However, relapsed ALL remains a leading cause of childhood death despite intensive cytotoxic chemotherapy often including allogeneic bone marrow transplantation. The advent of CD19 chimeric antigen receptor (CAR)-T cell therapy in recent years has transformed the treatment of intractable ALL2. Although a subset of children can be cured, up to 60% of children experience further, typically fatal, disease recurrence due to non-persistence of CAR-T cells or CD19 leukemic escape3,4.

Previously, we generated a novel low-affinity CAR incorporating a CD19-specific single-chain variable fragment (scFv) called CAT, displaying a faster off-rate of interaction than the FMC63 CD19 binder used in prior clinical studies3. CAT CAR-T cells showed greater cytotoxicity and proliferative responses in vitro and maintained long-lived molecular remissions in children with relapsed or refractory ALL, as demonstrated in the CARPALL study3. The molecular features underpinning CAR-T cell persistence in our study remain unknown. We reasoned that single-cell transcriptomic assays may help elucidate these features. To date, other CAR-T cell products in patients have been studied at the resolution of single cells5,6. However, the persistence of CAR-T cells in these studies was generally limited to 3 months. An exception was long-lived CAR-T cells in two adult individuals with a different cancer, chronic lymphocytic leukemia (CLL), in whom anti-CD19 CAR-T cells have persisted for almost a decade thus far7. It is unclear whether one can generalize from two adult patients treated for CLL to other hematological malignancies and patient groups, in particular to childhood ALL, or to other CAR-T cell products.

We systematically studied molecular features and clonal dynamics of CAR-T cells in children enrolled in the CARPALL study at serial timepoints, from production to persistence, up to 5 years after infusion.


Overview of study cohort and experiment

We studied 15 consecutive patients with high-risk or relapsed CD19+ B-ALL treated with CD19 CAR-T cell therapy on the CARPALL study (NCT02443831) and in whom adequate CAR-T cells could be isolated for subsequent analyses from cryopreserved samples of blood or bone marrow. Outcomes of the first 14 patients infused were reported3; subsequently, a further 18 patients have been treated. Thirteen of 15 (87%) patients studied achieved complete remission; six of these responding patients subsequently relapsed, whereas the other seven achieved long-lived remissions maintained by detectable CAR-T cells and concomitant B cell aplasia (Fig. 1a). We performed detailed phenotyping by flow cytometry in 11 patients, and, in ten children, sufficient CAR-T cells were obtained for further interrogation by single-cell mRNA and T cell receptor (TCR) sequencing (73 patient samples split into 89 gene expression (GEX)/TCR and 64 flow samples; Supplementary Table 1 and Extended Data Fig. 1). Samples were taken from the CAR-T cell product as well as from patients at early (months 1–3), mid (months 4–6) and late (month 7 onwards) timepoints. Early timepoints were defined as 1–3 months, as all patients who achieved molecular complete remission with the absence of measurable residual disease did so within this window. The late timepoints were selected based on the timing of CD19 relapses, which were generally early events and would have occurred by month 7. The mid timepoints were the interval remaining between early and late. Two patients had samples from the product and at all timepoints (early, mid and late); four patients had all post-infusion timepoints represented; two patients had samples at two of three post-infusion timepoints; and two patients had only the early timepoint interrogated due to early relapse. We isolated CAR-T cells from peripheral blood or bone marrow by flow cytometry using CD3 and CAR expression, before single-cell sequencing (Chromium 10x platform) (Fig. 1b).
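The timepoint definitions above amount to a simple binning rule on months after infusion. The following is a minimal sketch of that rule, not the code used in the study; the function name `timepoint_label` and the use of whole months as input are assumptions made for illustration.

```python
def timepoint_label(months_after_infusion: int) -> str:
    """Map months after infusion to the timepoint bins defined in the text.

    Hypothetical helper: month 0 is the infusion product, months 1-3 are
    early, months 4-6 are mid, and month 7 onwards is late.
    """
    if months_after_infusion < 0:
        raise ValueError("months_after_infusion must be non-negative")
    if months_after_infusion == 0:
        return "product"
    if months_after_infusion <= 3:
        return "early"
    if months_after_infusion <= 6:
        return "mid"
    return "late"


# Example: label a few sampling months.
labels = {month: timepoint_label(month) for month in (0, 2, 5, 7, 60)}
```

Under this rule, a sample taken at month 7 already counts as late, which matches the rationale that CD19 relapses would generally have occurred by then.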

Fig. 1: Study overview and workflow.

a, Swimmer plot illustrating the responses of individual pediatric patients with B-ALL to CAT CAR-T cell therapy and timepoints of sample collection. Attainment of complete remission was associated with attainment of B cell aplasia in all cases. Patient 5 had an isolated unilateral ocular relapse of CD19+ leukemia, which was treated with enucleation, and remains in ongoing minimal residual disease (MRD)-negative remission with no other intervention. Heat map to the right demonstrates timepoint representation per patient. Filled black boxes indicate the presence of the timepoint. Product = infusion products (M0); early = M1–M3; mid = M4–M6; late = M7–M60. b, Schematic workflow of study design. Samples were collected from infusion products (PROD), peripheral blood (PB) and bone marrow (BM) between M0 and M60. Samples were used either for flow-based immunophenotyping or for single-cell GEX and TCR sequencing on the Chromium 10x platform. RT, reverse transcription; Seq, sequencing. c, UMAP of all cells in the dataset highlighting cell types captured (left) and expression of the CAT-scFv CAR construct (right). CAT-scFv, low-affinity CAR (CAT) incorporating a CD19-specific scFv.

Double-negative CAR-T cells delineate late timepoints

In total, we recovered 264,827 single cells that passed quality control, approximately 50,000 of which were CAR-T cells (Supplementary Table 2). We grouped all 264,827 cells using commonly deployed analytical methods and visualized resultant clusters using uniform manifold approximation and projection (UMAP) (Fig. 1c and Extended Data Fig. 2). Clustering segregated CAR-T cells from non-CAR-T cells, with contributions from all patients. Two clusters were completely patient specific; these clusters represented ALL cancer cells from two children, patients P13 and P08, with CD19 relapses at the time of sampling. Subclustering of CAR-T cells segregated cells transcriptionally into infusion products at month 0 (M0), followed by early (M1–3), mid (M4–M6) and late (M7–M60) timepoints after infusion (Fig. 2a). Cycling cells congregated together from all timepoints, indicating that CAR-T cells remain proliferative several years after infusion. Using a marker-based annotation, CD8+ T cells were the predominant CAR-T cell at all timepoints in most cases, apart from late timepoints where CAR-T cells lacked expression of both CD4 and CD8A transcripts (Fig. 2b, Extended Data Fig. 3 and Supplementary Table 3). CD4 CAR-T cells made minor contributions at this point. Thus, late or persisting CAR-T populations were predominantly double-negative T cells.

Fig. 2: Characterization of CARPALL CAR-T cells.

a, UMAP of CAT CAR-T cells demonstrates transcriptional clustering based on timepoint. Product = infusion products (M0); early = M1–M3; mid = M4–M6; late = M7–M60. b, UMAPs show scaled average expression of lymphocyte markers. c, Immunophenotyping CAT CAR-T cells by multi-parameter flow cytometry. Representative example of P01 showing cell identification using CD4 and CD8A at an early (M2) and a late (M72) timepoint. Cells were gated for CD3 and CAT CAR. d–f, Trajectory scatter plots quantify data from all patient samples for CD4, CD8 and double-negative CAR-T populations. g, Representative example of P01 showing immunophenotyping using CD45RA and CCR7 at an early (M2) and a late (M72) timepoint. Cells were gated for CD3 and CAT CAR. TCM, central memory; TEM, effector memory; TEMRA, terminally differentiated effector memory expressing CD45RA; TN, naive; TSCM, stem cell memory. Fluorochromes: BV, brilliant violet; Cy, cyanine dye; PE, R-phycoerythrin; PerCP, peridinin-chlorophyll protein.


Validation of double-negative CAR-T cell state

To confirm the early predominance of a CD8+ subset and the later emergence of a double-negative population, we implemented two orthogonal approaches. First, we analyzed peripheral blood (PB) and bone marrow (BM) samples from seven CARPALL patients collected at late timepoints by flow cytometry (7–72 months after infusion; Supplementary Table 1). We identified CAR-T cells using CD3 expression and an anti-idiotype antibody specific for the CAR and assessed expression of CD4 and CD8 on CAR-T cells (Fig. 2c–f and Extended Data Fig. 4a). This analysis confirmed that most cells were double negative at these late timepoints, with a smaller contribution from CD8 T cells. This contrasted with the lower proportion of double-negative T cells in the non-CAR-T cell compartment in these patients (Supplementary Table 4). Furthermore, CAR-T cells were also characterized by lack of expression of CD45RA and CCR7, suggesting an effector memory phenotype (Fig. 2g and Extended Data Fig. 4b).

In our second approach, we pursued a cell-marker-independent analysis to assign cell identity to CAR-T cells. We directly compared CAR-T cell transcriptomes to a multi-modal, single-cell atlas of the circulating human immune system8. This reference is based on 211,000 human blood mononuclear cells interrogated by single-cell mRNA sequencing and by 228 anti-surface protein antibodies (CITE-seq). Consistent with our initial annotation and protein validation by flow cytometry, we observed that most late-persisting CAR-T cells were classified as double-negative cells, whereas CAR-T cells from earlier timepoints were mainly CD8+ T cells (Fig. 3a,b). The exceptions were patients P09 and P06 in whom an appreciable quantity of early CAR-T cells were double-negative γδ T cells with high expression of NKG7 and GNLY (Extended Data Fig. 5). This is consistent with a previous report that γδ T lymphocytes harbor similarities to CD8 T cells and natural killer (NK) cells9. Together, our initial observation with validation by two approaches demonstrates that most persisting CAR-T cells represented double-negative αβ T cells.

Fig. 3: Cell typing and defining a persisting CAR-T cell transcriptional signature.

a, UMAP of CAT CAR-T cells shows closest cell type matching using a PBMC reference and cell-marker-based annotation. Late (persisting, M7–M60) CAR-T cells are in green hues; mid (M4–6) CAR-T cells are in black/blue hues; and early (M1–3) CAR-T cells are in pink/red hues. b, Stacked bar plots show dynamic cell type proportions over time for a representative patient (P01). The top five abundant cell types in the dataset are shown. c, Contour plots quantify expression of GZMK against GZMH across patients per timepoint. d, Heat map shows differential gene expression results creating gene signatures of each timepoint. DN, double-negative; TCM, central memory; TEM, effector memory.

Persisting CAR-T cells exhibit a transcriptional signature

Next, we identified differentially expressed genes among CAR-T cells from each timepoint to extract markers of infusion, early, mid and late CAR-T cells. Infusion products were enriched for genes related to cell cycle, nucleosome assembly and glycolysis, plausibly due to in vitro activation during manufacture. Infusion products expressed high levels of genes reflecting naive lymphocyte (that is, SELL, CCR7, IL7R and LRRN3) and early memory differentiation status, such as TCF7 and LEF1. The dominant gene expression pattern of post-infusion CARPALL CAR-T cells was defined on a continuum of granzyme gene expression (Fig. 3c). Across post-infusion timepoints, CAR-T cells were skewed toward either higher GZMH and GZMB expression or higher GZMK expression. CAR-T populations that were defined by higher expression of GZMK additionally expressed genes related to effector (LTB), memory (CD27 and IL7R) and activation (CD28) functions, whereas GZMH+GZMB+ cells expressed FGFBP2 and ZEB2. At late timepoints, most CAR-T cells expressed GZMK in every patient except one (P09). In non-CAR-T cells, the GZMH/B-GZMK pattern of expression was also observed; however, CAR-T cells expressed GZMK to much higher levels (Extended Data Fig. 6a). The most recurrent and strongest markers of late CAR-T cells generated a persisting CAR-T signature that was delineated by the expression of bona fide immune-related genes, such as TIGIT and GPR183, as well as genes with unknown or emerging roles in immune biology (Fig. 3d and Extended Data Fig. 6b,c). The latter genes include FXYD2, HMOX1, DENND2D and ISG20 (see Supplementary Table 5 for full gene signatures). The top marker of this population of cells was FXYD2, which encodes a modulator of the Na+/K+ ATPase channel. Of note, FXYD2 was one of the transcripts expressed in functionally cryptic CD34lowCD3−CD4+CD8− intrathymic T progenitors that have been described in the human thymus10. In aggregate, our data reveal that, within and across patients, thousands of CAR-T cells converge on a double-negative cellular phenotype that displays a common and distinct gene signature.

In recent years, the classical dogma of a dichotomy between memory and exhausted T cells has been challenged with the description of functionally active memory cells that bear an imprint of prior exhaustion11. One of the most highly expressed genes in the persistence signature was the exhaustion marker TIGIT. We, therefore, assessed the co-expression of exhaustion markers in our CAR-T cells. We found that late CAR-T cells expressed canonical co-inhibitory receptors, such as HAVCR2 and LAG3, but to a lesser extent PDCD1 (Fig. 4). We, therefore, interrogated the gene and matched protein expression (flow cytometry) of these mediators related to exhausted and precursor exhausted T cells (Extended Data Fig. 7a). Precursor exhausted T cells have been isolated in human cancer, where their presence has been associated with response to immunotherapy. Like precursor exhausted T cells, late CAR-T cells bore expression of relevant effector genes, such as GZMK and PRF1, as well as transcription factors associated with T cell exhaustion, including TOX12,13,14, NFATC1 (ref. 15), BATF16 and PRDM1 (ref. 17). Although exhibiting features of exhaustion, the late CAR-T cells did not appear terminally differentiated, as supported by low expression of B3GAT1 (CD57) and by expressing no more than two exhaustion markers by flow cytometry (Extended Data Fig. 7b). Unlike CAR-T cells within the products that we evaluated, late CAR-T cells did not express high levels of TCF7, which orchestrates a state of memory stemness in precursor exhausted T cells in other contexts18. Instead, there was robust expression of JUN, an AP-1-associated transcription factor that can mediate the reversal of T cell exhaustion and maintenance of cells with stem cell memory properties19. Thus, although long-persisting CAR-T cells did not exactly phenocopy precursor exhausted T cell populations described previously, this cell type would best describe their effector memory, exhaustion-imprinted status determined both transcriptionally and by flow cytometry.

Fig. 4: CAR-T cell immunophenotyping.

Bar plots show custom gene modules that functionally characterize CAR-T cells. The height of the bar refers to the fraction of cells per cell type that express the gene. Higher bars indicate that more cells of that cell type are expressing that gene. The shading refers to the average scaled expression of those genes for that cell type. Darker shading indicates that the expression of that gene is above the average expression of that gene across all cell types. All CAR-T cells (product, early, mid and late) from all patients (n = 10) are analyzed. DN, double-negative; TCM, central memory; TEM, effector memory.

Polyclonal population structures of persisting CAR-T cells

Within each patient, we had observed that, irrespective of T cell subset, thousands of cells converged on the same transcriptional state at late timepoints, raising the question of whether expansions of specific clones underpinned this functional convergence. We, therefore, interrogated TCR sequences of CAR-T cells and obtained readouts from 88 of 89 samples with concomitant gene expression data (Extended Data Fig. 1 and Supplementary Table 1). We found that the vast majority of cells across timepoints harbored unique clonotypes not observed at other timepoints. This indicates that the underlying gene pool remained sufficiently diverse to preclude the capture and tracking of individual clones (Fig. 5a and Extended Data Fig. 8a,b). An important consideration of this analysis is that the frequency of CAR-T cells diminishes over time, such that, by late timepoints, the frequency is as little as 0.05% of total CD3+ cells in circulating blood (Extended Data Fig. 8c). Nevertheless, if the population structure were monoclonal, we would capture the same clone on each blood draw. Of the few trackable clonotypes, the top 10 clonotypes at early timepoints remained among the relative majority at later timepoints but decreased in frequency over time. We observed an extensive variability in cell type composition among clonotypes, irrespective of whether they were unique or observed across timepoints. Clonotypes that were observed across timepoints were predominantly CD8+ T cells, whereas unique clonotypes tended to be double-negative T cells (Fig. 5b and Extended Data Fig. 8a,b). For two patients (P02 and P01), for whom we have infusion product TCR data, we were able to track 1.7% and 0.5% of clones from infusion products to 2 years and 5 years, respectively. In aggregate, these clonal structures indicated that, at all timepoints, CAR-T cell populations were genetically diverse, consistent with insertion site analyses previously performed on CARPALL CAR-T cells20. In particular, there was no evidence of the dominance of one or more clones at late timepoints. Overall, these findings indicate that functional convergence of the persistence signature was not driven by clonal expansion.
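The distinction drawn above between clonotypes that are unique to one timepoint and those observed across timepoints can be illustrated with a short sketch. This is not the analysis pipeline used in the study; the function names, the input layout (one clonotype identifier per cell, grouped by timepoint) and the toy identifiers are all assumptions.

```python
from collections import Counter


def classify_clonotypes(cells_by_timepoint):
    """Split clonotypes into those seen at more than one timepoint (shared)
    and those seen at exactly one timepoint (unique).

    cells_by_timepoint: dict mapping a timepoint label to a list with one
    clonotype identifier per cell (hypothetical input layout).
    """
    timepoints_seen = {}
    for timepoint, clonotypes in cells_by_timepoint.items():
        for clonotype in set(clonotypes):
            timepoints_seen.setdefault(clonotype, set()).add(timepoint)
    shared = {c for c, tps in timepoints_seen.items() if len(tps) > 1}
    unique = set(timepoints_seen) - shared
    return shared, unique


def top_clone_fraction(clonotypes):
    """Fraction of cells belonging to the single most abundant clonotype."""
    counts = Counter(clonotypes)
    return counts.most_common(1)[0][1] / len(clonotypes)


# Toy data with made-up clonotype identifiers.
toy = {
    "early": ["A", "A", "B", "C"],
    "late": ["A", "D", "E", "F"],
}
shared, unique = classify_clonotypes(toy)
late_dominance = top_clone_fraction(toy["late"])
```

In this toy example only clonotype A is trackable across timepoints, and the most abundant late clonotype accounts for a quarter of late cells. A monoclonal population would instead give a top-clone fraction close to 1 at every draw.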

Fig. 5: Population structures of CAR-T cells.

a, Representative sample (P02) illustrating the changes in clonal architecture of CAR-T cells over time. Packed circle plots show the size of each clonotype. Filled-in black circles represent clonotypes that are not unique, as they are observed across timepoints. Conversely, pink donut circles represent clonotypes that are unique to that timepoint (and not observed across time). Blue/purple/gray colored circles represent the dominant clonotype at that timepoint that corresponds with the clonal trajectories below. b, Cell type composition stacked bar plots demonstrate the shift in cell type abundances between early and late timepoints and between unique and non-unique clonotypes. TEM, effector memory.

Evaluation of the persistence signature across T cells

As we had observed a transcriptional convergence of CAR-T cells across thousands of cells within and across patients, we speculated that the persistence signature may be pervasive across different CAR-T cell products. To date, one further single-cell transcriptomic study of persistent CAR-T cells has been reported—of two adult patients with CLL treated with anti-CD19 CAR-T cells (CTL019 cells) that have persisted for one decade thus far7. We interrogated CAR-T cell data from these two patients by assigning a persistence signature score to each cell (the AddModuleScore function in Seurat21). Remarkably, the module was expressed in CTL019-persisting CD4 CAR-T cells in almost its entirety (17/22 genes) (Fig. 6a). To compare our CARPALL CAR-T cell signal with CTL019 cells in an unbiased, quantitative manner, we used a method of cell-to-cell matching based on logistic regression22. We found that the strongest match of persisting CTL019 CD4+ CAR-T cells was to persisting double-negative T cells in the CARPALL data (Fig. 6b). It should be noted that persisting CTL019 cells were primarily derived from patient 1 (541/959, 56%), although, reassuringly, the persistence signature was also evident in a small number of cells from patient 2 (40/959, 4%). Overall, the similarity of persisting CARPALL and CTL019 CAR-T cells was not confined to gene sets but extended to the entire transcriptome.
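The idea behind a per-cell signature score can be sketched as the mean expression of the signature genes minus the mean expression of a set of control genes. The sketch below is a deliberate simplification and not Seurat's AddModuleScore: Seurat chooses expression-matched control genes by binning genes on average expression, whereas here the control genes are passed in explicitly. The function name, the expression values and the choice of control genes are hypothetical.

```python
from statistics import mean


def module_score(cell_expression, signature_genes, control_genes):
    """Simplified per-cell module score.

    cell_expression: dict mapping gene name to normalized expression in one
    cell. Genes missing from the cell are ignored. The score is the mean of
    the signature genes minus the mean of the control genes, so it can be
    negative when the signature is expressed below the control level.
    """
    signature = [cell_expression[g] for g in signature_genes if g in cell_expression]
    control = [cell_expression[g] for g in control_genes if g in cell_expression]
    if not signature or not control:
        raise ValueError("need at least one signature gene and one control gene")
    return mean(signature) - mean(control)


# Toy cell with made-up expression values; control genes are an assumption.
toy_cell = {"FXYD2": 2.0, "TIGIT": 1.0, "GPR183": 3.0, "ACTB": 1.5, "GAPDH": 0.5}
score = module_score(
    toy_cell,
    signature_genes=["FXYD2", "TIGIT", "GPR183"],
    control_genes=["ACTB", "GAPDH"],
)
```

Because the score is a difference of means, values near or below zero indicate that a cell does not express the signature above background.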

Fig. 6: Evaluating the persisting transcriptional signature across T cells, including adult long-lived CAR-T cells.

a, UMAP plots show expression of the strongest markers of the late-persisting CAR-T signature. Red arrows highlight persisting CAR-T cells between the CARPALL (CAT CAR-T) and CTL019 studies. b, Heat map demonstrates logistic regression cell-to-cell matching. CTL019-persisting CD4 CAR-T cells match strongly to late double-negative CAT CAR-T cells from the CARPALL study, as indicated by the red arrows. c, Box plots show the per-cell distribution of the late score as defined by the late-persisting CAR-T signature. Scores are shown for CAR-T, cancer and normal datasets. Publicly available datasets analyzed in this study are described in Supplementary Table 6. The red line represents the median of the late double-negative (DN TEM) cells from the CARPALL study. Colored cell types refer to CARPALL CAR-T cells. Red arrows indicate persisting CAT CAR-T cells from CARPALL and persisting CTL019 cells. Box plots show the first quartile (the lower end of the box) and the third quartile (the upper end of the box) as well as the median values (center line) per dataset. The ‘whiskers’ extend from the ends of the box to a maximum and minimum of 1.5 times the interquartile range beyond the box. Outliers are shown as dots. AXICEL, axicabtagene ciloleucel CAR-T cells (infusion products); BRCA, breast cancer; ChiCTR, BCMA CAR-T (ChiCTR, 1800017404); CRC, colorectal cancer; DN, double-negative; GBM, glioblastoma; HCC, hepatocellular carcinoma; JCAR014, Fred Hutchinson Cancer Center CAR-T cells; NB, neuroblastoma; NPC, nasopharyngeal carcinoma; NSCLC, non-small cell lung carcinoma; SKCM, skin cutaneous melanoma (1 = Li; 2 = Yost; 3 = Jerby-Arnon); TCM, central memory; TEM, effector memory. The numbers of cells and samples used in this figure are described in Supplementary Table 6.

We then questioned whether the persistence signature of CAR-T cells may have a physiological correlate. To this end, we scored T cells from a variety of healthy tissues, other CAR-T studies to date5,6,23 and cancer cell single-cell atlases, including normal peripheral blood8, human fetal bone marrow24, human fetal thymus25 and as many as 16 types of human cancers26,27,28,29,30,31,32,33,34, including tumors that are considered to be immunogenic and had long-term response to immune checkpoint inhibitors (for example, lung cancer and melanoma) (see Supplementary Table 6 for datasets analyzed). We were unable to detect T cells harboring the persistence signature at an appreciable frequency in any one tissue, barring occasional cells (Fig. 6c and Extended Data Fig. 9). The median frequency across tissues was −0.04 (−0.6 to 1.4). These observations indicate that the CAR-T cell persistence signature is rarely found in other biological contexts.


A lack of CAR-T cell persistence leading to CD19+ relapse is the main cause of therapy failure after licensed CAR-T cell therapy for ALL35,36 and contributes to relapse in other B cell malignancies, such as myeloma37. Therefore, a key question of CAR-T cell biology is why some cells persist whereas others perish. With this knowledge, we might better understand how to select patients, modify treatment phasing and optimize manufacturing protocols to support greater persistence and improve outcomes. To date, robust biomarkers of persistence have not been identified and can be validated only after directly demonstrating successful long-term persistence in patients. A key requirement of this is to systematically examine the biological status of long-lived CAR-T cells. Currently, there is a paucity of these datasets, as persisting CAR-T cells have been probed in only a very limited number of patients to date7. As such, we generated a single-cell RNA sequencing (scRNA-seq) dataset of cells from 10 patients with B-ALL treated with a CD19-targeting CAR-T cell product. Although our work represents, to our knowledge, the largest study of persisting single CAR-T cell transcriptomes, it still represents a modest cohort. Our key finding of a recurrent transcriptional state representing persistence is unlikely to be affected by the size of the cohort. Within each individual, every long-lived CAR-T cell represents a biological replicate of the signal. Accordingly, our finding has been reproduced multiple times within patients and has then been validated across individuals, including in the different clinical context of CLL. However, the size of our study precludes our ability to capture nuances of CAR-T cell transcription as well as any associations between CAR-T cell states and clinical subsets of patients, for which larger studies are required.

We found that late-persisting CAR-T cells mainly comprised a population that did not express CD8-α or CD4 co-receptors transcriptionally or via surface expression. In healthy individuals, double-negative cells typically comprise a minor population of all T cells, and we observed similar proportions in non-CAR-T cells from the same patient. In general, there was a steady reduction in CD8+ CAR-T cells over time, which matched a progressive increase in double-negative populations. This contrasts with the long-lived CAR-T cells from adult CAR-treated patients with CLL, where double-negative CAR-T cell populations were noted at earlier timepoints and, on further investigation, were determined to be γδ T cells. Although γδ T cells were also observed in four patients in our cohort, they did not contribute to early CAR-T cell populations in the other patients. In our cohort, the predominance of double-negative CAR-T cells was particularly noted at later timepoints in all patients. We verified that, at the later timepoints, double-negative T cells were not contributed to by γδ CAR-T cells or CAR NK cells. Late CAR-T cells in both cohorts showed evidence of an activated, proliferative and effector status with strong expression of GZMA and GZMK. Because the double-negative phenotype observed in late-persisting CAR-T cells is reminiscent of early thymocyte differentiation, and because we noted high expression levels of GPR183, an oxysterol receptor that provides survival and migratory signals to thymocytes and CD4+ T follicular helper cells38, we hypothesized a link between persisting CAR-T cells and thymic cell development. Ultimately, we found that the transcriptional status of the late, double-negative CAR-T cells did not map to any thymocyte subset in T cell development but, rather, to mature T cells.

Late-persisting CAR-T cells did not conform to quiescent early memory T cell populations but expressed genes associated with effector function and an activated state. These cells also maintained their proliferative capacity. In keeping with recent reports of precursor exhausted T cells bearing the hallmark of activation, late CAR-T cell populations expressed markers and transcription factors associated with exhaustion, including TOX12,13,14 and BATF39, among others. With reference to this highly activated status, one might speculate that these are circulating effector cells differentiated from rarer memory precursor populations after exposure to antigen. As these patients had no evidence of circulating B cells or existing CD19-expressing hematogones in the bone marrow, it is plausible that these cells were activated as the result of an emerging CD19-expressing hematogone population. However, the existence of minute populations of central or stem cell memory CAR-T cells at this timepoint may count against this hypothesis.

Late-persisting CAR-T cells, although activated and imprinted with markers of exhaustion, did not express FOXO3 or B3GAT1, which are associated with terminal differentiation in the context of exhaustion. Instead, they expressed markers associated with memory-like characteristics, including JUN, BCL2 and IL7R. Thus, they more closely matched precursor exhausted T cell populations as have been described in chronic viral infections11,39,40, cancer-infiltrating T cells41,42 and early post-infusion of CD19 CAR-T cells. Unlike previous reports of precursor exhausted T cell populations, however, they did not express high levels of TCF7 (refs. 39,40) or FOXO1 (refs. 43,44), confounding the suggested centrality of such transcription factors in driving long-lived CAR-T cell persistence and overcoming terminal exhaustion. In a previous report, c-Jun overexpression was sufficient to restore antigen responsiveness, memory function and long-term proliferative capacity in CAR-T cells exhausted due to tonic CAR signaling. The high expression of JUN in most late-persisting CAR-T cells in this study points to a plausible mechanism for long-lived persistence in these cells. Overall, our data instead support that, although previously exhausted CAR-T cells may indeed give rise to long-persisting populations, the predominance of transcription factors driving memory status is likely specific to the characteristics of the CAR, disease and model. Despite some similarities to precursor exhausted T cells, the persistence module was rarely expressed by T cells in a range of pathophysiological contexts. Within the signature, there were a number of genes with little-known roles in T cells, including FXYD2, DENND2D and HMOX1. Overall, further work is needed to elucidate their function in T cells and how they may contribute to persistence.

A key finding of this study was a transcriptional signature of persistence that was reproducible across thousands of cells in every patient with long-lived CAR-T cells and durable anti-B-ALL responses. This signature was not identified when interrogating non-CAR-T cells from the same patients, T cell populations in normal development, T cells from a range of cancer datasets or other CAR-T cell studies. The persistence signature and underlying cell state were detected in an independent dataset of long-lived CAR-T cells from adults with CLL who had received a different CD19 CAR-T cell product. Of note, persisting CAR-T cells from the independent dataset were cytotoxic CD4 T cells with oligoclonal population structures, in contrast to double-negative CAR-T cells that were polyclonal, as reported in this study. These differences could arise from the different techniques used for TCR analyses (integration site analysis versus single-cell TCR analysis), the number of patients evaluated or the fundamental differences in the CAR product. Although we noted these differences among CAR-T cells at different timepoints, the late CAR-T cell signature that we defined here was reproducible across both studies, indicating that it may represent a surrogate marker of longevity. Although this falls short of an easily measured biomarker of persistence with which to test CAR-T cell products, this understanding brings us a step closer to identifying such assays. That the transcriptional status noted was so pervasive in long-persisting CAR-T cells lends weight to the possibility that the signature may not only be a surrogate marker of longevity but, notably, may also provide a basis to investigate underlying cell-intrinsic or cell-extrinsic factors that drive CAR-T cell persistence. Given the data presented here, the longevity of CAR-T cells is likely not based on clonal selection and expansion. Rather, it is possible that the ongoing interplay with the environment shapes the resultant phenotype of long-lived CAR-T cells and supports functional diversity. With this knowledge, we will be primed in our ability to engineer this key characteristic into CAR-T cell therapies for hematological malignancies of the future.


Sample acquisition, ethics and patient consent

Data from this study were generated from patients enrolled in the CARPALL study (NCT02443831). CARPALL was a multi-center, non-randomized, open-label, phase 1, single-stage clinical study designed to evaluate the safety, efficacy and response of CD19 CAR-T cells in children and young adults (≤24 years of age) with high-risk relapsed CD19+ malignancies. Patient data were collected at Great Ormond Street Hospital (GOSH) and the University College London (UCL)-GOSH Institute of Child Health (ICH), and laboratory data were generated in the study central laboratories at GOSH, UCL-GOSH ICH as well as the Sanger Institute. Patient recruitment occurred from 2016 to 2019. Data collection, sequencing and analysis were from 2016 to 2023. The study protocol and outcomes are available here: Key clinical factors for this cohort are described in Supplementary Table 7. All patients who took part in this study were diagnosed with B-ALL. Written informed consent was obtained from all patients or their parents/guardians before study entry. Patients did not receive compensation for participation in the study. Patient sex was reported by patients or parents and confirmed upon (external) examination. Study results do not apply to any one sex or gender. Sex and gender were not considered in the study design, as all children and young adults with high-risk B-ALL, independent of sex/gender, were considered. The sex of patients was noted, and this is described in the table of patient characteristics (Supplementary Table 7). This trial was approved by the UK Medicines and Healthcare Products Regulatory Agency (clinical trial authorization no. 20363/0361/001). Ethical approval was obtained from the London–West London Gene Therapy Advisory Committee (GTAC) Research Ethics Committee (REC ref. no. 16/LO/0283). 
Note that the CARPALL study initially used monospecific low-affinity CD19 CAR-T cells for therapy of B-ALL; however, a study amendment allowing investigation of dual CD19 and CD22 CAR targeting is currently displayed on the website. Historic versions of this trial before November 2020 can be viewed using the following link: (compare any version before November 2020). The analyses included here were not pre-specified in the clinical trial protocol.

Flow cytometry

CAR-T cells were isolated from either fresh peripheral blood or cryopreserved aliquots of the infusion product (IP), peripheral blood mononuclear cells (PBMCs) or bone marrow mononuclear cells (BMMCs). For fresh peripheral blood, PBMCs were isolated via density gradient centrifugation with Lymphopure (BioLegend). For cryopreserved samples, aliquots were rapidly thawed and washed in complete RPMI (10% FCS and 1% L-glutamine, Gibco). Flow cytometry was performed with a BD LSR II and cell sorting with a FACSAria III (BD Biosciences). Data analysis was performed using FlowJo version 10 (Tree Star) or FACS DIVA 8.0.1. Expression of CAR was detected by a CAR anti-idiotype antibody (bespoke product, Evitria, 1/200) and goat anti-rat IgG PE antibody (Poly4054, BioLegend, 1/400). The following reagents were used for phenotypic analysis of CAR-T cells: PD-1 BV421 (EH12.2H7, BioLegend, 1/20), CD45RA BV510 (HI100, BD Biosciences, 1/100), Lag3 BV605 (11C3C65, BioLegend, 1/20), TCRgd BV650 (B1, BD Biosciences, 1/20), CD127 BV711 (HIL-7R-M21, BD Biosciences, 1/20), CD4 BV784 (SK3, BioLegend, 1/100), CD25 VioBright FITC (4E3, Miltenyi Biotec, 1/100), Tim3 PECF594 (7D3, BD Biosciences, 1/20), CD8 PerCP-Cy5.5 (SK1, BioLegend, 1/40), CCR7 PE/Cy7 (G043H7, BioLegend, 1/40), CD95 APC (581, BioLegend, 1/10), CD3 AF700 (SK7, BioLegend, 1/40), CD27 APC/Cy7 (M-T271, BioLegend, 1/20), TIGIT BV605 (741182, BD Biosciences, 1/40), GPR183 PE/Dazzle594 (SA313E4, BioLegend, 1/40) and GZMK APC (GM26E7, BioLegend, 1/40). DAPI and Fixable Viability Dye eFluor 455UV (eBioscience) were used to discriminate viable cells. For intracellular markers, cells were fixed (Fixation Buffer, BioLegend) and permeabilized (Intracellular Staining Permeabilization Wash Buffer 10×, BioLegend) before staining. Human BD Fc Block (BD Biosciences) was used as a blocking reagent. Fluorescence minus one (FMO) controls were used to determine expression thresholds where required. 
The full list of antibodies can be found in Supplementary Table 8. The flow cytometry gating strategy for immunophenotyping can be found in Extended Data Fig. 10.

CAR-T cell isolation and scRNA-seq using the 10x Chromium platform

Patient cells were harvested as described above for flow cytometry. Cryopreserved samples for 10x were rapidly thawed and washed with complete RPMI containing 50 U ml−1 of benzonase (Merck Life Science Limited). Cells were then stained with CAR anti-idiotype, followed by goat anti-rat IgG PE antibody and antibodies to CD3 APC (UCHT1, BioLegend, 1/20) and CD45 FITC (2D1, BioLegend, 1/20). DAPI was used to distinguish viable cells. CAR-T cells were isolated as CD45+CD3+CAR+ events in a live singlet leukocyte forward-scatter (FSC)/side-scatter (SSC) gate using a BD FACSAria III flow sorter. The flow cytometry gating strategy for CAR sorting can be found in Extended Data Fig. 10. CAR and non-CAR populations were sorted simultaneously and then immediately used downstream for the 10x workflow. Flow-sorted cells (CAR and non-CAR) were loaded according to the standard protocol of the Chromium Single Cell 5′ Kit (v2 chemistry). A TCR single-cell library was subsequently prepared from the same cells with the Chromium Single Cell V(D)J Enrichment Kit. The 5′ gene expression library and the TCR single-cell library were pooled at a molar ratio of 10:1 for sequencing on Illumina NovaSeq S4 with 28 × 90 bp, aiming for an average of 300,000 reads per cell for the 5′ gene expression library and 30,000 reads per cell for the TCR single-cell library.

Raw sequencing data processing, data filtering and normalization

The raw scRNA-seq data were demultiplexed and mapped to reference genome GRCh38, with the CAT-scFv sequence inserted, using Cell Ranger (10x Genomics, version 5.0.0). To filter lower-quality cells, we removed any cell with fewer than 300 genes, fewer than 1,000 unique molecular identifiers (UMIs) or more than 10% of its read counts derived from the mitochondrial genome. We excluded nuclear mitochondrial genes, heat shock proteins and ribosomal genes from our analysis.
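
The three cell-level filters can be illustrated with a minimal Python sketch. This is not the study code (the analysis was run in R with Seurat); the `CellQC` record and its field names are hypothetical, and only the three thresholds are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MIN_GENES = 300
MIN_UMIS = 1_000
MAX_MITO_FRACTION = 0.10


@dataclass
class CellQC:
    """Per-cell summary; field names are hypothetical, not from the study code."""
    barcode: str
    n_genes: int      # number of genes detected in the cell
    n_umis: int       # total UMI count of the cell
    mito_counts: int  # counts assigned to mitochondrial-genome genes


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives all three filters."""
    if cell.n_genes < MIN_GENES or cell.n_umis < MIN_UMIS:
        return False
    # n_umis >= MIN_UMIS at this point, so the division is safe.
    return cell.mito_counts / cell.n_umis <= MAX_MITO_FRACTION
```

A cell is dropped as soon as any one criterion fails; cells sitting exactly on a threshold (for example, 300 genes or 10% mitochondrial counts) are retained, matching the "fewer than" and "more than" wording.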

Feature counts for each cell were divided by the total counts for that cell and multiplied by 10,000, followed by natural-log transformation using log1p. Counts data were then scaled such that each gene was centered to a mean of 0 and had an s.d. of 1 across cells. Principal component analysis was performed using the top 2,000 highly variable genes. Taking the first 75 principal components as inputs, we built a k-nearest neighbor graph, grouped cells into clusters using a community detection algorithm with the resolution parameter set to 1 and calculated a UMAP for data visualization. This was performed using the Seurat package in R (R version 4.0.3 and Seurat version 4.0.6).
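
The normalization and scaling arithmetic can be written out as a short Python sketch. It is illustrative only (the study used Seurat in R); the function names are hypothetical, and the use of the sample s.d. (n − 1 denominator) is an assumption.

```python
import math
from statistics import mean, stdev

SCALE_FACTOR = 10_000  # multiplier stated in the text


def log_normalize(counts):
    """One cell's raw counts per gene -> log1p(count / total * 10,000)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * SCALE_FACTOR) for c in counts]


def scale_gene(values):
    """Center one gene across cells to mean 0 and scale it to s.d. 1."""
    if len(values) < 2:
        return [0.0 for _ in values]
    mu = mean(values)
    sd = stdev(values)  # sample s.d. (n - 1 denominator); an assumption
    if sd == 0:
        return [0.0 for _ in values]  # constant gene: nothing to scale
    return [(v - mu) / sd for v in values]
```

Note that `log_normalize` works per cell (across genes), whereas `scale_gene` works per gene (across cells).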

Cluster annotation and multi-modal reference mapping

CAR-T cells were defined as cells sorted for CD3 and the CAR by flow cytometry and belonging to clusters expressing the ‘CAT-scFv’ gene. CAR-T cells were clustered separately and labeled with their timepoint bins: product (M0), early (M1−M3), mid (M4−M6) and late (M7−M60). Clusters were subsequently annotated using lymphoid markers (that is, CD8A, CD8B and CD4) and established markers of T cell states curated from literature (Extended Data Fig. 3 and Supplementary Table 2). To supplement cell type annotation, the PBMC multi-modal reference was downloaded and processed using the instructions from the vignette. CAR-T cells were projected into the multi-modal reference using the FindTransferAnchors() and MapQuery() functions available in Seurat.
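
The timepoint binning can be expressed as a small lookup. This is a Python sketch in which the function name and the integer "months after infusion" input are assumptions; the bin boundaries are those given above.

```python
def timepoint_bin(month: int) -> str:
    """Map months after infusion (0 = infusion product) to a timepoint bin."""
    if month == 0:
        return "product"
    if 1 <= month <= 3:
        return "early"
    if 4 <= month <= 6:
        return "mid"
    if 7 <= month <= 60:
        return "late"
    # Outside the M0-M60 range covered by the bins in the text.
    raise ValueError(f"month {month} is outside the defined bins (M0-M60)")
```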

Differential gene expression and immunophenotyping of CAR-T cells

CAR-T cells were clustered separately at a global (across patients) level and per patient. Seurat’s FindAllMarkers() function was used to identify differentially expressed genes from cells across patients (global clustering) and within a patient using previously annotated cell types and timepoint bins (product, early, mid or late) as the label (that is, late: CD8 TEM). These analyses were performed using the two-sided Wilcoxon rank-sum test with Bonferroni multiple testing correction. Only genes with an average log2 fold change above 0.5 were considered. For the per-patient analysis, markers were tallied and ordered from most to least recurrent across labels (timepoint bin: cell type). With the exception of the product, where only two samples were available, markers were considered recurrent if present in more than two patients. Gene signatures were derived from the intersection of the top 20 recurrent (across patients) marker genes and the global markers. For immunophenotyping analysis presented in Fig. 4, gene modules were curated from literature. The average scaled expression and percentage of cells expressing the gene were determined using the input derived from the data slot of the DotPlot() function in Seurat and replotted as shaded bar plots.
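
The tallying of per-patient markers and the derivation of a signature can be sketched in Python as follows. This is an illustration rather than the study code: the input layout, the function names and the alphabetical tie-break are assumptions, while the fold-change cutoff, the recurrence rule and the top-20 intersection follow the text.

```python
from collections import Counter

MIN_LOG2FC = 0.5   # average log2 fold change cutoff from the text
MIN_PATIENTS = 3   # "more than two patients"
TOP_N = 20         # top recurrent markers used for the signature


def recurrent_markers(per_patient, min_patients=MIN_PATIENTS):
    """per_patient: {patient_id: {gene: avg_log2fc}} for one label.

    Returns genes ordered from most to least recurrent across patients.
    """
    tally = Counter()
    for markers in per_patient.values():
        tally.update(g for g, lfc in markers.items() if lfc > MIN_LOG2FC)
    kept = [(g, n) for g, n in tally.items() if n >= min_patients]
    # Ties are broken alphabetically; this ordering rule is an assumption.
    kept.sort(key=lambda gn: (-gn[1], gn[0]))
    return [g for g, _ in kept]


def signature(per_patient, global_markers, top_n=TOP_N):
    """Intersect the top recurrent markers with the global marker set."""
    top = recurrent_markers(per_patient)[:top_n]
    return [g for g in top if g in global_markers]
```

For the product label, where only two samples were available, `min_patients` would need to be lowered; the text does not state the value used.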

TCR analysis

Chromium 10x V(D)J single-cell sequencing data were mapped and quantified using the software package cellranger vdj (version 5.0.0) with the GRCh38 reference (vdj_GRCh38_alts_ensembl-5.0.0). The consensus annotation files were generated per sample and used for downstream analyses. Clonotypes were defined per experimental sample based on unique TCR VJ sequences and complementarity-determining region 3 (CDR3) motifs. Basic TCR statistics, such as the number of clones and the distribution of lengths and counts, were computed using Immunarch (version 0.7.0). For clonal tracking analyses, entries with a single or more than two alpha or beta chain(s) were considered one clone. Clonal population circles were created using the ggraph and igraph packages in R (version 2.0.5 and version 1.2.6, respectively). Unique clonotypes were defined as cells with shared TCR alpha and beta sequences that were observed at only one timepoint within a patient. Conversely, non-unique clonotypes were defined as cells with shared TCR alpha and beta sequences that were present across at least two timepoints within a patient. The population circle plots were created by defining a ‘root’ and specifying the clonotype names and sizes as ‘branches’ on the same level of the tree.
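
The distinction between unique and non-unique clonotypes amounts to counting the timepoints at which each clonotype is seen within a patient. The Python sketch below illustrates this; the function name and the input layout (one `(timepoint, clonotype_id)` pair per cell, with the clonotype identifier standing in for the paired alpha and beta sequences) are assumptions.

```python
from collections import defaultdict


def classify_clonotypes(cells):
    """cells: iterable of (timepoint, clonotype_id) pairs for one patient.

    Returns {clonotype_id: "unique" or "non-unique"}.
    """
    timepoints_seen = defaultdict(set)
    for timepoint, clonotype_id in cells:
        timepoints_seen[clonotype_id].add(timepoint)
    return {
        clonotype_id: "non-unique" if len(tps) >= 2 else "unique"
        for clonotype_id, tps in timepoints_seen.items()
    }
```

A clonotype with many cells at a single timepoint is still classed as unique; only its presence at two or more timepoints makes it non-unique.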

Cell-to-cell matching: logistic regression

To determine the probability that the transcriptome of each CARPALL CAT CAR-T cell was similar to CTL019 (tisagenlecleucel) CAR-T cells from two adult patients with CLL7, logistic regression was used in R, as previously described22,26,45,46. CTL019 raw counts data were processed as described above, using the same parameters as the CARPALL dataset. CTL019 cells were re-annotated using marker-based approaches, as described above. We trained logistic regression models with CTL019 cells using our cell type annotation.
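
Once a logistic regression model has been fitted on reference cells for a given cell type, scoring a query cell reduces to passing a weighted sum of its expression values through the logistic function. The Python sketch below shows only this prediction step under stated assumptions: the function names, the dictionary inputs and any coefficients passed in are hypothetical, and model fitting (performed in R in the study) is not shown.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def similarity_probability(expr, weights, intercept):
    """Probability that a query cell resembles one reference cell type.

    expr:      {gene: normalized expression} for one query cell
    weights:   {gene: coefficient} from a model fitted on reference cells
               (hypothetical values; fitting is not shown here)
    intercept: model intercept
    """
    z = intercept + sum(w * expr.get(g, 0.0) for g, w in weights.items())
    return sigmoid(z)
```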

Gene module scoring

Published datasets from CAR-T cells, cancer and normal development were downloaded, and T cells were identified using CD3D and CD3E expression. T cell partitioned datasets exceeding 10,000 cells were randomly downsampled to 10,000 cells. T cell clusters were processed and re-clustered, as described above. Module scores were calculated using the AddModuleScore() function available in Seurat using Seurat clusters as labels (Louvain algorithm). The average expression level of each cell type (or cluster) gene module was calculated at the single-cell level, and the aggregated expression of control feature sets was then subtracted. Gene modules were defined based on differential gene expression of CAT CAR-T cells.
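
The downsampling step and the core idea of a module score (module average minus control average, per cell) can be sketched in Python. This is a deliberate simplification and not Seurat's AddModuleScore(): here the control genes are supplied by the caller rather than chosen by the scoring function, and the function names, the seed and the input layout are assumptions.

```python
import random
from statistics import mean

MAX_CELLS = 10_000  # downsampling threshold from the text


def downsample(cell_ids, max_cells=MAX_CELLS, seed=0):
    """Randomly keep at most max_cells cells; smaller datasets are unchanged."""
    cell_ids = list(cell_ids)
    if len(cell_ids) <= max_cells:
        return cell_ids
    # Fixed seed for reproducibility; the seed value is an assumption.
    return random.Random(seed).sample(cell_ids, max_cells)


def module_score(expr, module_genes, control_genes):
    """Simplified per-cell score: mean(module genes) - mean(control genes).

    expr: {gene: normalized expression} for one cell; missing genes count as 0.
    Both gene lists must be non-empty.
    """
    module = mean([expr.get(g, 0.0) for g in module_genes])
    control = mean([expr.get(g, 0.0) for g in control_genes])
    return module - control
```

A positive score indicates that the module genes are expressed above the control set in that cell; the choice of control genes strongly affects the result, which is why this sketch leaves it explicit.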

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.