Deconvolution of clinical variance in CAR-T cell pharmacology and response

Chimeric antigen receptor T cell (CAR-T) expansion and persistence vary widely among patients and predict both efficacy and toxicity. However, the mechanisms underlying clinical outcomes and patient variability are poorly defined. In this study, we developed a mathematical description of T cell responses wherein transitions among memory, effector and exhausted T cell states are coordinately regulated by tumor antigen engagement. The model is trained using clinical data from CAR-T products in different hematological malignancies and identifies cell-intrinsic differences in the turnover rate of memory cells and cytotoxic potency of effectors as the primary determinants of clinical response. Using a machine learning workflow, we demonstrate that product-intrinsic differences can accurately predict patient outcomes based on pre-infusion transcriptomes, and additional pharmacological variance arises from cellular interactions with patient tumors. We found that transcriptional signatures outperform T cell immunophenotyping as predictive of clinical response for two CD19-targeted CAR-T products in three indications, enabling a new phase of predictive CAR-T product development.

Our model formulation makes a few fundamental changes to the typical predator-prey structure to address the four requisite properties and incorporate fundamental T cell biology.Borrowing from the stem cell field, we encode each T memory (  ) cell division as a fate choice between self-renewal and differentiation, driven by tumor antigen (  ).CAR-T differentiation and expansion thus occur at the expanse of depleting the pool of memory cells.Effector cells (  ) cannot self-renew, but rather undergo a fixed number () of divisions.This is a novel feature absent from other mechanism-based CAR-T models and addresses the unlimited CAR-T expansion capacity embedded in predator-prey models.Accounting of memory cell self-renewal vs. differentiation also provides a mechanism by which chronic antigen stimulation (or alternatively, insufficient CAR-T dose relative to tumor size) drives exhaustion.If tumor cells cannot be cleared sufficiently to reduce systemic antigen burden below a defined threshold (50), TM cells will continually differentiate until the pool of long-term memory cells is depleted.
We have also included an exhausted T cell state, notably absent from all above CAR-T models.We believe this is necessary to capture the divergence between CAR-T pharmacokinetics and cytotoxic function, particularly in partial and non-responding patients (explored in detail below).

Model structural assessment
To systematically assess the model topology, we created a series of variants with alternate T cell population structures: 1.   population only (effector state) 2.   and   (memory and effector states), no   3.   and   (effector and exhausted states) 4.   ,   , and   states, but without effector to memory differentiation (  = 0) 5. Inclusion of additional naïve (  ) state to the original model.
We illustrate these model structures as cartoons in Figure S1.
Note the original (complete model) describes 4 sub-populations regulated by antigen exposure via the ODEs: For model variant 1, we describe the single effector compartment, wherein proliferation/self-renewal is driven by antigen: For model variant 2, we employ the full model, but set   = 0 such that no exhausted T cells are generated.
For model variant 3, we employ a version of variant 1, wherein effectors both proliferate/self-renew and transit to exhausted cells in an antigen-dependent manner.For model variant 4, we employ the original set of model equations, but set   = 0 such that memory cells cannot arise from effectors.Note the origin of long-term memory cells remains a point of contention among immunologists.It is established that following clearance of infection, antigen-specific effectors are replaced by antigen-specific memory cells.These were previously assumed to arise via dedifferentiation from a subset of differentiation effectors 12 (as coded in our model), though new evidence suggests that these arise from a rare population of stem-like cells 13 .There is data supporting both lineage models 14,15 , and differing sub-sets of memory cells may follow both paths 16,17 .This variant thus assesses whether this teleological de-differentiation reaction is necessary within our model framework, given the limited number of T cell states considered for parsimony.
For model variant 5, we have included a naïve T cell compartment (  ) preceding the memory compartment, as per canonical T cell differentiation hierarchy 12 .These cells proliferate and differentiate to memory   cells in an antigen-dependent manner, via the equation: TN cells differentiate into the memory cell compartment, such that the TM balance equation is now: This introduces five additional free parameters into the model: : the naïve T cell proliferation rate   : the naïve T cell probability of self-renewal   : the Hill exponent linking antigen exposure to naïve T cell proliferation   : the naïve T cell death rate _  : the fraction of CAR-T cell dose in the naïve T cell compartment The resulting model fits to the CR/PR/NR populations from Fraietta et al. 18 for the five structural variants as compared to the full model are shown in Figure S2.Note that all model variants are essentially equivalent with respect to their ability to describe the tumor dynamics but differ substantially in the ability to capture the CAR-T pharmacokinetics.
Model selection for non-linear dynamical models is inherently challenging given the lack of appropriate quantitative metrics.The Akaike Information Criterion (AIC) is widely used, ranking models based on fitting error (MSE) vs. complexity (number of free parameters): 19 : Wherein n = number of measurements, k = free parameters and MSE = mean squared error.
However, this was originally developed to rank multivariate linear regression models rather than nonlinear ODEs and prioritizes limiting free parameters over goodness-of-fit.More fundamentally, MSE essentially considers all data points to be of equivalent value, while subjectively we know some datapoints and readouts are more-or-less significant.Based on MSE of all the data, variant 5 (inclusion of the   cell compartment) is the most accurate, outperforming the original model (variant 0).Examination of the PK curves in Figure S1 reveals this improvement is due to capturing the last time point (12 month) of the NR profile, which increases from the previous (6 month).We believe this may be an artefact of the data (population average) rather than a real phenomenon, implying the model is overfitting.Note that model variant 5 contains five additional parameters as compared to the original model, and the resulting AIC more than doubles from 57 to 140, indicating this additional complexity adds little value.
Based on the AICc, variant 2 (lacking an exhausted state but containing   and   ,and the reversable transitions) is ranked highest.However, this version does not adequately capture the pharmacokinetics of the NR population, which reads out as higher MSE of the CAR-T exposure metrics (Cmax and AUC).The deficiency of using AICc as a model selection metric is apparent by examining the curves for model variant 1 (lacking   and   compartments).While this is ranked higher than the original model due to the reduced number of free parameters, the PK curves do not even closely resemble the data.
Using the exposure metrics (both Cmax and AUC), the original model (variant 0) best captures the data.Consideration of the fitting error, model complexity, and assessment of exposures, we feel the original model outperforms all structural variants.
Note that selection of non-linear models is somewhat subjective.We view mathematical models as caricatures of biological systems; quantitative tools that should capture dynamic organizing principles and explain the data in broad strokes.A model's value lies in what we can learn from it, rather than how complex and detailed it is.That is, the different variants assessed (and perhaps others not considered) could be equally valuable.These fitting metrics help decide which models to learn from, but do not necessarily imply one as true and others false.For the PR and NR groups, the model describes the dynamics of initial tumor size reduction (or lack thereof), but is not the subsequent dynamics.Both the PR and NR tumor dynamics appear to fluctuate up to day-100, then decline, though these may represent sampling artifacts rather than real dynamic features.

Supplemental Figures
We further quantified accuracy by compared the mean squared errors resulting from the set of estimated model parameters to that obtained by random sampling of parameter search space (n=100).P-values for the three groups were all < 10 -7 (rank-sum test), indicating the optimized parameters represent a small segment of parameter space.Sensitivity Coefficients (LPSC) were calculated for CR, PR, and NR populations using the AUC of T cells (aucT) and AUC of B cells (aucB) as outputs, with samples and outputs organized by agglomerative hierarchical clustering.Memory cell proliferation (  ) was identified as a critical driver of exposure and response in the CR and PR populations.TK50 was not a locally sensitive parameter despite being identified in the PCA as an important parameter (Figure 1D).The rate of tumor cell lysis (  ) was the most sensitive parameter mediating tumor response in the CR population.The rate of memory cell regeneration (  ) and the number of effector cell doublings () were additionally found to mediate both CAR-T exposure and tumor response in CR and PR populations, despite not varying significantly between the groups.   , n=19.Cell types annotated at frequencies of less than 5% are excluded; CD4+ Naïve, CD8+ Naïve, CD8+ Tprecursorexhausted (Tpex) and CD4+ follicular-helper (Tfh).(D) Immunophenotype-defined T early memory (CD8+CD45RO-CD27+) and exhausted (CD8+PD1+) cell frequencies by response category for Kymriah in CLL, digitized from Fraietta et al. 18 , n=38.(B) Immunophenotype-defined Early memory and exhausted cell frequencies by response category in for Kymriah in ALL, calculated from Bai et al. 22 CITEseq antibody tags, n=12.Boxplots represent median ±25 percentiles, and whiskers the min/max value or an additional 1.5-fold quartile distance.

Figure S1 :
Figure S1: Structures of original model and variants.

Figure S2 :
Figure S2: Fitting accuracy of the original proposed model against five structural variants.(A) Mean squared error (MSE) of the original (full) model and the five model variants fit the training data 18 , as well as random sampling of parameter search space for the original model (n=100, error bars = std).MSE plots are separated by fit to the pharmacokinetic and tumor dynamics, and rank ordered by overall goodness of fit.(B) Model simulations overlaid with training data for the original (full) model and five variants.

Figure S3 .
Figure S3.Model fitting to pre-clinical CD19-CAR-T data.NALM-6 xenograft bearing mice (injected with 106 tumor cells at day -6) were treated with increasing doses of Kymriah, and tumor size measured by fluorescence imaging 20 .We assume for simplicity a 1:1 scaling relationship between photons/s and tumor cell number.For fitting mouse as compared to clinical data, we scaled down bounds on the tumor-related parameters Bmax (maximum tumor size) and TK50 (T cell EC50 driving tumor cell killing) and allowed the tumor growth rate (uB) to float between 0.1 and 1 per day.(A) Pharmacokinetic and (B) tumor dynamic data and model simulations for CAR-T doses of 0, 1, 3 and 5 million cells.Goodness of fit plots for the CAR-T pharmacokinetic (C) and tumor dynamics (D), with Pearson correlation coefficients of 0.88 and 0.84 respectively.

Figure S4 .
Figure S4.Model fitting to pre-clinical BCMA-CAR-T data.MM1.s xenograft bearing mice (injected with 5x106 tumor cells at day -14 to -8) were treated with increasing doses of the research-grade CAR-T 'BCMA-R2', and tumor size measured by fluorescence imaging 21 .We assume for simplicity a 1:1 scaling relationship between photons/s and tumor cell number.For fitting mouse as compared to clinical data, we scaled down bounds on the tumor-related parameters Bmax (maximum tumor size) and TK50 (T cell EC50 driving tumor cell killing) and allowed the tumor growth rate (uB) to float between 0.1 and 1 per day.(A) Pharmacokinetic and (B) tumor dynamic data and model simulations for CAR-T doses of 0, 1, 3 and 10 million cells.Goodness of fit plots for the CAR-T pharmacokinetic (C) and tumor dynamics (D), with Pearson correlation coefficients of 0.92 and 0.99 respectively.

Figure S5 :
Figure S5: Goodness-of-fit plots for Kymriah model fitting in Figure 1.Accuracy was quantified by computing Pearson correlations of the goodness-of-fits (log-simulations vs. log-data), separated by CAR-T and tumor kinetics for the CR, PR, and NR populations.For the CAR-T pharmacokinetics, median correlation coefficients for the CR, PR, and NR populations are 0.89, 0.88, and 0.76, respectively.The model therefore captures the majority of variance in the PK data for all three groups.For tumor dynamics, the correlations are 0.78, 0.77, and -0.23, respectively.Note the tumor growth portion of the model is very minimal (logistic equation) and captures the overall trends and differences between populations while missing aspects of the dynamics.The CR tumor kinetics are captured relatively well.For the PR and NR groups, the model describes the dynamics of initial tumor size reduction (or lack thereof), but is not the subsequent dynamics.Both the PR and NR tumor dynamics appear to fluctuate up to day-100, then decline, though these may represent sampling artifacts rather than real dynamic features.

Figure S6 :
Figure S6: Local Parameter Sensitivity Analysis of CR/PR/NR populations.Local ParameterSensitivity Coefficients (LPSC) were calculated for CR, PR, and NR populations using the AUC of T cells (aucT) and AUC of B cells (aucB) as outputs, with samples and outputs organized by agglomerative hierarchical clustering.Memory cell proliferation (  ) was identified as a critical driver of exposure and response in the CR and PR populations.TK50 was not a locally sensitive parameter despite being identified in the PCA as an important parameter (Figure1D).The rate of tumor cell lysis (  ) was the most sensitive parameter mediating tumor response in the CR population.The rate of memory cell regeneration (  ) and the number of effector cell doublings () were additionally found to mediate both CAR-T exposure and tumor response in CR and PR populations, despite not varying significantly between the groups.

Figure S7 .
Figure S7.Model fitting results based on the hypothesis that the only distinguishing feature between CR, PR, and NR populations is the fraction of memory T and exhausted T cells in the CAR-T infusion product.Memory cell fraction (f_Tm) and exhausted cell fraction (f_Tx) were estimated as between 1-50% independently for the CR, PR, and NR populations while all other model parameters were estimated simultaneously using a single vector for the CR, PR, and NR populations.Simulations of best fit model (estimated by MSE minimization) from 12 optimization runs for (A) CAR-T pharmacokinetics and (B) tumor dynamics.(C) Estimated fraction of memory and exhausted cells (f_Tm, f_Tx) in CAR-T infusion products for CR, PR, and NR populations.Bars represent medians ± 25 percentile intervals from n=12 model fits.

Fig S8 .
Fig S8.Simulated pharmacokinetic and tumor dynamic responses to increasing cell doses of pure memory cell populations from CR, PR, and NR population models.Simulations were run at doses of 1, 3, 10, 30, 100, 300 and 1000 million cells using sets estimated for CR (A), PR (B), and NR (C) populations.For direct comparison, the memory cell fraction was set to 100% for each.

Figure S9 .
Figure S9.Volcano plot of differentially expressed genes between CR vs. NR groups.Falsediscovery Rate (FDR) based on moderated t-test in Limma.

Figure S10 .
Figure S10.Select gene sets differentially enriched between CR vs. NR groups.Gene sets were derived from BioCarta, Reactome, DAVID, Fraietta et al., Thymic Cell Atlas and PROGENy, and represented as signed log10(P-val) P-values calculated using Kolmogorov-Smirnov test in GSEA.

Figure S11 .
Figure S11.Simulated frequencies of memory (  ), effector (  ), and exhausted (  ) CAR-T cells for CR, PR, and NR patient groups shown in Figure 1.

Figure S12 .
Figure S12.Pairwise Pearson correlation coefficients between Thymic Atlas cell population gene signatures computed using ssGSEA scores from Fraietta et al.RNAseq data.Note signatures for CD8 Tmem, CD4 Tmem, and CD4 T cells are tightly correlated.

Figure S14 :
Figure S14: Transcirptome classifier performance as compared to null and random pathway models.Distribution of predictive accuracies are shown for 2500 iterations using 60:40 train:test split cross validation.Results from the 28-signature transcriptome-based ssGSEA classifier ("Transcriptome") are compared null models (random classification; "null") and an ssGSEA classifier trained on a randomized selection of pathways from the compendium ("Random").

Figure S15 :
Figure S15: PKPD response depends on initial tumor burden and CAR-T dose.Model simulations were performed across a grid of CAR-T dose, initial tumor burden, and parameter set in the CR population to determine the (A) average tumor AUC and (B) average CAR-T Cmax.Tumor AUC increases with initial tumor burden and decreases with initial CAR-T dose for CR parameters.Cmax exhibits a more complex relationship, peaking for intermediate tumor burdens and generally increasing with initial CAR-T dose.This non-linear interaction between tumor burden and CAR-T dose likely contributes to the clinically observed variability.

Figure S16 :
Figure S16: Comparative pharmacokinetics of Kymriah in B-ALL, our model of Kymriah in CLL, and Yescarta in LCBCL.Cmax and C(t =3 month) distributions for Kymriah in B-ALL and CLL were obtained via simulations of the Stein et al. model 2 (n=1000 simulations, represented as percentiles) and our model (n=12 simulations per group), respectively.Data for Yescarta was digitized from Locke et al. 24 .The distributions shown for Kymriah and Model are exactly as in Figure 5C for the left panel (Cmax).The group of boxplots labelled Model show the Cmax or C(t=3 month) for each of the three populations (CR, blue; PR, grey; and NR, pink) with the colored background of the range of Cmax or C(t=3 month) obtained from the clinical PK data.Color bars for Yescarta show digitized min/max ranges of Cmax and C(t=3 month) while the labels (DR, durable response; RL: relapse; and NR: no response) are plotted at the approximate medians.Boxplots represent median ±25 percentiles, and whiskers the min/max value or an additional 1.5-fold quartile distance.

Figure S17 :
Figure S17: Goodness-of-fit plots to Abecma PKPD data.All data was fit simultaneously, with Pearson's linear correlation coefficient of 0.59 for CAR-T and 0.75 for tumors.

Figure S18 .
Figure S18.Cmax/Tumor burden vs. tumor response simulations for Abecma.The model fit to Abecma phase1 data was simulated at CAR-T doses ranging from 0.1 to 1000 million cells.Tumor shrinkage, compared to untreated control, was calculated at day 60.The response covariate follows the same trend as that observed for Yescarta in DLBCL and predicted for Kymriah.
Given the tumor dynamic profiles look quite similar across variants, we reasoned tumor fits are not important model selection criterion.The most clinically relevant criteria is efficacy, and CAR-T exposure metrics (Cmax and AUC) are known to predict clinical efficacy.We thus evaluated all model variants by MSE of the Cmax (log10-cells) and AUC (log10cells.day) in addition to the MSE of all the data and the sample-size corrected AIC: