Abstract
The use of a standardized outcome metric enhances clinical trial interpretation and cross-trial comparison. If a disease course is predictable, comparing modeled predictions with outcome data affords the precision and confidence needed to accelerate precision medicine. We demonstrate this approach in type 1 diabetes (T1D) trials aiming to preserve endogenous insulin secretion measured by C-peptide. C-peptide is predictable given an individual’s age and baseline value; quantitative response (QR) adjusts for these variables and represents the difference between the observed and predicted outcome. Validated across 13 trials, the QR metric reduces each trial’s variance and increases statistical power. As smaller studies are especially subject to random sampling variability, using QR as the outcome introduces alternative interpretations of previous clinical trial results. QR can provide model-based estimates that quantify whether individuals or groups did better or worse than expected. QR also provides a purer metric to associate with biomarker measurements. Using data from more than 1300 participants, we demonstrate the value of QR in advancing disease-modifying therapy in T1D. QR applies to any disease where outcome is predictable by pre-specified baseline covariates, rendering it useful for defining responders to therapy, comparing therapeutic efficacy, and understanding causal pathways in disease.
Introduction
Ideally, a metric for studying disease should be clinically and scientifically meaningful, objective, predictable, and able to be standardized across individuals and cohorts. When applied in the context of clinical trials for any disease, such a standardized metric may enable acceleration of trials through increased statistical power and aid in interpretation of clinical trial data by regulators, clinicians, investigators, translational scientists, and study participants.
If a disease course or outcome is predictable using baseline factors, analysis should adjust for these factors as long as they are specified in advance1. This approach is advantageous over traditional unadjusted analysis, which essentially compares the average of the group of treated individuals to the average of the group of control individuals. Baseline covariate adjustment improves precision for estimating treatment effects of drugs and biological products, and “covariate adjustment leads to efficiency gains when the covariates are prognostic for the outcome of interest in the trial” and are pre-specified in the statistical analysis plan2. In unadjusted analysis, results may be more strongly impacted by chance covariate imbalances at baseline, especially when there is evidence that the covariate is associated with the outcome, obscuring the effect of treatment. Despite the known benefits of baseline-adjusted analyses, reviews have found that only 24–34% of trials use covariate adjustment for the primary analysis3.
A standardized quantitative response (QR) metric that adjusts for baseline covariates can be developed for any reproducible outcome measure for which the natural history is known and predictable. This is the case for trials of disease-modifying therapy (DMT) in type 1 diabetes (T1D) aiming to preserve endogenous pancreatic beta cell function, as there is a wealth of natural history data on the loss of insulin secretion over time measured by the C-peptide response to a mixed meal tolerance test (MMTT)4,5,6,7,8,9,10,11,12,13,14,15,16,17,18. Moreover, though it is noteworthy that a therapy to delay onset of clinically apparent disease was recently approved for clinical use19, there are still no DMTs that preserve endogenous insulin secretion in individuals recently diagnosed with T1D and only one in the prevention setting. Preservation of insulin secretion after diagnosis is associated with improved clinical outcomes20,21,22,23,24. To date, there have been a few dozen randomized controlled trials (RCTs) of immune therapy in recently (<3 months) diagnosed T1D, and almost all are phase 2 trials led by academic investigators. Further, the time to conduct such studies is long, given that trials are challenging to enroll and study endpoints are at least 1 year from randomization. While C-peptide response to an MMTT is accepted as the appropriate measure of endogenous insulin secretion20,25, regulatory ambiguity for potential indications exists since there is no established clinical therapeutic threshold of C-peptide that definitively qualifies therapies or interventions as successes or failures26. In addition, by design, published trials of immune therapy express the average effect of therapy on the randomized cohort and, though multiple definitions have been proposed, there are no accepted standardized criteria to define a responder to therapy.
Together, these issues create limitations in understanding mechanisms of disease and response to therapy, hindering the ability to develop a precision medicine approach to DMT in T1D and other immune-associated diseases.
The QR, originally developed by Bundy and Krischer27, leverages the well-known statistical property that model adjustment with prognostic baseline covariates increases precision and confidence by way of controlling for outcome heterogeneity3. Bundy and Krischer used data from five trials6,7,8,11,12 in similar populations to develop an analysis of covariance (ANCOVA) model to predict the C-peptide area under the curve (AUC) mean value by adjusting for baseline C-peptide AUC mean and age. The resulting QR metric is a standardized measure of the difference between an individual’s observed and predicted C-peptide AUC mean one year after study entry. Values above zero indicate a better-than-expected outcome and values below zero indicate a worse-than-expected outcome27.
Using data from 13 RCTs testing 14 different therapies aiming to preserve beta cell function in those with T1D, we demonstrate how the QR metric increases the precision and confidence of clinical trial results, thus enhancing interpretation of these studies while suggesting new concepts for future trial designs. We found that the QR metric reduced variance and standardized C-peptide outcomes across trials, leading to re-evaluation and new interpretations of both clinical and mechanistic results. In addition, we illustrate how the QR metric may be useful for design of future trials. Together, these findings represent a significant step towards precision medicine.
Results
The QR metric reduces variance and standardizes C-peptide outcomes across trials
We first validated the published QR metric using data from 13 studies: five TrialNet RCTs used in the development of QR (referred to as the development cohort)6,7,8,11,12, and eight additional RCTs (referred to as the validation cohort)9,10,13,14,15,16,17,18. To further evaluate the published model, a new ANCOVA model was also fit to the placebo participants in only the validation cohort. The predicted 1-year C-peptide values from this new model were then used to compute a new QR metric. These newly computed QR values were highly correlated with the QR values from the published model (r = 0.996 in validation cohort, 0.992 in development cohort), and it was thus determined that the published QR model is applicable to all 13 RCTs (Supplemental Fig. 1A and B). Table 1 lists the key characteristics of each of the 13 trials, including number of subjects, median age, and baseline C-peptide AUC mean. As expected, the mean QR in the development cohort centers around zero; this was also seen in the validation cohort. We found that the mean QR matches closely between the development and validation cohorts, and the distribution of QR is similar in both cohorts (Fig. 1; p = 0.43, two-sample t-test [t = 0.8, DF = 346]; p = 0.62, Kolmogorov-Smirnov two-sample test [KS = 0.04]).
We next updated the model to include all available data from all 13 studies. This revised ANCOVA model predicted 1-year C-peptide AUC mean values very similar to those predicted from the published QR model (Supplemental Fig. 1C, D). Moreover, the predicted values between the original and revised models were strongly associated (R² = 0.998; Supplemental Fig. 1E); thus, it was determined that the original QR model applied well to all 13 studies and all results hereafter use the original QR model. In addition, the model is robust; predictions were counterintuitive for only eight of the 1306 individuals studied. For these eight older individuals with low C-peptide, the model predicted a very minor increase in C-peptide over time (Supplemental Fig. 2).
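The counterintuitive predictions follow from the form of the published equation (see Methods): because the age term is additive on the log scale, the predicted 1-year value can exceed the baseline value when age is high and baseline C-peptide is low. The following Python sketch assumes only the published coefficients; the function names are illustrative and not part of the original analysis. It computes the baseline value below which the model predicts an increase.

```python
import math

# Coefficients of the published QR equation (see Methods).
BASELINE_COEF = 0.812
AGE_COEF = 0.00638
INTERCEPT = -0.191


def predicted_cpeptide_1year(cp_baseline, age_years):
    """Predicted 1-year C-peptide AUC mean (nmol/L), back-transformed from the log scale."""
    ln_pred = BASELINE_COEF * math.log(cp_baseline + 1) + AGE_COEF * age_years + INTERCEPT
    return math.exp(ln_pred) - 1


def breakeven_baseline(age_years):
    """Baseline C-peptide below which the predicted 1-year value exceeds baseline.

    Solves predicted == baseline for the baseline value; returns None when no
    positive baseline satisfies it (the model then always predicts a decline).
    """
    ln_threshold = (AGE_COEF * age_years + INTERCEPT) / (1 - BASELINE_COEF)
    if ln_threshold <= 0:
        return None
    return math.exp(ln_threshold) - 1


# Hypothetical ages chosen for illustration only.
for age in (12, 25, 35, 45):
    print(age, breakeven_baseline(age))
```

Under these coefficients the age term outweighs the intercept only above roughly 30 years of age (0.191 / 0.00638), so the function returns None for younger individuals; this is consistent with the counterintuitive predictions being confined to older individuals with low C-peptide.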
The use of the QR metric both reduced the variance and standardized the mean of the C-peptide outcome within each trial. Among the placebo-treated individuals in the 13 trials, there were noteworthy variations with respect to both age and baseline C-peptide between studies (Fig. 2a). The 1-year C-peptide AUC mean value varied (mean range 0.36–0.69 nmol/L) between trials, and within each trial demonstrated wide heterogeneity (Fig. 2b). In contrast, by accounting for baseline C-peptide and age, the mean QR value of placebo-treated individuals centered around zero for each trial (mean range −0.07 to 0.08) (Fig. 2b). For 12 of these 13 trials, the mean QR value was not statistically different from zero; the only exception being the TrialNet ATG/GCSF trial (mean of −0.072, p = 0.031; 30 placebo participants). For each individual trial, the standard deviation of the QR was lower than the standard deviation using the C-peptide AUC mean (Fig. 2b, annotated in blue).
We next investigated whether the QR metric would increase statistical power since covariate adjustment in randomized trials leads to greater power and better control of type I and type II error3. We chose to examine this in the Immune Tolerance Network (ITN) AbATE trial of teplizumab, which had a positive outcome in demonstrating efficacy of teplizumab in preserving beta cell function in recently diagnosed individuals relative to controls13. In the original analysis, the primary outcome used the difference in 4-h C-peptide AUC mean between baseline and 2 years, adjusted for baseline C-peptide, with a p-value of 0.002 for the difference between treatment groups. In our re-analysis, we used the 2-h C-peptide AUC mean at 1 year and found that the difference in C-peptide AUC mean in control (0.364 nmol/L) compared to teplizumab-treated (0.647 nmol/L) at 1 year was statistically significant with a p-value of 0.009 (t = 2.7, DF = 46.4) using a two-sample t-test assuming unequal group variances (Fig. 3a). When using the change from baseline C-peptide AUC mean as in the published trial report, the effect size and precision increased with a mean change from baseline of −0.321 nmol/L in the control group and −0.086 nmol/L in the teplizumab-treated group (p = 0.0002, t = 4.0, DF = 57.0) (Fig. 3b). Controlling for both baseline C-peptide and age by using QR as the outcome further increased the statistical significance of the result (−0.015 nmol/L control vs 0.141 nmol/L teplizumab-treated, p < 0.0001, t = 4.3, DF = 47.3) (Fig. 3c).
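The t statistics and fractional degrees of freedom reported above are characteristic of a two-sample t-test with unequal variances (Welch's test with the Satterthwaite approximation). As a minimal sketch using only the Python standard library, the statistic and degrees of freedom can be computed as follows. The function name and the toy input values are hypothetical and are not trial data; a p-value would additionally require the t distribution, which is omitted here.

```python
import math
from statistics import mean, variance


def welch_t_test(group_a, group_b):
    """Return (t statistic, Satterthwaite degrees of freedom) for two samples."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df


# Toy, made-up outcome values (NOT data from any trial).
treated = [0.20, 0.05, 0.30, 0.10, 0.25]
control = [-0.10, 0.00, -0.20, 0.05, -0.05]
t_stat, df = welch_t_test(treated, control)
print(f"t = {t_stat:.2f}, DF = {df:.1f}")
```

Because the denominator is the combined standard error of the two group means, any outcome definition that lowers within-group variance, such as adjusting for baseline covariates, increases the magnitude of t for the same mean difference.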
We next assessed whether the QR approach could be utilized to predict trial outcomes beyond 1 year, using varying baseline reference points and outcome timepoints. The original QR model used baseline values from individuals within three months of diagnosis to predict the outcome at 12 months after treatment initiation and demonstrated an R² value of 53%. Predictions further in the future are more challenging, so it is not surprising that the R² drops when this same baseline is used to predict a 24-month outcome (Fig. 4). However, if individuals are enrolled at 6, 12, or 18 months from diagnosis, the R² value for an outcome at two years is high (74%, 85%, 87%, respectively), suggesting that QR can be used as an outcome measure for trials enrolling individuals further from diagnosis. Since most trials evaluating immune therapies have been conducted in individuals within 3 months of diagnosis, this suggests the possibility of trials designed to test therapies with different enrollment windows.
Applying the QR metric to previous published clinical trials can change the interpretation of both clinical and mechanistic results
Since the QR metric incorporates historical data from many placebo/control individuals, it minimizes the random sampling variability often present in individual studies with small sample sizes. We determined whether using the QR metric would alter the interpretation of clinical trial results, compared to the originally published reports. In Fig. 5, we show the mean QR (±95% confidence intervals) for the active treatments (Fig. 5a) and placebo/control arms (Fig. 5b), as well as the treatment effect expressed as the difference in QR between arms (Fig. 5c) for 13 published trials.
Expressing the overall treatment effect and results of each arm using QR altered the interpretation of some of the published results. For example, the primary outcome of the alefacept trial was the 2-h C-peptide AUC at 1 year, and the difference between treatment arms did not reach statistical significance in the original analysis14. However, applying the QR metric to the alefacept trial dataset demonstrated a large effect in the active treatment group (Fig. 5a), strongly suggesting that alefacept, or drugs working in the same pathway, are worth pursuing in future trials.
For the canakinumab trial, assessing the overall trial result by the difference in treatment arms using the QR metric finds no treatment effect (Fig. 5c), consistent with the published outcome8. Yet, the QR of the active arm of the canakinumab trial suggests a positive effect of this therapy on C-peptide (Fig. 5a). Moreover, the QR of the placebo arm allowed us to further interpret this result, revealing that the lack of statistical significance between the groups may be driven by higher-than-expected C-peptide response in the 22 individuals in the placebo arm of the study (Fig. 5b).
Lastly, applying the QR metric to the two studies testing anti-thymocyte globulin (ATG) also suggests a different interpretation than originally published. The TrialNet ATG/GCSF trial reported a positive outcome9 using lower-dose therapy than the ITN ATG study, which did not meet its primary outcome28, leading to the interpretation that dose level was the key variable in the effectiveness of the drug. However, the mean QR in the treated participants was notably similar between studies: mean (95% CI) QR was 0.08 (0.02–0.13) in the higher-dose ITN ATG trial and 0.09 (0.03–0.15) in the lower-dose TrialNet ATG/GCSF trial. This suggests that the reported difference in treatment effect between the studies was driven by the placebo participants: those in the TrialNet ATG/GCSF study had a fairly low mean QR (−0.07) while those in the ITN ATG study had a higher mean QR (0.05). Of note, across the 13 studies, only the TrialNet ATG/GCSF control arm was statistically significantly different from zero. This analysis suggests that rather than drug dosing, the differing behavior of the placebo groups could alternatively explain differences in reported trial outcomes. This further demonstrates how random sampling variability in smaller studies can complicate interpretation of RCT results.
We also looked at the applicability of the QR model to timepoints prior to 1 year. If the mean QR ± 95% CI were used as an outcome at 6 months, we found no treatments that showed a false positive result—that is, all trials positive at 6 months were still positive at 12 months. However, the converse is not true; we show that three therapies positive at 12 months could have been missed using this method (Fig. 6).
Next, we investigated whether using the QR metric would impact the results of immune marker studies aiming to explore mechanisms of response to therapy. Using C-peptide and mechanistic results obtained from the ITN AbATE trial of teplizumab, we confirmed previous reports of the positive association between the frequency of treatment-induced KLRG1+TIGIT+CD8+ T cells, a known signature of T cell exhaustion, and C-peptide outcome (Fig. 7a)29. Adjusting for baseline C-peptide and age by using the QR metric showed a weaker association between treatment-induced KLRG1+TIGIT+CD8+ T cells and outcome (Fig. 7b). This observation is likely accounted for in part by an association between treatment-induced exhausted T cells and age (Fig. 7c). As previously noted, age is one of two key variables in the QR metric; age is also known to be important in defining setpoints and responsiveness for many immune cell populations (recently reviewed in refs. 30,31,32,33). This analysis implies that therapy-induced exhaustion of T cells unveils mechanistic insights about age itself, which may or may not be causally related to a particular therapy, but is important to our understanding of the role age plays in disease progression and response to therapy. Supplemental Fig. 3 graphically illustrates why QR is a more powerful metric for identification of a biomarker that is causally related to therapy.
The QR metric better quantifies responders to therapy
As in other diseases, not all individuals recently diagnosed with T1D will respond to a given therapy. Although continuous measurements are preferred to minimize loss of statistical power, historically, analyses of clinical trial results across many diseases frequently stratify treated individuals as responders and non-responders to therapy. In T1D trials, varying definitions of response to therapy using C-peptide have been used13,34,35. Reasoning that previously published responder definitions may be associated with baseline variables, we investigated whether the standardized QR metric could better quantify responders to therapy.
We first explored the relationship between baseline C-peptide, age, and the previously published categories of a C-peptide responder/non-responder13,34,35. As shown in Fig. 8, among placebo/control participants, the probability of meeting each of the four responder definitions is strongly associated with age (Fig. 8 panels a [Likelihood Ratio (LR) = 11.5, p = 0.0007], b [LR = 11.3, p = 0.0008], c [LR = 19.8, p < 0.0001], d [LR = 11.5, p = 0.0007]); two of these definitions are also associated with baseline C-peptide (Fig. 8a [LR = 5.1, p = 0.02] and c [LR = 5.4, p = 0.02]). In contrast, the probability of being a responder under the QR-based definition (a QR above rather than below zero) is, as expected, not associated with either age (LR = 1.8, p = 0.18) or baseline C-peptide (LR = 0.2, p = 0.62) (Fig. 8e).
To further illustrate the consequences of not accounting for baseline variables in classifying responders, we benchmarked the probability of being a responder for each definition. The average age (16.4 years) and average baseline C-peptide AUC mean for treated individuals across all 13 studies were determined, yielding a QR value of 0.039. However, the probability that an individual with these characteristics is defined as a treatment responder varies widely using different responder definitions (Fig. 8b, c). Most importantly, the probability of being a treatment responder is strongly associated with age, baseline C-peptide, or both of these metrics for all non-QR definitions. Since the QR metric adjusts for age and baseline C-peptide, the probability of being a responder is not conditional on these factors, as can also be seen from the annotated p-values (Fig. 8e).
Given that the probability of being a treatment responder is not conditional on age and baseline C-peptide, we asked how the QR metric could be used to select a threshold for classifying responders and non-responders to therapy. In selecting a threshold for a continuous measure such as QR, it is useful to understand the variability or confidence intervals around a given QR value, reflecting the probability that a given QR value represents a true treatment responder. Here, we observed that the distribution of the QR scores of all placebo/control individuals is symmetrical (Fig. 9a), leading to quantile statistics whereby the QR value can be assigned to a percentile (e.g., a QR of 0.10 corresponds to the 75th percentile of the distribution). Figure 9a also illustrates that while there is a symmetrical distribution around the mean, heterogeneity is also apparent; a placebo-treated individual may have a QR value ranging from −0.58 to 0.45. Similarly, though the mean QR value among all individuals in the treatment arms of the positive studies is above zero, there is also a wide range of values in each treated group (Fig. 9b), many of which overlap with the distribution of placebo-treated individuals.
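Assigning a QR value to a percentile of the placebo/control distribution can be sketched with an empirical distribution function. The Python example below is illustrative only: the function name and the reference values are hypothetical (made-up numbers, not data from any trial), and it assumes the percentile is defined as the share of reference values at or below the given QR.

```python
from bisect import bisect_right


def qr_percentile(qr_value, reference_qr_values):
    """Percent of reference (e.g., placebo/control) QR values at or below qr_value."""
    ordered = sorted(reference_qr_values)
    return 100.0 * bisect_right(ordered, qr_value) / len(ordered)


# Toy, made-up reference values (NOT data from any trial).
reference = [-0.30, -0.15, -0.05, 0.00, 0.04, 0.08, 0.12, 0.25]
print(qr_percentile(0.10, reference))  # prints 75.0 (6 of 8 values are <= 0.10)
```

With a large reference cohort, the same lookup gives each participant a standardized position relative to what would be expected without treatment.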
The overlapping QR values between treatment and placebo groups demonstrate that using a particular QR cutoff will not necessarily distinguish individuals who received an efficacious therapy from placebo individuals. We reasoned that these distributions can be used to understand the probability that a specific QR value is associated with a successful treatment (Fig. 9c). As shown in Fig. 9c, the probability of identifying a treatment responder or non-responder increases at the extremes of the distribution. For example, the probability that an individual with QR = 0.4 received an effective therapy exceeds 80% (Fig. 9a). Conversely, the probability that an individual with QR = −0.4 received an effective therapy is only 15%: at this threshold, individuals are more likely to be placebo participants, and thus we can infer that a treated individual with this QR value was likely a non-responder to therapy. Choosing less extreme cut points introduces greater uncertainty. For example, selecting a threshold of treatment response of QR = 0.2 would yield only a 65% probability that this individual received an effective therapy.
Discussion
Despite decades of clinical trials of DMT in individuals recently diagnosed with T1D, there are no drugs currently in clinical practice. Here, we have demonstrated that the QR metric may address many challenges to the field, facilitating the identification of potentially effective therapies. Importantly, standardization of outcomes enables a uniform method of analysis across trials, and thus a manner for comparing therapies and identifying responders to therapies through a consistent responder definition.
We applied the QR metric, which adjusts for baseline age and C-peptide AUC mean, to data from 13 clinical trials of DMT in individuals with recently diagnosed T1D. Since these 13 trials occurred over a 10-year period, included individuals from 3 to 46 years of age, were conducted at multiple locations, and included data from both academic trials and a phase 3 industry-sponsored study, the model is sufficiently robust to be considered for regulatory purposes.
Whereas traditional unadjusted analysis may be impacted by chance imbalances in covariates at baseline (especially those known to be associated with outcome), baseline-adjusted analysis can lead to individual-specific (conditional) estimates which conceptually match individuals in the intervention group and control group who are similar with respect to the adjusted variables. Using pre-specified variables, baseline-adjusted analysis increases statistical power, allowing for robust comparisons between studies.
In T1D, more than half of the heterogeneity in the natural history of disease can be explained by age and baseline C-peptide. While several T1D trials used ANCOVA models adjusted for baseline metrics, this was inconsistent between studies. Computing a QR further utilizes those ANCOVA predictions27,36 to determine a standardized score, which enables cross-trial analysis. Analyzing treatment effects in terms of QR also allows for evaluation of treatment groups in a standardized manner, with comparisons to a large number of controls. This approach is particularly powerful in T1D since the baseline covariates are established predictors of the outcome.
Using data from the ITN trial of teplizumab, we demonstrate that using the QR metric reduces the variance of the outcome, resulting in increased power and the potential for reducing sample size. However, it is not clear that reducing the sample size for a phase 2 RCT is the optimal approach to select promising therapies, or to identify responders for T1D or other diseases. Placebo-controlled randomized trials have clear advantages, as randomization can account for potential differences in variables that are known to impact outcome. However, when small sample sizes are used, random sampling variation can significantly impact inferences about trial outcomes. In the case of trials of DMT aiming to preserve C-peptide, the known factors are baseline C-peptide and age; QR adjusts for these factors.
Using data from almost 500 control/placebo individuals in the 13 trials studied, we show that untreated individuals’ outcomes are reliably predicted. Utilizing QR as an outcome measure implies comparison of a treatment arm to this large number of historical controls, and could allow trialists to consider studies without a contemporaneous control group in early phase trials, for example when use of a placebo is not feasible. Single- or multiple-active arm phase 2 trials are likely to conserve resources by eliminating or minimizing placebo participants while accelerating recruitment (as more participants agree to trials without a placebo arm). Of course, caution is always needed in interpretation of studies without contemporaneous controls. However, we suggest that with these caveats in mind, data garnered from single arm trials could inform decisions about whether a therapeutic approach merits further testing in subsequent gold standard phase 3 placebo-controlled randomized trials37,38.
Furthermore, using a QR outcome allows for adaptive study designs of multiple active agents, as we found that a mean QR value above zero in the treatment arm at six months after randomization completely predicted the success of all the tested RCTs with a positive outcome at 1 year. A trial with multiple active agents could drop ineffective therapeutic arms at six months. Using QR could enable shorter clinical trials, which would reduce burden on study staff and participants, reduce cost, and reduce the time that participants in the active arm are exposed to ineffective therapies. By pre-specifying a QR threshold of interest at an early timepoint, adaptive re-randomization designs, such as the sequential parallel comparison design (SPCD; Supplemental Fig. 4)39, are also feasible. This design identifies placebo participants with a QR value below zero early in a trial, and re-randomizes those individuals to treatment or placebo, allowing a larger number of participants to potentially benefit from therapy. QR can also enable enrollment of individuals outside the traditionally used period of 3 months post-diagnosis in new-onset T1D trials, as it can reliably predict a 2-year outcome when a baseline timepoint is >6 months from diagnosis.
Assessing treatment effect as the difference in QR between treatment and control arms may alter interpretation of prior trial results. In addition, it can aid in prioritizing therapies for further study. Since there were similarities between the 13 trials with respect to baseline C-peptide and age, and many of the trials used baseline-adjusted ANCOVA models, it is not surprising that using QR to express the trial result is similar to that seen in published reports; that is, the teplizumab, abatacept, rituximab, and low dose ATG trials all demonstrate that the QR of the actively treated group is higher than that of the control group. However, while the published primary outcome of the ITN alefacept trial was negative, when considering the outcomes of the active treatment arms for each trial, teplizumab and alefacept both stand out as therapies with the greatest QR values, which suggests both therapies (or similar drugs) are worth pursuing. Furthermore, while the published results of two trials using ATG differ (ITN higher dose trial being negative and TrialNet lower dose trial being positive), the QR point estimates of the active arms in each of these trials are similar, indicating that the differences in clinical trial outcomes reported were perhaps impacted by differences in the placebo arms rather than differences in efficacy between the two doses.
The canakinumab trial exemplifies the risks of comparison to a small control cohort. The originally reported canakinumab negative trial result had a detrimental impact on future studies; despite pre-clinical and mechanistic data suggesting a role of IL-1 in T1D40,41,42,43, there has been reluctance to test this type of therapy further. However, we show here that the QR of individuals in the canakinumab-treated arm was positive, suggesting therapeutic effect. Our interpretation of the data indicates that the negative result in the originally published trial was due to the small number of placebo participants who performed much better than expected. Of note, in a retrospective analysis of the original study, Bundy et al. addressed this issue by comparing canakinumab-treated individuals to a larger placebo group, and also concluded that canakinumab may be effective36. While not negating the results of the original RCT, the integration of large amounts of historical data here provides added context to robustly interpret studies.
Perhaps the most powerful use of QR is its ability to determine the extent to which an individual responded to therapy. Participants are typically informed of clinical trial results with information about their own insulin secretion and the mean values for treatment and control groups. However, explaining what this means can be challenging, as the relationship between a given C-peptide value (or change in C-peptide) and long-term clinical outcomes is not known. QR, in contrast, allows for standardized, subject-specific estimates to be provided to each participant; study staff can describe the probability that the participant did better or worse than expected while on treatment (i.e., a responder or non-responder).
QR is also an improvement over previously used responder/non-responder definitions. Incorporating historical data via QR provides a greater level of certainty when identifying treatment responders. Standardized estimates based on historical placebo data can be used to understand the probability of observing a specific QR value in the absence of a treatment effect. Higher QR values are associated with increased confidence that an individual’s response is related to treatment. In the absence of a QR framework, we would be less certain about these predictions at both the individual and group level. In trials with two or more active treatments with differing mechanisms of action, the QR can be used as a standardized instrument to discriminate biomarkers hypothesized to be causally related to treatment with the objective of personalizing immune therapies to specific endotypes44.
QR is particularly useful for a more principled analysis of mechanistic data seeking to explain whether a mechanistic marker lies in the hypothesized causal pathway for the therapy. This concept is exemplified by analysis of exhausted T cells in individuals treated with teplizumab. The increase in these cells post-therapy is associated with C-peptide, and there is also a relationship between T cell exhaustion and age, consistent with previous reports45,46,47. This suggests that understanding the causal pathway between teplizumab therapy and the induction of exhausted T cells must consider age as a factor48, while also helping the field to consider the general phenomenon of why children may be more likely to respond to therapy.
Using data from 13 clinical trials and more than 1,300 participants, we demonstrate the significant value of using QR to advance the field of DMT in T1D. Our study serves as an example for applying the QR approach in other diseases that lack clarity in defining responders to therapy, for comparing the effectiveness of different therapies, and for understanding causal pathways in disease. Our analysis shows that the QR metric of insulin secretion measured by C-peptide is clinically and scientifically meaningful, objective, predictable, and standardized across individuals and cohorts, thus accelerating and aiding in interpretation of trials and providing a framework for precision medicine in T1D.
Methods
Datasets
De-identified data were obtained from 13 clinical trials of DMT in individuals with recently diagnosed T1D (Table 1). These include six studies conducted by Diabetes TrialNet (TrialNet.org), an NIH-sponsored clinical trial network: MMF/DZB (TN02, NCT00100178; ref. 12), Rituximab (TN05, NCT00279305; ref. 11), GAD-Alum (TN08, NCT00529399; ref. 7), Abatacept (TN09, NCT00505375; ref. 6), Canakinumab (TN14, NCT00947427; ref. 8), and Low Dose ATG/GCSF (TN19, NCT02215200; ref. 9); four studies conducted by the NIH-funded Immune Tolerance Network (ITN; Immunetolerance.org): Teplizumab (AbATE, NCT02067923; ref. 13), Alefacept (T1DAL, NCT00965458; ref. 14), ATG (START, NCT00515099; ref. 18), and Tocilizumab (EXTEND, NCT02293837; ref. 15); one investigator-initiated study sponsored by JDRF, Imatinib/Gleevec (NCT01781975; ref. 10); and two industry-led studies by Diamyd Therapeutics AB: Phase 2 (NCT00435981; ref. 17) and Phase 3 (NCT00723411; ref. 16) GAD-Alum.
Statistical methods
Each individual's QR was computed using the ANCOVA model developed by Bundy and Krischer27 from data on recently diagnosed T1D individuals:

$$\mathrm{QR}_i = \ln(\mathrm{Cp}_{1\mathrm{year},i} + 1) - 0.812 \cdot \ln(\mathrm{Cp}_{0,i} + 1) - 0.00638 \cdot \mathrm{Age}_i + 0.191$$

where $\mathrm{Cp}_{0,i}$ and $\mathrm{Cp}_{1\mathrm{year},i}$ represent the 2-hour C-peptide AUC mean (AUC divided by 120 min, in nanomoles per liter) at baseline and one year post treatment, respectively, and $\mathrm{Age}_i$ is the age at study entry, in years. Since the model assumes a linear relationship between baseline and 1-year C-peptide, we additionally computed QR at 3, 6, and 9 months post-randomization by deriving the expected C-peptide values at these timepoints from the original QR equation and determining the difference between the expected and observed values at each timepoint. Other variables were considered for inclusion in the original model and were found not to improve model fit27,36.
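The 1-year QR equation above can be sketched in Python as follows. This is a minimal illustration rather than the released SAS implementation: the function and constant names are hypothetical, and only the coefficients are taken from the published model as quoted in the text. The interim (3-, 6-, and 9-month) calculations are not reproduced here because the text does not spell out how the expected values were derived.

```python
import math

# Coefficients of the published QR model as quoted in the text.
# The names below are illustrative, not taken from the authors' code.
BASELINE_COEF = 0.812   # multiplies ln(baseline C-peptide AUC mean + 1)
AGE_COEF = 0.00638      # multiplies age at study entry, in years
INTERCEPT = -0.191      # intercept of the predicted ln(C-peptide + 1)


def predicted_log_cpeptide(cp_baseline, age_years):
    """Model-predicted ln(C-peptide AUC mean + 1) at 1 year."""
    return (BASELINE_COEF * math.log(cp_baseline + 1)
            + AGE_COEF * age_years
            + INTERCEPT)


def predicted_cpeptide(cp_baseline, age_years):
    """Model-predicted 1-year C-peptide AUC mean, in nmol/L."""
    return math.exp(predicted_log_cpeptide(cp_baseline, age_years)) - 1


def quantitative_response(cp_1year, cp_baseline, age_years):
    """QR: observed minus predicted ln(C-peptide + 1) at 1 year."""
    observed = math.log(cp_1year + 1)
    return observed - predicted_log_cpeptide(cp_baseline, age_years)
```

By construction, a participant whose observed 1-year C-peptide equals the model prediction has a QR of 0; positive values indicate a better-than-expected outcome and negative values a worse-than-expected one.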
To validate the QR method, we tested model performance by applying the published ANCOVA model to data from eight new studies not used for the development of QR (Fig. 1). The Kolmogorov-Smirnov two-sample test was used to compare QR distributions between the development (n = 5 studies) and validation (n = 8 studies) cohorts. In addition, an ANCOVA model was fit using all control participants, and actual 1-year C-peptide AUC mean values were compared with the values predicted by both this newly derived formula and the published QR formula.
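The two-sample Kolmogorov-Smirnov comparison described above rests on a simple statistic: the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic, using a hypothetical function name and standard-library calls; it does not compute a p-value and is not the authors' SAS procedure. In practice, a statistics package (for example, `scipy.stats.ks_2samp` in Python) would supply both the statistic and the p-value.

```python
import bisect


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute difference between two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions that change only at
    # observed values, so it suffices to evaluate them there.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

Applied to QR values from the development and validation cohorts, a statistic near 0 would indicate closely matching distributions, and a statistic near 1 would indicate almost no overlap.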
Participants were classified as either active treatment or placebo/control. Two-sample, two-sided t-tests were used to compare means between groups. For responder analyses, participants were dichotomized in two ways: using historical thresholds from the literature that define responders and non-responders, and using a QR responder definition, in which responders are individuals with positive QR and non-responders are individuals with negative QR. For each responder definition, generalized linear models with a binomial distribution and a logit link were fit among placebo/control participants, with adjustment for baseline age and baseline C-peptide. For biomarker analyses, Pearson correlations were computed to examine the association of KLRG1+TIGIT+CD8+ T cells with baseline metrics (age and C-peptide) and with outcome metrics (QR and C-peptide).
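As a rough illustration of the group comparison and the QR responder definition, the following Python sketch dichotomizes participants by QR and computes a two-sample t statistic. Several details are assumptions: the function names are hypothetical, a QR of exactly 0 is counted as a responder (following the QR ≥ 0 rule given in the code description), and the pooled, equal-variance form of the t statistic is used because the text does not state which variant was applied. The p-value lookup is omitted.

```python
import math
from statistics import mean, variance


def is_responder(qr):
    """QR responder definition; QR of exactly 0 counts as responder (assumed)."""
    return qr >= 0


def pooled_t_statistic(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


def responder_rate(qr_values):
    """Fraction of participants classified as QR responders."""
    return sum(is_responder(qr) for qr in qr_values) / len(qr_values)
```

Here `pooled_t_statistic` would be called with the QR values of the active treatment group and of the placebo/control group, and a two-sided p-value would then be read from the t distribution with the returned degrees of freedom.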
Using control group data, additional ANCOVA models were developed to expand the utility of the QR method to different time intervals. Specifically, post-baseline predictions ranging from 3 months to 2 years were created using different baseline reference points and prediction horizons. Analysis was performed using SAS software version 9.4 (SAS Institute Inc., Cary, NC, USA) and JMP Pro 16 (SAS Institute Inc., Cary, NC, USA).
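To make the idea of refitting the model for other time intervals concrete, the sketch below fits a regression of the same form as the QR model, ln(follow-up C-peptide + 1) on ln(baseline C-peptide + 1) and age, by ordinary least squares. It is an assumption-laden illustration: the function names are hypothetical, the model form is assumed to mirror the published QR equation, and the solver is a plain normal-equations approach written with the standard library, not the SAS procedure used for the analysis.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_qr_style_model(cp_baseline, cp_followup, ages):
    """Return (intercept, baseline coefficient, age coefficient)."""
    rows = [[1.0, math.log(c0 + 1), age]
            for c0, age in zip(cp_baseline, ages)]
    y = [math.log(c + 1) for c in cp_followup]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    intercept, baseline_coef, age_coef = solve_linear_system(xtx, xty)
    return intercept, baseline_coef, age_coef
```

Fitting this to control-group data with a different reference visit as "baseline" and a different follow-up visit as the outcome would give one set of coefficients per interval, from which an interval-specific QR could be computed as observed minus predicted.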
SAS code utilization
SAS and JSL code are provided at https://github.com/BenaroyaResearch/qr_t1d_metric/. The SAS code computes QR and model-predicted C-peptide values at various timepoints using the QR model27, fits ANCOVA models following the same methodology as the published QR model, and runs t-tests to determine treatment effect using QR. The JSL code fits generalized linear models to examine the association between QR and treatment group, and the association of baseline metrics (C-peptide and age) with responder status, where QR ≥ 0 defines a responder and QR < 0 a non-responder. A test dataset (SAS dataset) is also provided, with one record per subject, including C-peptide AUC mean from the 2-h MMTT at all available timepoints (at minimum, baseline and 1 year are required), age at screening in years, treatment group, and study. An example output dataset of computed QR and predicted C-peptide values is also provided (SAS and JMP datasets). The code has been validated and run by multiple analysts and applied to other datasets; it has not been submitted for community comment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All clinical trial source data supporting the findings described in this manuscript are available under controlled access due to data privacy laws. TrialNet clinical trials data are publicly available and can be obtained by application to the NIDDK Central Repository at https://repository.niddk.nih.gov/home/. Immune Tolerance Network clinical trials data are also publicly available at https://www.itntrialshare.org/. Data from the Imatinib study are available from Dr. Stephen Gitelman (stephen.gitelman@ucsf.edu) per data sharing statements from the original publication10. Data from the Diamyd Medical GAD-alum phase 2 and phase 3 trials are available upon reasonable request via a data transfer agreement. Requests should be addressed to Anton Lindqvist at anton.lindqvist@diamyd.com.
Code availability
SAS code supporting all analyses is available on GitHub at https://github.com/BenaroyaResearch/qr_t1d_metric/. SAS code is also available from the corresponding author on reasonable request.
References
Beach, M. L. & Meier, P. Choosing covariates in the analysis of clinical trials. Control Clin. Trials 10, 161S–175S (1989).
Food and Drug Administration. Vol. Revision 1 (ed Center for Drug Evaluation and Research; Center for Biologics Evaluation and Research) (2021).
Kahan, B. C., Jairath, V., Dore, C. J. & Morris, T. P. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials 15, 139 (2014).
von Herrath, M. et al. Anti-interleukin-21 antibody and liraglutide for the preservation of beta-cell function in adults with recent-onset type 1 diabetes: a randomised, double-blind, placebo-controlled, phase 2 trial. Lancet Diabetes Endocrinol. 9, 212–224 (2021).
Quattrin, T. H. et al. (T1GER Study Investigators). Golimumab and beta-cell function in youth with new-onset type 1 diabetes. N. Engl. J. Med. 383, 2007–2017 (2020).
Orban, T. et al. Co-stimulation modulation with abatacept in patients with recent-onset type 1 diabetes: a randomised, double-blind, placebo-controlled trial. Lancet 378, 412–419 (2011).
Wherrett, D. K. et al. Antigen-based therapy with glutamic acid decarboxylase (GAD) vaccine in patients with recent-onset type 1 diabetes: a randomised double-blind trial. Lancet 378, 319–327 (2011).
Moran, A. et al. Interleukin-1 antagonism in type 1 diabetes of recent onset: two multicentre, randomised, double-blind, placebo-controlled trials. Lancet 381, 1905–1915 (2013).
Haller, M. J. et al. Low-dose anti-thymocyte globulin (ATG) preserves beta-cell function and improves HbA1c in new-onset type 1 diabetes. Diabetes Care 41, 1917–1925 (2018).
Gitelman, S. E. et al. Imatinib therapy for patients with recent-onset type 1 diabetes: a multicentre, randomised, double-blind, placebo-controlled, phase 2 trial. Lancet Diabetes Endocrinol. 9, 502–514 (2021).
Pescovitz, M. D. et al. Rituximab, B-lymphocyte depletion, and preservation of beta-cell function. N. Engl. J. Med. 361, 2143–2152 (2009).
Gottlieb, P. A. et al. Failure to preserve beta-cell function with mycophenolate mofetil and daclizumab combined therapy in patients with new-onset type 1 diabetes. Diabetes Care 33, 826–832 (2010).
Herold, K. C. et al. Teplizumab (anti-CD3 mAb) treatment preserves C-peptide responses in patients with new-onset type 1 diabetes in a randomized controlled trial: metabolic and immunologic features at baseline identify a subgroup of responders. Diabetes 62, 3766–3774 (2013).
Rigby, M. R. et al. Targeting of memory T cells with alefacept in new-onset type 1 diabetes (T1DAL study): 12 month results of a randomised, double-blind, placebo-controlled phase 2 trial. Lancet Diabetes Endocrinol. 1, 284–294 (2013).
Greenbaum, C. J. et al. IL-6 receptor blockade does not slow beta cell loss in new-onset type 1 diabetes. JCI Insight 6, https://doi.org/10.1172/jci.insight.150074 (2021).
Ludvigsson, J. et al. GAD65 antigen therapy in recently diagnosed type 1 diabetes mellitus. N. Engl. J. Med. 366, 433–442 (2012).
Ludvigsson, J. et al. GAD treatment and insulin secretion in recent-onset type 1 diabetes. N. Engl. J. Med. 359, 1909–1920 (2008).
Gitelman, S. E. et al. Antithymocyte globulin treatment for patients with recent-onset type 1 diabetes: 12-month results of a randomised, placebo-controlled, phase 2 trial. Lancet Diabetes Endocrinol. 1, 306–316 (2013).
Herold, K. C. et al. An anti-CD3 antibody, teplizumab, in relatives at risk for type 1 diabetes. N. Engl. J. Med. 381, 603–613 (2019).
Palmer, J. P. et al. C-peptide is the appropriate outcome measure for type 1 diabetes clinical trials to preserve beta-cell function: report of an ADA workshop, 21-22 October 2001. Diabetes 53, 250–264 (2004).
Baidal, D. A. et al. Predictive Value of C-Peptide Measures for Clinical Outcomes of beta-Cell Replacement Therapy in Type 1 Diabetes: Report From the Collaborative Islet Transplant Registry (CITR). Diabetes Care 46, 697–703 (2023).
Rickels, M. R. et al. High residual C-peptide likely contributes to glycemic control in type 1 diabetes. J. Clin. Invest. 130, 1850–1862 (2020).
Jeyam, A. et al. Clinical impact of residual C-peptide secretion in type 1 diabetes on glycemia and microvascular complications. Diabetes Care 44, 390–398 (2021).
Gubitosi-Klug, R. A. et al. Residual beta cell function in long-term type 1 diabetes associates with reduced incidence of hypoglycemia. J. Clin. Invest. 131, https://doi.org/10.1172/JCI143011 (2021).
Greenbaum, C. J. et al. Mixed-meal tolerance test versus glucagon stimulation test for the assessment of beta-cell function in therapeutic trials in type 1 diabetes. Diabetes Care 31, 1966–1971 (2008).
Design of Clinical Trials in New-Onset Type 1 Diabetes: Regulatory Considerations for Drug Development. https://c-path.org/design-of-clinical-trials-in-new-onset-type-1-diabetes-regulatory-considerations-for-drug-development/ (2021).
Bundy, B. N., Krischer, J. P. & Type 1 Diabetes TrialNet Study Group. A quantitative measure of treatment response in recent-onset type 1 diabetes. Endocrinol. Diabetes Metab. 3, e00143 (2020).
Gitelman, S. E. et al. Antithymocyte globulin therapy for patients with recent-onset type 1 diabetes: 2 year results of a randomised trial. Diabetologia 59, 1153–1161 (2016).
Long, S. A. et al. Partial exhaustion of CD8 T cells and clinical response to teplizumab in new-onset type 1 diabetes. Sci. Immunol. 1, https://doi.org/10.1126/sciimmunol.aai7793 (2016).
Mogilenko, D. A., Shchukina, I. & Artyomov, M. N. Immune ageing at single-cell resolution. Nat. Rev. Immunol. 22, 484–498 (2022).
Davenport, M. P., Smith, N. L. & Rudd, B. D. Building a T cell compartment: how immune cell development shapes function. Nat. Rev. Immunol. 20, 499–506 (2020).
Shaw, A. C., Goldstein, D. R. & Montgomery, R. R. Age-dependent dysregulation of innate immunity. Nat. Rev. Immunol. 13, 875–887 (2013).
Roe, K. NK-cell exhaustion, B-cell exhaustion and T-cell exhaustion-the differences and similarities. Immunology 166, 155–168 (2022).
Herold, K. C. et al. A single course of anti-CD3 monoclonal antibody hOKT3gamma1(Ala-Ala) results in improvement in C-peptide responses and clinical parameters for at least 2 years after onset of type 1 diabetes. Diabetes 54, 1763–1769 (2005).
Hao, W. et al. Fall in C-peptide during first 4 years from diagnosis of type 1 diabetes: variable relation to age, HbA1c, and insulin dose. Diabetes Care 39, 1664–1670 (2016).
Bundy, B. N., Krischer, J. P. & Type 1 Diabetes TrialNet Study Group. A model-based approach to sample size estimation in recent onset type 1 diabetes. Diabetes Metab. Res Rev. 32, 827–834 (2016).
Viele, K. et al. Use of historical control data for assessing treatment effects in clinical trials. Pharm. Stat. 13, 41–54 (2014).
Freidlin, B. & Korn, E. L. Augmenting randomized clinical trial data with historical control data: precision medicine applications. J. Natl Cancer Inst. https://doi.org/10.1093/jnci/djac185 (2022).
Fava, M., Evins, A. E., Dorer, D. J. & Schoenfeld, D. A. The problem of the placebo response in clinical trials for psychiatric disorders: culprits, possible remedies, and a novel study design approach. Psychother. Psychosom. 72, 115–127 (2003).
Wang, X. et al. Identification of a molecular signature in human type 1 diabetes mellitus using serum and functional genomics. J. Immunol. 180, 1929–1937 (2008).
Cabrera, S. M. et al. Interleukin-1 antagonism moderates the inflammatory state associated with Type 1 diabetes during clinical trials conducted at disease onset. Eur. J. Immunol. 46, 1030–1046 (2016).
Padgett, L. E., Broniowska, K. A., Hansen, P. A., Corbett, J. A. & Tse, H. M. The role of reactive oxygen species and proinflammatory cytokines in type 1 diabetes pathogenesis. Ann. N. Y. Acad. Sci. 1281, 16–35 (2013).
Mandrup-Poulsen, T., Pickersgill, L. & Donath, M. Y. Blockade of interleukin 1 in type 1 diabetes mellitus. Nat. Rev. Endocrinol. 6, 158–166 (2010).
Battaglia, M. et al. Introducing the endotype concept to address the challenge of disease heterogeneity in type 1 diabetes. Diabetes Care 43, 5–12 (2020).
Goronzy, J. J. & Weyand, C. M. Successful and maladaptive T cell aging. Immunity 46, 364–378 (2017).
Song, Y. et al. T-cell immunoglobulin and ITIM domain contributes to CD8(+) T-cell immunosenescence. Aging Cell 17, https://doi.org/10.1111/acel.12716 (2018).
Soto-Heredero, G., Gomez de Las Heras, M. M., Escrig-Larena, J. I. & Mittelbrunn, M. Extremely differentiated T cell subsets contribute to tissue deterioration during aging. Annu. Rev. Immunol. 41, 181–205 (2023).
Pearl, J. & Mackenzie, D. The Book of Why: The New Science of Cause and Effect. First edn., (Basic Books, 2018).
Acknowledgements
The authors gratefully acknowledge access to data from the Diamyd trials of GAD-alum, provided by Diamyd Medical, and from the Imatinib trial, provided by Principal Investigator Stephen Gitelman, UCSF. Publicly available data were obtained from trials conducted by the Immune Tolerance Network (an international clinical research consortium headquartered at the Benaroya Research Institute and supported by the National Institute of Allergy and Infectious Diseases and JDRF) and from The Type 1 Diabetes TrialNet Study Group. We also thank Anne Hocking and Taylor Lawson (BRI) for thoughtful comments on this manuscript. This analysis was funded in part by JDRF grant 3-SRA-2019-791-S-B (to C.S.) and NIDDK grant 5R03DK127475-02 (to C.S.). TrialNet is a clinical trials network funded by the National Institutes of Health (NIH) through the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Allergy and Infectious Diseases, and The Eunice Kennedy Shriver National Institute of Child Health and Human Development, through the cooperative agreements U01 DK061010, U01 DK061034, U01 DK061042, U01 DK061058, U01 DK085461, U01 DK085465, U01 DK085466, U01 DK085476, U01 DK085499, U01 DK085509, U01 DK103180, U01 DK103153, U01 DK103266, U01 DK103282, U01 DK106984, U01 DK106994, U01 DK107013, U01 DK107014, UC4 DK106993, UC4 DK117009. The funders had no role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
Conceptualization: H.T.B., C.J.G., and C.S. Data curation: A.Y. and H.T.B. Formal analysis: A.Y., H.T.B., and C.O. Visualization: A.Y. and H.T.B. Funding acquisition: C.J.G. and C.S. Writing—original draft: C.J.G., A.Y., H.T.B., and C.S. Writing—review & editing: C.J.G., A.Y., H.T.B., C.O., S.L., and C.S.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ylescupidez, A., Bahnson, H.T., O’Rourke, C. et al. A standardized metric to enhance clinical trial design and outcome interpretation in type 1 diabetes. Nat Commun 14, 7214 (2023). https://doi.org/10.1038/s41467-023-42581-z
DOI: https://doi.org/10.1038/s41467-023-42581-z