Enhanced Risk Stratification for Children and Young Adults with B-Cell Acute Lymphoblastic Leukemia: A Children’s Oncology Group Report

Current strategies to treat pediatric acute lymphoblastic leukemia rely on risk stratification algorithms using categorical data. We investigated whether using continuous variables assigned different weights would improve risk stratification. We developed and validated a multivariable Cox model for relapse-free survival (RFS) using information from 21199 patients. We constructed risk groups by identifying cutoffs of the COG Prognostic Index (PICOG) that maximized discrimination of the predictive model. Patients with higher PICOG have higher predicted relapse risk. The PICOG reliably discriminates patients with low vs. high relapse risk. For those with moderate relapse risk using current COG risk classification, the PICOG identifies subgroups with varying 5-year RFS. Among current COG standard-risk average patients, PICOG identifies low and intermediate risk groups with 96% and 90% RFS, respectively. Similarly, amongst current COG high-risk patients, PICOG identifies four groups ranging from 96% to 66% RFS, providing additional discrimination for future treatment stratification. When coupled with traditional algorithms, the novel PICOG can more accurately risk stratify patients, identifying groups with better outcomes who may benefit from less intensive therapy, and those who have high relapse risk needing innovative approaches for cure.


INTRODUCTION
Outcomes among children with acute lymphoblastic leukemia (ALL) have steadily improved, and event-free and overall survival (OS) now exceed 85% and 90%, respectively [1]. Therapy for ALL is determined using established risk factors, balancing treatment intensity with prognosis to minimize overtreating patients with favorable risk and undertreating patients with higher risk.
Assigning differing weights to individual risk factors or using continuous numerical rather than categorical values may more accurately predict relapse risk. The UKALL group used minimal residual disease (MRD) as a continuous variable to develop a prognostic model that generated a continuous score (prognostic index UKALL, PI UKALL) predicting patient-level relapse risk [4,5]. This model incorporated favorable and unfavorable genetics, and both presenting white blood cell count (WBC) and day 29 (D29) MRD as continuous variables. An increase in the PI UKALL score was strongly associated with relapse risk in a validation cohort of three European pediatric ALL trials (combined n = 2313) [5].
We conducted an external validation of the PI UKALL in >20000 COG trial participants and subsequently assessed the value of day 8 (D8) MRD added to this model, given our prior work showing the prognostic value of D8 MRD in certain patient subsets [6]. In contrast to the UK group, COG conducts separate B- and T-ALL trials [7]. Thus, we focused on B-ALL, developed a novel risk score, PI COG, and compared patient outcomes between current and PI COG-derived risk groups (RGs).

METHODS
Study population
The cohort included 13,875 NCI standard-risk (SR) and 7324 NCI high-risk (HR) non-infant B-ALL patients enrolled on four COG trials from 2004 to 2019, two for SR and two for HR patients: AALL0331 (SR; n = 5099) [8], AALL0232 (HR; n = 2900) [9], AALL0932 (SR; n = 8776) [10], and AALL1131 (HR; n = 4424) [3,11]. Patients and/or their caregiver(s) provided informed consent for these trials in accordance with the NIH central IRB and the Declaration of Helsinki. Randomizations differed for each trial. In all trials except AALL0232, primary analyses indicated no statistical differences in disease-free survival (DFS) rates between experimental treatment and standard of care arms [3,8-11]. Down syndrome and Philadelphia chromosome-positive (Ph+) patients were excluded. Patients with T-ALL will be considered separately in future work. The CONSORT diagram shows the breakdown of study participants in each group and the final analysis population (Fig. 1).

Variable selection methods
Choice of predictor variables is a crucial step when building a clinical prediction model. Investigators typically must reduce a larger set of candidates to a final set of predictor variables used for final model estimation. The methods of predictor selection can be classified into two categories: (1) reduction before modeling and (2) reduction while modeling [12]. Method (1) implies that the predictors are selected based on domain expertise prior to studying the relationship between the outcome and candidate predictors in the data to be used for model building. This method of predictor selection is generally preferred, as it best preserves the statistical properties of later model estimation and hypothesis testing [13]. Method (2) implies that knowledge of the relationship between the outcome and candidate variables in the data is used to select predictors. Examples of method (2) include univariable screening and stepwise selection (forward, backward, and combined). Though occasionally justifiable, the disadvantages of stepwise selection are well documented and include unstable selection; misleading bias in regression coefficients, standard errors, and p-values; and poorer predictions relative to a full model [12,13]. Univariable screening inherits the same disadvantages as forward stepwise selection but tends to have poorer performance due to neglect of marginally "insignificant" variables [12]. Therefore, in this work, predictor variables (described in detail below) were selected for inclusion in the model a priori based on clinical expertise.
Transformations for WBC (log(WBC); WBC log), D8 MRD, and D29 MRD were consistent with Enshaei et al. [5] due to reasonable performance in the PI UKALL model and clinical knowledge regarding their distributions. Transformed MRD is displayed as τ(MRD), corresponding roughly to the negative log transformation [5]. The maximum τ(MRD) was 13.82, corresponding to MRD < 1.0 × 10^−5. Candidate predictor variables are shown in Supplementary Table 1.
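To give a sense of the scale of τ(MRD), the following minimal sketch applies a plain negative natural log with a floor. Both the exact form of τ and the floor value are assumptions made here for illustration only: MRD is taken to be a proportion, and the floor of 1e-6 is a hypothetical choice that happens to reproduce the reported maximum of 13.82. The published transformation is given in Enshaei et al. [5].

```python
import math

FLOOR = 1e-6  # hypothetical value assigned to MRD below the detection limit


def tau(mrd: float, floor: float = FLOOR) -> float:
    """Approximate transformed MRD: negative natural log with a floor (assumed form)."""
    return -math.log(max(mrd, floor))


# Lower MRD gives a higher transformed value
print(round(tau(1e-2), 2), round(tau(1e-4), 2), round(tau(0.0), 2))
```

Under these assumptions, larger τ(MRD) corresponds to lower residual disease, which matches the direction of the effects reported later for transformed D8 and D29 MRD.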

External validation of PI UKALL
Steps for external validation followed published guidelines [14]. These steps and the level of information required for each step's execution are defined in Supplementary Table 2 and are referred to as Step (1) through Step (6). Note that Step (5) and Step (6) are not included due to unavailable information. If in Step (1) the overall calibration slope is found to be less than one, the model is technically considered not to be optimal for the external validation data, though further steps should still be examined as the model may still have practical utility.
We applied the published PI UKALL equation to the COG data [5]:

According to Step (1), the overall calibration slope for the PI UKALL was calculated. The calibration slope is the estimated log-hazard ratio from a univariable Cox model with the PI UKALL as the predictor. A calibration slope less than 1 in external validation data is indicative of poorer discrimination in the validation data than in the development data, a common occurrence among predictive models reflecting decreased generalizability of the original model and heterogeneity of patient prognosis in derivation vs. validation populations [14]. A formal test for the null hypothesis that the overall calibration slope equaled one was conducted.
For Step (2), the primary metric used to compare model discrimination was the concordance index (C-index), defined as the proportion of randomly selected pairs of patients that the model orders concordantly (for a pair to be concordant, the patient with the higher model-predicted probability of relapse has the shorter observed time to relapse) [15]. A C-index >0.7 indicates acceptable discriminative capability of a model, while a value of 0.5 indicates that prediction is equivalent to random chance [13]. In Step (3), ideally, the original published coefficients would be equal to those obtained if the model were refit in the external validation data. We examined the coefficients for the PI UKALL model re-derived in the full COG analysis population compared to the published coefficients to examine possible true difference in predictor effects between UKALL and COG data. We conducted a hypothesis test for equality of published vs. externally derived model coefficients as detailed in Supplementary Table 2. Kaplan-Meier curves within PI UKALL-defined RGs were reported to satisfy Step (4).
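The calibration slope described above can be sketched as the coefficient of a Cox model whose only predictor is the prognostic index. The code below is an illustrative pure-Python implementation (Newton-Raphson on the Breslow partial likelihood), not the analysis code used in this study; the function name and the toy data are hypothetical.

```python
import math


def calibration_slope(pi, time, event, iters=25):
    """Univariable Cox log-hazard ratio for predictor `pi` (Breslow ties)."""
    beta = 0.0
    n = len(pi)
    for _ in range(iters):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not event[i]:
                continue  # censored patients contribute only through risk sets
            # Risk set: everyone still under observation at time[i]
            risk = [j for j in range(n) if time[j] >= time[i]]
            w = [math.exp(beta * pi[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * pi[j] for wj, j in zip(w, risk))
            s2 = sum(wj * pi[j] ** 2 for wj, j in zip(w, risk))
            grad += pi[i] - s1 / s0            # score contribution
            info += s2 / s0 - (s1 / s0) ** 2   # information contribution
        if info == 0:
            break
        step = grad / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta


# Hypothetical toy data: index values, follow-up times, event indicators
toy_pi = [1, 0, 1, 1, 0, 0]
toy_time = [1, 2, 3, 4, 5, 6]
toy_event = [1, 1, 1, 1, 1, 1]
slope = calibration_slope(toy_pi, toy_time, toy_event)
```

A slope near 1 would mean that the index separates risk in the validation data as strongly as in the development data. Rescaling the index by a constant rescales the slope inversely, which is why the slope is informative only for an index applied with its published coefficients.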
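The pairwise definition of the C-index given above can be sketched directly. This is a generic version of Harrell's C written for illustration, with a hypothetical function name; it is not the implementation used in this study, and it handles censoring in the simplest way, by counting only pairs in which the earlier time is an observed event.

```python
def c_index(risk, time, event):
    """Harrell's C: share of usable pairs that the risk score orders concordantly."""
    concordant, usable = 0.0, 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has an observed event before time[j]
            if event[i] and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher predicted risk relapsed earlier
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / usable
```

With this definition, a score that ranks every usable pair correctly returns 1.0, a score that ranks every pair in reverse returns 0.0, and uninformative scores fall near 0.5.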

Added value of D8 MRD
D8 MRD is of particular interest to COG, as its collection is unique and standard within the collective.Therefore, to assess incremental added predictive value of D8 MRD to the UKALL model, we fit a multivariable Cox proportional hazards model including τ(D29 MRD), FRG, URG, WBC log and compared this to the model with τ(D8 MRD) included.

Development of PI COG
The development of a new prognostic index for relapse risk, the PI COG, utilized pre-specified covariates based on domain expertise and existing literature from UKALL and COG data [6]. The AALL0932/AALL1131 cohort comprised training data, while AALL0331/AALL0232 patients were used as testing data for temporal (external) validation (Fig. 1). For model development on the training data, τ(D29 MRD), FRG, URG, WBC log, τ(D8 MRD), age at diagnosis (Age), and CNS status were included from an initial class of potential covariates (Supplementary Table 1) due to existing evidence of prognostic relevance and current risk stratification algorithms.
Graphical methods assessed the assumptions of the functional relationships between relapse risk and covariates [13]. The proportional hazards assumption was examined using scaled Schoenfeld residual plots by covariate. Plots of the delta-beta residuals helped to visually identify participants with strong influence on hazard ratio estimation. We prespecified a comprehensive set of potential interactions among the continuous variables (Supplementary Table 1) and assessed them for possible model inclusion as a group. PI COG was defined as the linear predictor from the model. Calibration slopes and C-indices were obtained for PI COG overall and within sex and race/ethnicity groups to diagnose potential lack of model fit.
Validation and calibration were assessed using the rms package in R [16]. The final model was internally validated using bootstrapping with B = 1000 resamples with optimism-corrected estimates calculated [15]. Calibration was examined using smoothed calibration plots [13]. Cox model performance was compared to machine learning (ML) alternatives to assess whether relaxed assumptions improved predictive ability. Random forest [17], support vector machine [18], and boosted Cox models were fit to the same predictor variables included in the Cox model (Supplementary Table 3) [19]. The benchmarking study included a 5×5-fold nested cross-validation routine adapted from Fouodo et al. [18].
To compare possible risk stratification approaches, patients were classified according to the current risk classification algorithms used in COG AALL1731 (SR; NCT03914625) and AALL1732 (HR; NCT03959085) trials (Supplementary Tables 4 and 5). Using the training dataset, cutpoints were calculated dividing the continuous PI COG into four risk-based categories optimizing the model's discriminative ability [20]. The censored nature of the data was accounted for by maximizing the Concordance Probability Estimate (CPE), a variation of the C-index [20]. Further details of how cutpoints were calculated are included in the Supplementary Methods. Point estimates of 5-year relapse-free survival (RFS) within risk subgroups were obtained using Kaplan-Meier estimation. RFS was defined as time from end of induction (EOI) to relapse or death in remission, or censored at second malignant neoplasm (SMN) or date of last contact for those who remained event-free. Estimates for DFS and OS were also obtained. DFS was defined as time from EOI to relapse, death in remission, or SMN, or censored at last contact. OS was defined as the time from EOI to death or censored at last contact. All analyses were conducted using R Statistical Software® version 4.2.1 (code available from corresponding author upon request) [21].
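The Kaplan-Meier point estimates described above can be illustrated with a short sketch. Under the RFS definition, the event indicator would be 1 for relapse or death in remission and 0 for patients censored at SMN or last contact, with time measured from EOI. The function name and the example values are hypothetical and are not study data.

```python
def kaplan_meier(time, event, horizon):
    """Kaplan-Meier survival probability at `horizon` (same units as `time`)."""
    surv = 1.0
    # Distinct observed event times up to the horizon, in increasing order
    event_times = sorted({t for t, e in zip(time, event) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti in time if ti >= t)
        failures = sum(1 for ti, ei in zip(time, event) if ei and ti == t)
        surv *= 1 - failures / at_risk
    return surv


# Hypothetical years from EOI; 1 = relapse or death in remission, 0 = censored
years = [1, 2, 3, 4, 6]
status = [1, 0, 1, 0, 0]
rfs_5y = kaplan_meier(years, status, horizon=5)
```

Switching between RFS, DFS, and OS changes only how the event indicator and censoring are coded; the estimator itself is unchanged.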

RESULTS
Study population
Overall, the distributions of clinical characteristics were similar between the training and testing data, both in the full analysis population (Table 1) and in the post-induction relapse-free survival cohort used for model development and numeric validation (Supplementary Table 6). Among genetic groups, 9629 participants (45.4%) were FRG (52.1% ETV6::RUNX1 fusions and 48.2% DT) and 1256 participants (5.9%) were URG (29.0% KMT2A-rearranged, 28.0% hypodiploid, and 43.4% iAMP21). Ph-like ALL (Supplementary Methods) was present in 996 of 4836 patients tested. D8 MRD and D29 MRD data were available for 76.4% and 84.4% of patients, respectively. Figure 2 shows the distribution of continuous prognostic factors for the combined population.

External validation of PI UKALL
The calibration slope for the original PI UKALL applied to COG data using the published coefficients was 0.79, which was significantly different from one (p < 0.001). The original PI UKALL retained discrimination ability, with a C-index of 0.725. When the coefficients for the PI UKALL model were recalculated using external COG data, FRG and WBC had larger hazard ratio estimates and D29 MRD and URG had smaller estimates, indicating possible differing predictor variable effect weighting between the two populations [derived PI UKALL = −0.136 × τ(D29 MRD) − 0.913 × FRG + 0.692 × URG + 0.166 × WBC log]. The risk directions for all factors were consistent between the original and derived PI UKALL. For example, FRG is associated with lower relapse risk in both cohorts. These coefficients (log-hazard ratios) associated with the derived PI UKALL yield the following hazard ratios: 0.87 for τ(D29 MRD), 0.40 for FRG, 1.99 for URG, and 1.18 for WBC log. The test for equality of published vs. externally derived model coefficients showed evidence of difference in the coefficients (p < 0.001), indicating that model fit could be improved. Kaplan-Meier curves within PI UKALL-defined RGs are shown in Supplementary Fig. 1 and exhibit good separation between curves (log-rank p < 0.001).
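The relationship between the re-derived coefficients and the reported hazard ratios can be checked by exponentiating each coefficient. The sketch below uses the rounded coefficients quoted in the text. The variable names and the example patient values are hypothetical, and the URG hazard ratio computed from the rounded coefficient comes out near 2.00 rather than the reported 1.99, presumably because the reported value was derived from unrounded estimates.

```python
import math

# Re-derived PI UKALL coefficients (log-hazard ratios) as quoted in the text
COEF = {"tau_d29_mrd": -0.136, "frg": -0.913, "urg": 0.692, "wbc_log": 0.166}

# Hazard ratio per one-unit increase in each predictor
hazard_ratios = {name: math.exp(b) for name, b in COEF.items()}


def derived_pi_ukall(tau_d29_mrd, frg, urg, wbc_log):
    """Linear predictor of the re-derived model for one patient."""
    x = {"tau_d29_mrd": tau_d29_mrd, "frg": frg, "urg": urg, "wbc_log": wbc_log}
    return sum(COEF[k] * x[k] for k in COEF)


# Hypothetical patient: maximal tau(D29 MRD), favorable genetics, WBC log of 2.0
score = derived_pi_ukall(tau_d29_mrd=13.82, frg=1, urg=0, wbc_log=2.0)
```

Lower scores indicate lower predicted relapse risk, so a patient with undetectable D29 MRD and favorable genetics receives a strongly negative value.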

Added value of D8 MRD
τ(D8 MRD) was a statistically significant addition to the model, with a modest hazard ratio estimate in the testing data of 0.96 (1 DF Wald p < 0.001) (Supplementary Table 7). The effect size corresponds to an estimated 4% relapse risk reduction for a one-unit increase in τ(D8 MRD) (decrease in D8 MRD), holding D29 MRD, WBC log, FRG, and URG constant.
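Because the Cox model is linear on the log-hazard scale, the per-unit hazard ratio compounds multiplicatively over larger changes in τ(D8 MRD). As a worked illustration, with a hypothetical difference of k = 5 units chosen only for the arithmetic and all other predictors held constant:

```latex
\mathrm{HR}(k) = 0.96^{k}, \qquad \mathrm{HR}(5) = 0.96^{5} \approx 0.82
```

That is, under the fitted model a five-unit higher τ(D8 MRD) would correspond to roughly an 18% lower hazard of relapse, assuming the effect is linear in τ across that range.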

Development of PI COG
We next developed a model using COG predictors best known for relapse risk using the training dataset (n = 11,102). Tested as a group, the set of potential statistical interactions did not significantly improve model fit (Supplementary Table 1) and were excluded. Table 2 reports the estimated coefficients and hazard ratios from the model containing transformed D8 and D29 MRD, FRG, URG, WBC log, CNS status, and Age. Except for CNS3 (n = 90 in training data, Supplementary Table 6) vs. CNS1, each predictor was strongly associated with relapse risk. Increases in transformed D8 and D29 MRD (i.e., decreases in MRD) were each associated with a decreased relapse risk. Table 2 can also be visualized as an equation as follows: where indicator I(CNS Status) is one if the patient falls into that CNS category, and zero otherwise. This equation can be used to calculate an individual patient's PI COG risk score. Supplementary Fig. 2 provides a visual comparison of the shapes of the distributions of PI UKALL and PI COG. Figure 3 portrays the prognostic index by genetic RG, with higher genetic risk associated with higher PI COG. Diagnostic plots indicate no concerning evidence of nonproportional hazards (Supplementary Fig. 3A) or influential points (Supplementary Fig. 3B). Internal validation indicated very little data-driven overfitting in the modeling process (Supplementary Table 8). Temporal external validation of the new model in the AALL0232/AALL0331 testing data (n = 4100) yielded an overall calibration slope of 0.94, not significantly different from 1 (p = 0.13), indicating overall good calibration in the testing dataset. The model held discrimination as well, with a C-index in the testing data of 0.738. Calibration curves are displayed in Supplementary Fig. 4.
In testing data stratified by protocol, we observed a slight underestimation of risk among the few NCI HR patients (AALL0232) with very poor observed risk, likely due to the lack of sufficient data to obtain reliable predictions. Among NCI SR patients (AALL0331), there was an overestimation of risk across the range of the data, with the poorest model estimates again in ranges with fewer observations. The final Cox model was compared to ML alternatives using the same prognostic variables (Supplementary Table 3). Despite enhanced flexibility in the ML models, the discriminative ability of the Cox model was comparable to all ML alternatives.

Comparison of risk stratification for PI COG vs. COG current clinical
The cutpoints maximizing the CPE for PI COG were −1.377, −0.589, and 0.093, resulting in classification of patients' relapse risk into: 38.6% of patients as "low" (RFS 96.8%); 33.1% "standard" (92.6%); 16.8% "intermediate" (84.9%); and 11.5% "high" (66.9%). Figure 4A shows excellent separation and sensible RFS estimates among Kaplan-Meier curves within PI COG-defined RGs. Supplementary Fig. 5 displays the Kaplan-Meier curves within PI COG-defined RGs stratified by testing and training datasets, showing well-separated curves within each dataset. These stratified Kaplan-Meier curves are overlaid for comparison in Supplementary Fig. 6. Figure 4B demonstrates the practical implications of splitting patients' prognostic values by RG, with each patient's PI COG value falling into one of the four risk categories depending on prognostic features. The distribution of the PI COG is similar when stratified by testing and training datasets (Supplementary Fig. 7).
Ninety-seven percent of patients had sufficient data to be retrospectively classified according to current COG AALL1731/AALL1732 definitions. As shown in Supplementary Table 9, the resulting classification gives 24.5% SR-Favorable (5-year RFS 96.7%), 20.5% SR-Average (93.3%), 12.5% SR-High (82.7%), 3.0% HR-Favorable (96.3%), 29.6% HR (81.8%), and 1.1% Very HR (VHR; 53.6%). Table 3 compares the classification of patients according to both the PI COG and the COG current clinical standard. As seen in the SR-Fav and VHR rows, the two risk classification strategies generally agree when risk is very high or very low. However, for other current COG risk classifications (SR-Avg, SR-High, HR) that collectively include 63% of patients, there is a broader spectrum of PI COG RG assignment. Table 4 displays 5-year RFS estimates within each of the subgroups discussed above. Within the COG SR-Avg group, PI COG identified a "low risk" subgroup with an outstanding 96.0% RFS estimate, similar to the outcomes for patients traditionally classified as SR-Fav. In the COG HR group, we observed a broad range of RFS estimates, from a group with an RFS of 95.5% to a group with an RFS similar to that expected with VHR (66.0% RFS). Similar trends are seen for DFS and OS, as well as when the results are stratified by testing and training datasets (Supplementary Tables 10-19).
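Given the three reported cutpoints, assigning a patient's PI COG score to a risk group is a simple interval lookup, sketched below. The function name is hypothetical, and the handling of a score that falls exactly on a cutpoint (assigned here to the higher-risk group) is an assumption, since the text does not state the boundary convention.

```python
from bisect import bisect_right

CUTPOINTS = [-1.377, -0.589, 0.093]  # CPE-maximizing cutpoints quoted in the text
GROUPS = ["low", "standard", "intermediate", "high"]


def pi_cog_risk_group(pi_cog: float) -> str:
    """Map a PI COG score to its risk group; boundary ties go to the higher group (assumption)."""
    return GROUPS[bisect_right(CUTPOINTS, pi_cog)]


# Hypothetical scores spanning the four groups
labels = [pi_cog_risk_group(s) for s in (-2.0, -1.0, 0.0, 0.5)]
```

Because the groups are defined on a single continuous score, adjusting the number of groups or the thresholds for a future trial would require changing only the cutpoint list.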

DISCUSSION
Prognostic models are used in oncology to aid clinical decision making by adjusting treatment intensity to individual patient relapse risk [22]. A prognostic model must satisfy many quality control guidelines to be useful in clinical practice, including appropriate model validation [12,13,15]. Ideally, this includes both strong resampling-based internal validation ("training") and external validation in independent populations ("testing") [12,15].
We have developed and rigorously validated a new model to determine a prognostic index (PI COG) using COG B-ALL trials. PI COG is easily calculated on a large scale and can be hosted online as a web application for use by patients and practitioners, lending itself well to the described clinical applications (see https://nataliedelrocco.shinyapps.io/COG_PI_Calculator/). This work extends that of Enshaei et al., whose prognostic index, the PI UKALL, was prognostic in the COG data and emphasized the strength of D29 MRD, WBC, and favorable and unfavorable cytogenetics as predictors of outcome in pediatric ALL [5].
These independent analyses were both conducted with large, uniformly annotated clinical trial datasets, giving strong evidence of reliable estimation of the effect of these prognostic factors on relapse risk. This work provided an independent external validation of the PI UKALL, and also demonstrated the contributions of Age, CNS status, and D8 MRD in prognostic modeling for relapse risk. Despite correlation with D29 MRD and a modest effect, D8 MRD still contributes independently to the model, likely due to its ability to indicate excellent expected outcomes when D8 MRD is negative. We additionally note that model estimation showing similar hazard ratios for patients with CNS2 and CNS3 is not unique to this study, and refer the interested reader to Winick et al. for discussion [23]. However, present interpretations regarding CNS2 vs. CNS3 must be made with caution as the confidence interval associated with the estimated hazard ratio for CNS3 patients (vs. CNS1) is wide given the relatively small number of these patients.

Fig. 3 Boxplots of the distribution of the COG ALL Prognostic Index (PI COG) risk score by genetic risk group. The central "box" is made up of the 25th percentile, median (50th percentile), and 75th percentile. Lines on either side extend to the minimum and maximum (excluding outliers). Outliers are marked on the plot by points that are higher than the maximum denoted by the upper line.
Fig. 4 Summaries of the concordance probability estimator (CPE)-defined risk groups of the PI COG. A Kaplan-Meier curves for relapse-free survival probability within each PI COG-defined risk group for the combined RFS cohorts (n = 15202) and corresponding risk table. B Density plots of the distribution of the PI COG with CPE-defined risk groups indicated by text (Low, Standard, Intermediate, and High) and color for the combined relapse-free survival (RFS) cohort (n = 15202). Risk group defining cutpoints of the PI COG that maximize the CPE are marked by dashed vertical lines.

Differences in performance of PI UKALL in COG patient populations may be attributed to several factors including different geographic case-mix [14], different MRD detection methods, and differing definitions of genetic factors [24]. Differences in cytogenetic classification between the COG and UKALL groups include the definition of hyperdiploidy. While the UKALL group defines this favorable cytogenetic subgroup as those with high hyperdiploidy (i.e., between 51 and 67 chromosomes), the COG defines this group as those with trisomy of chromosomes 4 and 10. Of note, subsequent UKALL analyses are likely to further refine their definition of this group as an indicator of good risk genetics [24]. Additionally, the use of hypodiploidy as an inclusion criterion for HR cytogenetic classification differs between the COG and UKALL groups. The COG considers all individuals with hypodiploidy (<43 chromosomes) as HR. UKALL considers two subsets of hypodiploidy as HR: "near haploidy" (<30 chromosomes) and "low hypodiploidy" (between 30 and 39 chromosomes). Thus, the difference reduces to the small subset of individuals between 40 and 42 chromosomes. TCF3-HLF positivity also contributes to UKALL's HR definition. TCF3-HLF is indeed a very high-risk factor but is exceedingly rare and not routinely assessed in genetic testing algorithms.
We show that PI COG can identify heterogeneity in outcome among the categorically defined RGs used in the current COG risk classification, suggesting that using continuous information may enhance traditional RG designation. This refinement of current RGs could offer further options for therapeutic interventions for certain subsets of patients. As outcomes continue to improve, the burden of treatment-related toxicity becomes an increasingly important consideration [25,26]. For those with outstanding prognosis, a less intense chemotherapy regimen may help prevent the life-long complications of therapy, including cardiac disease, secondary cancers, decreased employment, and infertility [27]. For example, SR-Avg individuals who are PI COG low risk could be considered for treatment de-intensification. In contrast, for those within the COG HR group with predicted outcome similar to the COG VHR group (e.g., PI COG "high-risk"), innovative therapies could be considered to improve RFS.
Ideally, a fully independent external validation of PI COG should be conducted with close attention to validation in minority demographic populations. Though Supplementary Fig. 8 shows good calibration and discrimination within each race/ethnicity subgroup and both sexes, a true external validation in minority populations is optimal for determining predictive performance. Prospective clinical trials could evaluate the PI COG's efficacy as a clinical decision aid [28].
In addition to assessing the clinical performance of the PI COG, future research could assess further refinement with critical new prognostic factors. Modern clinical prediction models must be prepared to dynamically incorporate new discoveries and updated information [23]. For example, high-throughput sequencing (HTS) for MRD is more sensitive and more easily standardized than standard flow cytometry and is a focus of current investigation in childhood ALL [29]. Updated models of ALL will also need to adapt to the growing importance of new genetic markers [30], or to improve use of information from traditional ones [24]. A model-derived risk score, such as the PI COG, more readily allows the timely incorporation of such new information (such as HTS MRD and novel genetic subtypes) than do traditional risk stratification algorithms. Traditional risk stratification algorithms combining specific categories of many variables to construct RGs require extensive clinical knowledge regarding relationships between a new marker and other risk stratification variables to determine the appropriate algorithmic use for the new marker. Often, when a new prognostic marker is introduced, the first studies show only an association with outcome, with additional clinical knowledge following over the course of time. In contrast, when data become available on the new marker, established statistical methods parallel to those described in this paper can be applied to incorporate the new information into the model. Although model updating is nontrivial, the technology is available and could further strengthen the ability of the PI COG to discriminate outcomes in groups of patients previously categorized together, presenting additional future opportunities to ask targeted questions.
This study has several strengths in addition to the size and data consistency of the study cohorts. The availability of D8 MRD, not routinely assessed by other groups, allowed incorporation of early disease response. Extensive prior studies of clinical and genomic variables as outcome predictors enabled this study to have predictor pre-specification instead of model-based selection, enhancing the applicability in external populations. Data-driven selection of PI COG cutpoints to define RGs objectively optimizes outcome-based RG assignment. Several limitations also merit note. Certain patient subgroups (T-ALL, Down syndrome, Ph+) were not used to derive PI COG. The performance of PI COG (or any PI) is unclear in small patient groups with limited data (e.g., Non-Hispanic/Other race or Ph-like). Future studies should assess calibration in such patient subgroups. Finally, because PI COG relies on D29 MRD, only available at end-induction, it cannot be used to modify the first weeks of induction therapy.
In conclusion, contemporary ALL therapy relies on risk stratification but does not use all relevant rich and readily available data. The PI COG showed a wide range of relapse risk within currently used RGs and thus may be useful as a clinical decision aid for future trials. Analogous efforts may have significant clinical value in other cancers.

DATA AVAILABILITY
Children's Oncology Group Data Sharing Statement: The Children's Oncology Group Data Sharing policy describes the release and use of COG individual subject data for use in research projects in accordance with National Clinical Trials Network (NCTN) Program and NCI Community Oncology Research Program (NCORP) Guidelines. Only data expressly released from the oversight of the relevant COG Data and Safety Monitoring Committee (DSMC) are available to be shared. Data sharing will ordinarily be considered only after the primary study manuscript is accepted for publication. For phase 3 studies, individual-level deidentified datasets that would be sufficient to reproduce results provided in a publication containing the primary study analysis can be requested from the NCTN/NCORP Data Archive at https://nctn-data-archive.nci.nih.gov/. Data are available to researchers who wish to analyze the data in secondary studies to enhance the public health benefit of the original work and agree to the terms and conditions of use. For non-phase 3 studies, data are available following the primary publication. An individual-level de-identified dataset containing the variables analyzed in the primary results paper can be expected to be available upon request. Requests for access to COG protocol research data should be sent to: datarequest@childrensoncologygroup.org. Data are available to researchers whose proposed analysis is found by COG to be feasible and of scientific merit and who agree to the terms and conditions of use. For all requests, no other study documents, including the protocol, will be made available and no end date exists for requests. In addition to the above, release of data collected in a clinical trial conducted under a binding collaborative agreement between COG or the NCI Cancer Therapy Evaluation Program (CTEP) and a pharmaceutical/biotechnology company must comply with the data sharing terms of the binding collaborative/contractual agreement and must receive the proper approvals.

Table 1.
Patient characteristics of the analysis population (n = 21199) a.
a Ph+ and Down Syndrome patients excluded; Abbreviations: MRD, minimal residual disease; Race "Other" includes: Native Hawaiian/other Pacific Islander, American Indian or Alaska Native, and Multiple Races. b Ph-Like testing was not conducted uniformly on all patients, therefore percentages are omitted as they may not indicate a representative proportion.

Table 2.
Summaries of the PI COG model derived on the training study population.

Table 3.
Sample sizes (%) for subgroups by COG risk and COG Prognostic Index classification in the combined training/testing data.

Table 4.
5-year relapse-free survival probability estimates (SE) for subgroups by COG retrospective and COG Prognostic Index risk classifications in the combined training/testing data.
Empty cells indicate insufficient sample size for reliable estimation (<25 patients). Patients in SR-Fav/Avg are missing MRD8, as such they are not represented in this table. a Large standard error reflects small sample size (n = 28) and hence broader uncertainty about the RFS estimate.