Bacterial load slopes as biomarkers of tuberculosis therapy success, failure, and relapse

Background: Tuberculosis is expensive to treat, especially since therapy duration is at least six months and patients must be followed for up to two years in order to document relapse. There is an urgent need to discover biomarkers that are predictive of long-term treatment outcomes. Currently, tuberculosis programs use two-month sputum conversion for clinical decision making, while phase I clinical trials use extended [14-day] early bactericidal activity [EBA] to triage regimens. Our objective was to develop early treatment stage biomarkers that are predictive of long-term outcomes. Methods and Findings: Data from 1,924 patients in the REMoxTB study were divided into [1] a derivation dataset of 318 patients on six-month standard therapy, and [2] two validation datasets comprising 319 patients on six-month standard therapy and 1,287 patients randomized to four-month experimental therapy. Sputum time-to-positivity [TTP] data were modeled using a system of ordinary differential equations to identify bacillary kill rates [termed γ-slopes] for fast-replicating bacteria [γ_f] and for semi-dormant/non-replicating persistent bacteria [γ_s], and to estimate time-to-extinction for all bacterial subpopulations in each patient. Time-to-extinction is used to predict the minimum therapy duration required to achieve cure. Using the derivation dataset, machine learning identified the γ_s slope, calculated using TTP data from the first 8 weeks of therapy, as the highest-ranked predictor of treatment outcomes. We then computed γ_s slope thresholds that would reliably predict relapse-free cure for regimens of 2, 3, 4, and 6 months duration, and used these to create a diagnostic rule.
In the first validation dataset, for six-month therapy duration, the γ_s-derived decision rule demonstrated a sensitivity of 92% and a specificity of 89%; among patients with a positive biomarker the relative risk [RR] of failure was 20.40 [95% confidence interval (CI): 7.17-58.08]. In comparison, two-month sputum culture conversion had a sensitivity of 33% and specificity of 71% [RR=1.20 (95% CI: 0.60-2.34)], while for extended EBA sensitivity was 14% and specificity was 92% [RR=1.71 (95% CI: 0.73-3.48)]. In the second validation dataset, for four-month therapy duration, the γ_s-derived diagnostic rule had a sensitivity of 81% and a specificity of 87% for picking failure versus cure [RR=14.51 (95% CI: 8.33-25.41)]. Conclusions: The ability to predict treatment outcomes during the first eight weeks of therapy could accelerate evaluation of novel regimens and development of new clinical trial designs, as well as allow personalization of therapy duration in routine treatment programs. Future research applying these diagnostic rules to data from different clinical trials is required.


INTRODUCTION
Tuberculosis [TB] is the most important infectious cause of death worldwide, accounting for 3% of all deaths; it killed one billion people over the last two centuries [1]. In both drug-susceptible TB and multidrug-resistant TB (MDR-TB) [2], therapy duration is 6 months, after which patients are followed for up to 18 months to document relapse. The large number of patients with TB [10 million/year], the long therapy duration, and the follow-up period of up to 2 years make TB one of the most expensive diseases to treat.
Thus, it is of crucial importance to identify TB treatment regimens that are equally as effective in drug-resistant TB as in drug-susceptible TB, to identify regimens that can shorten therapy duration, and to identify early biomarkers that obviate the need for 2-year follow up [1][2][3][4][5][6][7][8][9][10][11]. A closely related problem is the time it takes to evaluate and compare such new regimens in phase I-III clinical trials; they take decades to complete given the long follow-up time required to document relapse. Thus, biomarkers that obviate the need for the long follow up to document relapse, and that can be deployed immediately on a global scale at little cost, need to be urgently developed for both routine patient care and to accelerate the time-table of clinical trials.
The tools currently used to monitor TB treatment in the clinic and in clinical trials arose in the historical context of the microbiology technology of 50 years ago. In the late 1970s Jindani and Mitchison performed a 14-day treatment clinical study in East Africa [n=124 patients] that used solid agar-based Mycobacterium tuberculosis (Mtb) colony-forming unit [CFU]-derived kill rates, defined by linear regression slopes, to define early bactericidal activity [EBA], and the 14-day or extended EBA to capture sterilizing activity; these are the basis of current phase I clinical trials [7,8]. In 1993 Mitchison summarized results of seven clinical studies to propose the use of two-month sputum culture and smear as a surrogate of relapse; the two-month [eight-week] endpoint is now the basis of clinical decision-making in routine clinical care [3,10-13]. Eight-week studies are also widely used as phase II studies to select TB regimens that go into the larger phase III studies in which long-term outcomes such as relapse, death, and cure are evaluated. However, the accuracy of these phase I/II studies in predicting hard clinical outcomes such as cure, therapy failure, and relapse has been challenged [10-12, 14, 15].
In addition, more recent technological advances with semi-automated liquid cultures have demonstrated that the eight-week agar-based cultures may have been over-optimistic and are associated with substantial false-negative rates [16-19]. On the other hand, time-to-positivity [TTP] in the liquid cultures can be used in place of CFUs [20,21]. The liquid culture technology is semi-automated and has been widely deployed across the world for routine clinical care as a diagnostic and for susceptibility testing. Here, we sought to identify mechanistic biomarkers (based on quantitative biology of the disease) that fulfill the definition of the US Food and Drug Administration BEST (Biomarkers, EndpointS, and other Tools) Resource, for use early during therapy to predict long-term hard clinical endpoints such as cure, therapy failure, and relapse [22,23].
We have developed a mechanistic model to quantitatively explain the drug-regimen bacterial kill kinetics and dynamics of both fast-replicating and semi-dormant/nonreplicating persistent [NRP] Mtb subpopulations in TB patients as reflected in sputum [24]. Here, we used serial sputum TTP-data from patients in the Rapid Evaluation of Moxifloxacin in Tuberculosis [REMoxTB] phase III clinical trial to identify the trajectory of these two bacterial sub-populations and to estimate time in which both Mtb bacteria subpopulations reach extinction (time-to-extinction) [24]. According to Burman, "The ability to prevent relapse is termed sterilizing activity because it is presumed to require killing nearly of all bacilli remaining after the initial phase of therapy" [9].
Restated, failure to reach extinction by the Mtb population in lung lesions is a required condition for therapy failure and relapse. Therefore, the time-to-extinction of all bacillary populations marks the required minimum duration of therapy in order to avoid relapse.

Study design, data extraction and definitions
Our study design is reported in detail in Figure 1. Briefly, we took data for bacteriologically confirmed TB patients who were enrolled in the REMoxTB clinical study [3], in which patient sputum was cultured in the Mycobacteria Growth Indicator Tube [MGIT] to confirm bacterial viability. Since our aim was to develop a method agnostic of the regimens used and of drug-resistance status, patient data from the study [3] were used in our analyses regardless of drug-resistance status. Patients for whom the majority of sputum samples were contaminated or missing were excluded.
Patient and microbial details, including therapy regimens and serial TTPs, were extracted from the CPTR website [http://www.cptrinitiative.org]. Time-to-extinction was defined as achieving a bacterial burden ≤ 10^-2 colonies/mL, as mathematically justified in our prior work [24]. Microbiologic cure was defined as two negative sputum cultures without [...] in patients deemed cured at the end of therapy. Relapses were confirmed by 24-locus mycobacterial-interspersed-repetitive-unit analysis [24]. Failure to attain microbiologic cure at the end of therapy defined therapy failure, as per REMoxTB study protocol [24].

Figure 1. Study design.
Step 1: Patients without sufficient data points to derive bacterial kill slopes were removed.
Step 2: The weekly sputum time-to-positivity data were then converted to colony-forming units and modeled using ordinary differential equations.
Step 3: Data partitioning: 50% of patients on standard-of-care six-month therapy were assigned to the derivation dataset and the other 50% to the validation dataset. All patients in the experimental arms, administered over 4 months, were assigned to validation datasets.
Step 4: Four types of mathematical modeling and machine learning analysis in the derivation dataset to [1] identify predictors of time-to-extinction [TTE], [2] identify threshold values delineating different TTEs, and [3] design a diagnostic rule for different therapy durations.
Step 5: Accuracy of the diagnostic rule/biomarker for six-month therapy duration in the standard-of-care validation dataset using clinical definitions of outcome [relapse, cure].
Step 6: Accuracy of the diagnostic rule/biomarker for four-month therapy duration in the two experimental arms in the validation dataset using clinical definitions of relapse and cure.

Data partitioning
Patients on the standard TB therapy regimen were randomly partitioned into two subsets of equal size. The first set was designated as the model derivation set, while the remainder was assigned for use in model validation [validation data set]. To capture sufficient relapse events, only patients with at least two consecutive sputum samples during follow-up after treatment were used in model training and cross validation.
Patients who received the experimental REMoxTB arms were used only in the validation dataset for sensitivity and specificity of predictors with 4 months therapy duration.
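The random 50/50 partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `partition_patients`, the seed, and the integer patient IDs are assumptions introduced here. The only figure taken from the text is the total of 637 standard-therapy patients (318 + 319).

```python
import random

def partition_patients(patient_ids, seed=0):
    """Randomly split patient IDs into derivation and validation halves."""
    ids = list(patient_ids)
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2        # odd totals leave the extra patient in validation
    return ids[:half], ids[half:]

# 637 standard-therapy patients in total; the IDs here are placeholders.
derivation, validation = partition_patients(range(637), seed=2020)
print(len(derivation), len(validation))
```

Fixing the seed makes the split repeatable, which matters when a derivation/validation comparison has to be re-run.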

Mathematical modeling for converting TTPs to CFUs
In order to convert TTPs to CFU/mL, we applied the formula: where α is 8.  [25]. Although our formula is not a linear regression equation, we wanted to determine whether it was as accurate at 56 days as at the start of therapy in patients. Therefore, we applied equation #1 to an independent clinical dataset of patients on TB therapy, the vitamin A study, in which weekly TTPs and CFUs were available for 56 patients as part of our morphism mapping between the hollow fiber system and patients on standard therapy [18,24]. Figure S1 shows that the formula remained as accurate at day 56 as on day 0. Therefore, we employed equation #1 for toggling between CFU/mL and TTP.

Mathematical Model
Our mathematical model has been described in detail in the past [24]. The parameters r_f and r_s also measure the reproductive or growth fitness of the bacteria, a measure of their virulence. The fast-replicating (log-phase growth) Mtb grow at rate r_f, while the slow-replicating Mtb grow at rate r_s. It has been shown that in TB patients these bacterial subpopulations co-exist; however, in active TB disease, the population of bacteria in log-phase is dominant [26, 27, 29-...]. [...] MCMC convergence was assessed visually and by using the chain convergence diagnostic tools in the R coda package.

Identification of biomarkers predicting outcomes in derivation dataset
Identification of biomarkers that best predicted therapy outcomes was carried out using classification and regression trees [CART] of Breiman et al [36]. Using the derivation dataset, we examined all demographic, clinical, and radiological factors, as well as model-derived γ_s and γ_f slopes and the initial bacterial burdens [B(0)], as potential predictors of outcome. Outcome was defined as either therapy success at end of therapy, or therapy failure (failure at the end of treatment or relapse), or relapse. The steps we followed were implemented by two independent investigators in R (rpart) and Salford software, and have been described in detail in the past [37].
First, CART analysis was used to identify and rank the top predictors of therapy failure and relapse. Second, we used clustering to characterize the relationship of the top predictors with each specific treatment outcome, and also identified the statistical association [38]. TTP trajectories were clustered using the K-means algorithm implemented in the kml package in R [38]. The 6-month TTP data for each cluster were reduced to derive (i) the 4-month slopes [using the first 4 months of accrued data] and (ii) the 2-month slopes [based on the first eight weeks of accrued data]. The model was fitted to the data for each separate cluster and their respective reduced subsets.
medRxiv preprint doi: https://doi.org/10.1101/2020.05.03.20086579; this version posted May 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Finally, we utilized Markov chain Monte Carlo simulations of time-to-extinction in tandem with CART to identify slope thresholds and initial bacterial burden that best classified relapses and therapy failure [35].
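The reduction of a 6-month series to 4-month and 2-month slopes can be illustrated with a simple windowed fit. The sketch below is a deliberate simplification: it uses an ordinary least-squares line on log10 CFU/mL versus day, whereas the study fitted an ODE model to obtain γ_s and γ_f. The function name `kill_slope` and the synthetic data are assumptions made for illustration.

```python
def kill_slope(days, log10_cfu, window_days):
    """Least-squares decline rate (log10 CFU/mL/day) over the first window_days.

    Simplified stand-in for the ODE-derived slopes: a single straight line
    is fitted to the observations that fall inside the window.
    """
    pts = [(d, y) for d, y in zip(days, log10_cfu) if d <= window_days]
    if len(pts) < 2:
        raise ValueError("need at least two observations in the window")
    n = len(pts)
    mean_d = sum(d for d, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pts)
    return -sxy / sxx   # sign flipped so a falling burden gives a positive rate

# Synthetic series: weekly for 8 weeks, then monthly; perfectly linear decline.
days = [0, 7, 14, 21, 28, 35, 42, 49, 56, 84, 112, 140, 168]
burden = [5.0 - 0.05 * d for d in days]

slope_8wk = kill_slope(days, burden, window_days=56)    # first eight weeks
slope_4mo = kill_slope(days, burden, window_days=120)   # first four months
print(round(slope_8wk, 3), round(slope_4mo, 3))
```

With real data the two windows generally give different slopes, which is what makes the choice of accrual window worth examining.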

Mathematical simulations for indeterminate data zones
We computed 10,000 bacterial trajectories to simulate different treatment outcomes. The initial bacterial burdens, based on the range in the derivation dataset of 3-7 log10 CFU/mL, and the γ-slopes, between 0.05 and 0.5 log10 CFUs/day, were varied simultaneously, with all other model parameters held constant. TTE for each separate trajectory was computed. The TTE values define the transcritical bifurcation points that explain when the Mtb NRP stable state switches to extinction. Regions of time within which bacterial subpopulations would go extinct were constructed and partitioned to reflect the expected clinical treatment duration intervals.
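A rough sketch of such a sweep is given below. It assumes a single log-linear decline with no regrowth and no second subpopulation, so it is not the authors' ODE system and will not reproduce their TTE values. The 100 x 100 grid (10,000 trajectories), the 30-day month, and all function names are illustrative assumptions. The parameter ranges and the 10^-2 extinction threshold are taken from the text.

```python
import math

EXTINCTION_LOG10 = -2.0   # extinction threshold of 10^-2 per mL, from the text

def time_to_extinction(b0_log10, gamma):
    """Days until a log-linear decline from b0 crosses the extinction threshold."""
    if gamma <= 0:
        return math.inf   # no net kill: extinction is never reached
    return (b0_log10 - EXTINCTION_LOG10) / gamma

def duration_bin(tte_days, days_per_month=30):
    """Shortest regimen (2, 3, 4 or 6 months) that covers the time-to-extinction."""
    for months in (2, 3, 4, 6):
        if tte_days <= months * days_per_month:
            return months
    return None   # extinction would take longer than 6 months

def sweep(n_b0=100, n_gamma=100):
    """Evaluate an n_b0 x n_gamma grid over the ranges quoted in the text."""
    results = []
    for i in range(n_b0):
        b0 = 3.0 + 4.0 * i / (n_b0 - 1)               # 3-7 log10 CFU/mL
        for j in range(n_gamma):
            gamma = 0.05 + 0.45 * j / (n_gamma - 1)   # 0.05-0.5 log10 CFU/day
            tte = time_to_extinction(b0, gamma)
            results.append((b0, gamma, tte, duration_bin(tte)))
    return results

grid = sweep()
print(len(grid))
```

Grouping the rows of `grid` by the fourth field gives the regions of the (burden, slope) plane that map to each treatment duration under this simplified model.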

Sensitivity analysis for treatment duration
Monte-Carlo experiments were carried out to identify changes in γ_s values that resulted in treatment duration shortening (2 and 4 months) and those that led to prolonged treatment duration (7, 8 and 9 months). The magnitudes that correspond to these treatment end-points were determined relative to different categories of patient initial bacterial load: (i) high (>5.0 log10 CFU/mL), (ii) medium (3.5-5.0 log10 CFU/mL), and (iii) low (<3.5 log10 CFU/mL). These bounds were selected to toggle between CART discrete bounds and to sweep across continuous patient CFU burdens, in order to examine the effect of different slope magnitudes on outcome for the defined therapy durations.
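To make the link between initial load category and slope magnitude concrete, the sketch below classifies a starting burden using the cut-offs quoted above and computes the kill rate needed to reach extinction within a target duration. It again assumes a plain log-linear decline to 10^-2 per mL and a 30-day month, which are simplifications rather than the authors' model. The function names are hypothetical.

```python
def burden_category(b0_log10):
    """Initial-load category using the cut-offs quoted in the text (log10 CFU/mL)."""
    if b0_log10 > 5.0:
        return "high"
    if b0_log10 >= 3.5:
        return "medium"
    return "low"

def required_slope(b0_log10, months, extinction_log10=-2.0, days_per_month=30):
    """Minimum log-linear kill rate (log10/day) to reach extinction in `months`."""
    return (b0_log10 - extinction_log10) / (months * days_per_month)

# Illustrative starting burdens, one per category.
for b0 in (3.0, 4.5, 6.0):
    needed = {m: round(required_slope(b0, m), 3) for m in (2, 4, 6, 9)}
    print(burden_category(b0), needed)
```

Under this simplification a higher starting burden always demands a steeper slope for the same duration, which is the qualitative behaviour the sensitivity analysis explores.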

Validation of identified biomarkers
Individual patient TTP trajectories were fitted to the model to identify the corresponding γ_s and γ_f in the validation datasets. The accuracy, sensitivity, and specificity of biomarkers derived in the derivation dataset were calculated in the validation datasets for cure, relapse, or therapy failure, for 6-month and 4-month durations of therapy. The definitions of cure, relapse, and therapy failure were those of the REMoxTB clinical trial protocol [3]. We used the standard statistical and clinical definitions for sensitivity, specificity, accuracy, and the number needed to diagnose failure and relapse [39,40].
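For reference, these quantities follow directly from a 2x2 table of biomarker result against outcome. The sketch below uses the standard definitions, with the number needed to diagnose [NND] taken as 1/(sensitivity + specificity - 1). The function name and the counts in the example are made up for illustration and are not study data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy and number needed to diagnose (NND).

    tp/fn: patients with the outcome who test positive/negative;
    fp/tn: patients without the outcome who test positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    youden = sensitivity + specificity - 1.0
    # NND is undefined for a test no better than chance; report infinity.
    nnd = 1.0 / youden if youden > 0 else float("inf")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "nnd": nnd}

# Hypothetical counts chosen only to exercise the function.
m = diagnostic_metrics(tp=46, fp=30, fn=4, tn=220)
print({k: round(v, 3) for k, v in m.items()})
```

As a check against the figures reported later in the text, a sensitivity of 92% with a specificity of 86% gives 1/0.78, or about 1.28, close to the reported NND of 1.29.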

Statistical analysis
Mean values between groups were compared using Student's t-test or the analysis of variance (ANOVA) F-test, while the Mann-Whitney test was used to compare proportions and the medians of the distributions of the fast and slow slopes derived from 2-month, 4-month, and 6-month accrued TTP data. Spearman's correlations were used to examine correlation, while unweighted Cohen's kappa coefficients examined agreement between clinical outcomes derived from the REMoxTB study definition and those derived from the model based on time-to-extinction. All analyses were performed with packages in R.
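The unweighted Cohen's kappa used for the agreement analysis can be computed as shown below. This is a generic sketch of the standard formula, kappa = (p_o - p_e) / (1 - p_e), written in Python rather than R. The function name and the two label lists are invented for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two marginal proportions, summed by class.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0   # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical outcomes: trial definition versus time-to-extinction model.
trial = ["cure", "cure", "relapse", "failure", "cure", "relapse"]
model = ["cure", "cure", "relapse", "cure", "cure", "failure"]
kappa = cohen_kappa(trial, model)
print(round(kappa, 3))
```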

Clinical and laboratory characteristics in derivation and validation datasets
[...] [Figure 1]. This was followed by converting the TTP series of the 1,924 patients to CFU/mL using equation 1, before modeling the data with a set of ODEs [equations 2 and 3] to describe trajectories of Mtb CFU/mL with time [i.e., slopes]. We identified ODE-model parameter estimates using 8-week [2-month], 4-month, and 6-month accrued TTP-derived data for all 1,924 patients. The model parameter estimates are shown in Table S1. We termed the Mtb kill rates γ-slopes, where [...]

Data partitioning into derivation and validation datasets
We separated the 1,924 patients' data into derivation and validation datasets, shown in [...] therapy, as shown in Figure 1. All patients in the derivation dataset were randomized to six-month therapy duration. The validation datasets comprised (i) 319 patients on standard therapy of six months duration, and (ii) 1,287 patients randomized to the experimental arms [in which moxifloxacin replaced either isoniazid or ethambutol], which had a four-month therapy duration. Table 1 shows that the demographic and clinical characteristics were similar between the derivation dataset and all validation datasets, indicating that the random partitioning produced comparable groups.

Time-to-extinction versus clinical trial-based outcome definitions
We then used the derivation dataset to determine if the time-to-extinction of the total Mtb [...]

Predictors of outcome in derivation dataset
Classification and regression trees [CART] were used to identify predictors of the target outcome, defined as sputum microbial outcome [cure at end of therapy, therapy failure, or relapse]. All clinical and laboratory features, including the ODE-model-derived γ-slopes, were entered as input/independent variables for the classification and regression tasks. CART identified the γ_s [semi-dormant/NRP kill] slope as the primary predictor [variable importance score of 100%], followed by the initial bacterial burden just prior to therapy commencement [which we termed B(0)], with a variable importance score of 91.7%. Variable importance is scaled relative to the top-ranked predictor, so the initial TTP [B(0)] was 91.7% as important as γ_s. Notably, γ_f was not ranked as a predictor by this agnostic machine learning method. CART performs its own cross-validation within the derivation dataset, in this case by randomly splitting the derivation dataset five times. With the cross-validation, the post-test validation area under the curve [AUC] in the same derivation set was >85%, suggesting that γ_s plus initial TTP [B(0)] are likely to perform as good predictors in future datasets.
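The core step by which CART arrives at a slope threshold is a search over candidate cut-points for the one that minimizes weighted Gini impurity. The sketch below shows that step for a single predictor and a binary outcome. It is a teaching illustration only, not the rpart or Salford implementation, and it omits surrogate splits, pruning, and cross-validation. The slope values and outcome labels are synthetic.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = cure, 1 = failure or relapse)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Threshold on one predictor that minimizes the weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_imp = None, gini(labels)   # start from the unsplit node
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                          # no cut-point between equal values
        thr = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_thr, best_imp = thr, imp
    return best_thr, best_imp

# Synthetic slopes and outcomes, for illustration only.
slopes = [0.06, 0.08, 0.10, 0.12, 0.16, 0.18, 0.20, 0.25]
failed = [1, 1, 1, 0, 0, 0, 0, 0]
threshold, impurity = best_split(slopes, failed)
print(round(threshold, 3), impurity)
```

A full tree repeats this search over every predictor at every node; variable importance then summarizes how much impurity reduction each predictor contributes.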

Clustering-based approaches to identify biomarkers in derivation dataset
Clustering identified four distinct outcome groups based on individual trajectories versus time-to-extinction analysis in the 238 patients in the derivation dataset, as shown in [...] microbiologic cure at the end of six months therapy [failed therapy at the end of six months] but achieved relapse-free cure when standard therapy was continued beyond six months duration. These four clusters represented 238/318 [74.84%] of patients, namely those with fewer than 2 missing observations during follow-up. The model explained these data well, as shown in online Figure S2, Figure S3, and Table S1, while the corresponding summary statistics for each cluster are shown in Table S2.
We used this clustering step to identify the minimum duration of data gathering that would give a γ-slope that could accurately predict cure, therapy failure, or relapse. [...] Table S1] versus outcomes were examined in pairwise comparisons using the Mann-Whitney-Wilcoxon test. Figure 4 shows that the γ_f values did not discriminate failures from cures, consistent with the CART findings. However, γ_s =0.15 or <0.1 log10 CFU/mL/day [modeling semi-dormant/NRP Mtb] were better at discriminating these outcomes. The slopes derived with 8-week versus 4-month data differed in how they misclassified patients' outcomes: the former misclassified more relapses as cures, and the latter misclassified more cures as relapses. Nevertheless, as demonstrated by the statistical comparisons in Figure 4I, the γ_s derived from 8-week TTP data adequately diagnosed relapse versus other outcomes. In other words, γ_s calculated using eight-week-derived TTP data is a good predictor of sterilizing effect up to 18 months after therapy cessation, and this eight-week data-derived slope thus measures sterilizing activity rate.
Accordingly, all γ_s values discussed hereafter are those identified using data from the first eight weeks. Patients with γ_s between 0.09 and 0.14 had a >65% chance of failing treatment. Figure S4 also shows that in order to achieve cure/bacillary population extinction within 2 months of treatment, [...]

Creation of γ_s-based rule to predict relapse for different therapy durations from derivation dataset
In the final derivation step, we established a diagnostic rule for the relationship between γ_s-slopes and the outcomes, using Latin hypercube sampling for sensitivity analyses, with results shown in Figure 5. Figure 5A-D shows that increasing or reducing the γ_s [...]
[...] Table S4]. [...] combined with the initial TTP at treatment commencement had a sensitivity of 92% and specificity of 86% in identifying failure from relapse-free cure, the RR of failure when [...] Table 2 and Table S4], while NND was 1.29. Failures arise either as therapy failure or as relapse; Table 2 shows the sensitivities of these different biomarkers in predicting relapses from treatment failures.
The slope decision rule based on γ_s >0.15 had a sensitivity of 92% and a specificity of 89% in predicting relapses from failures. Thus, the biomarkers we derived were highly specific at identifying relapse-free cure, therapy failure, and relapse.

Performance of γ_s-based rules in forecasting 4-months therapy duration outcomes
Next, we tested the accuracy of the diagnostic rule for four-month therapy duration in [...]
In the arm in which ethambutol was replaced by moxifloxacin (n=633), 533 patients had enough data to calculate 8-week slopes. In this dataset 385 (72.23%) patients achieved cure, 46 (8.63%) had therapy failure, and 102 (19.14%) relapsed. The sensitivity of the extended EBA was only 10%, and the NND was 18.73. The sensitivity of γ_s-based slopes was 70% and the specificity 71% for cure versus therapy failure, while the sensitivity was 70% and specificity 65% for picking relapse versus therapy failure. The NND was 1.89.
To summarize, we calculated an overall value of the relative risk of failure when our B(0) and γ_s-based slope predicted poor outcome for a specified duration of therapy.
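A relative risk with its 95% confidence interval can be obtained from a 2x2 table as sketched below. The interval uses the usual large-sample standard error on the log scale, which is an assumption made here; the text does not state which CI method the authors used. The function name and the counts are hypothetical.

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Relative risk of failure and its large-sample CI on the log scale.

    a: biomarker-positive patients who failed, b: biomarker-positive who did not;
    c: biomarker-negative patients who failed, d: biomarker-negative who did not.
    All four margins must be non-zero, otherwise a ZeroDivisionError is raised.
    """
    risk_pos = a / (a + b)
    risk_neg = c / (c + d)
    rr = risk_pos / risk_neg
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts, not taken from the study tables.
rr, lower, upper = relative_risk(a=30, b=20, c=5, d=195)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

An interval whose lower bound stays above 1 indicates that a positive biomarker is associated with a higher risk of failure at the chosen confidence level.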

DISCUSSION
First, we found that the γ_s [slow-replicating] slope is a good surrogate of sterilizing activity, based on its ability to predict relapse. In contrast, the extended EBA had a sensitivity of 14% for predicting outcomes at 6 months and beyond, and poor accuracy.
The extended EBA is effectively two weeks of accrued data; the poor sensitivity means that the total time over which the bacterial kill data are collected is too short to accurately capture sterilizing activity slopes. Indeed, the poor sensitivity of the γ_f-slope-based metric means that most regimens with good sterilizing effect could be discarded [too many false negatives for sterilizing activity] in regimen selection for sterilizing activity.
Similarly, the 2-month sputum conversion had a sensitivity of 33% and specificity of 71%. These commonly used clinical indices gave us an opportunity to externally validate our modeling approach. The last major meta-analysis of 2-month cultures as a predictor of long-term outcome in TB, performed by Horne et al in 2010, identified a sensitivity of 40% [95% CI, 25-56%] and specificity of 85% [95% CI, 77%-91%], which was confirmed in subsequent studies [14,41,42]. Thus, our modeling findings are consistent with the results of these major meta-analyses. This means that our 8-week-derived γ_s slope plus initial bacterial burden, which had a sensitivity of 92% and specificity of 86% for 6-month therapy duration regimens, would perform better than the 2-month sputum conversion. In addition, our γ_s slope can predict outcomes at therapy durations shorter than 6 months, such as 4 months; the relative risk of therapy failure among patients with a positive biomarker for the specified therapy duration was >8.0. Thus, the γ_s-slope based on the first 8 weeks of TTP data is a good response biomarker for sterilizing activity, even for therapy durations shorter than standard short-course chemotherapy.
The γ_s-slope, which we will henceforth term the "sterilizing activity rate", fulfills the BEST criteria and definition of a monitoring biomarker in the category of a pharmacodynamic/response biomarker, in a similar fashion to the HIV and hepatitis C viral load biomarkers, and could play the same role in TB therapeutics and clinical trials [23,43-45]. According to BEST criteria, a pharmacodynamic/response biomarker provides [...] For TB, we propose identification and ranking of regimens using preclinical models that can accurately translate the sterilizing activity rate to patients [24,46]. The regimens so derived, including optimal doses, and the translated sterilizing activity rate will provide good priors for the design of 8-week clinical trials of novel regimens versus standard therapy, with weekly TTP as the main output and drug pharmacokinetics as a secondary outcome.
The sterilizing activity rate [γ_s-slope], initial TTP, and trajectories can then be used to estimate therapy duration for the novel regimens and to determine whether the new regimens can indeed shorten TB treatment before phase III studies are performed. The 8-week TTP-data-derived slopes can be used to compute smaller and more accurate patient sample sizes required to power the phase III trials, given the good accuracy in forecasting relapse. As an example, the number needed to diagnose [NND] failure and relapse of <2, compared with ~20 for extended EBA and 5-6 for 2-months therapy, gives a more straightforward insight into the relative number of patients tested in each arm by different biomarkers. Moreover, since the predictive value of the sterilizing activity rate for relapse, cure, or therapy failure is independent of the regimen, the slopes can be used in clinical trials of MDR-TB and of "pan-susceptible" TB regimens, indeed for any TB regimen.
With regard to clinical practice, our findings add to the recent discovery that initial Mtb burden can be used to identify patients who can benefit from 4-month duration therapy [47]. Here, we found that the sterilizing activity rate was ranked higher than initial bacterial burden. To put this in context, the risk of development of AIDS and death in patients whose HIV viral load did not become undetectable within the first 12 months was 2.40-fold that of patients whose viral load did, and a <75% reduction in viral load carried an RR of 2.27 for poor outcomes. Thus, our findings could also be used to individualize therapy, in place of the two-month smears/cultures currently recommended in routine care in TB programs worldwide. First, if patients with potentially higher rates of therapy failure and relapse were identified during the first eight weeks of therapy, then interventions such as dose increases or switching of therapy regimens could be made [37]. Second, the sterilizing activity rate [γs-slope] could also be used by TB programs to identify patients who could be cured with specific shorter therapy durations of 2, 3, or 4 months, on any regimen.
Alternatively, the slopes could be used to determine how far therapy should be extended beyond 6 months in patients whose sputum γs-slopes fall in the slow cure clusters, thereby individualizing therapy duration. Since many TB programs across the world already employ liquid culture systems that generate TTP, the biomarker we propose would come at no extra cost to those programs. Computation of the slope could easily be implemented on a computer [or on a phone with a specifically designed app].
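As a rough illustration of how little computation a slope estimate needs, the sketch below fits an ordinary least-squares line to weekly TTP readings using only the Python standard library. This is a deliberately simplified stand-in and not the ordinary differential equation model used in this study: it yields a crude rate of TTP change, not the γs-slope, so the γs thresholds should not be applied to its output. The function name `ols_slope` and the TTP readings are hypothetical.

```python
def ols_slope(times, values):
    """Return (slope, intercept) of the least-squares line values ~ times."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Sum of squared deviations of the time points.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    # Sum of cross-products of time and value deviations.
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


# Hypothetical weekly TTP readings (days) for one patient, weeks 0 to 8.
weeks = [0, 1, 2, 3, 4, 5, 6, 7, 8]
ttp_days = [5.0, 7.5, 9.0, 11.0, 12.5, 14.0, 16.5, 18.0, 19.5]

slope, intercept = ols_slope(weeks, ttp_days)
print(f"TTP change per week: {slope:.2f} days; fitted baseline TTP: {intercept:.2f} days")
```

A production tool would instead fit the two-subpopulation model to each patient's TTP series and report the γs-slope with its uncertainty.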
Our study has some limitations. First, it could be argued that our findings are specific to the dataset we analyzed. However, the machine-learning cross-validation procedures we used score predictors on how well they are expected to perform on an entirely independent dataset in the future. Nevertheless, the accuracy of the biomarkers will still need to be confirmed in other large datasets, in a range of clinical contexts, and with different regimens. Further, this approach could be adapted for other, non-tuberculosis bacterial infections. Second, calculation of the slopes is relatively complex. However, software can easily be written to automate this, as we have attempted elsewhere. Finally, not all patients who do not reach bacterial population extinction will fail therapy or relapse. This means that our approach may lead to overtreatment of patients who would otherwise be cured. Combining our proposed biomarkers with other tests, such as radiological findings and therapeutic drug monitoring, could reduce the number of overtreated patients and is the subject of ongoing analyses. However, even with these limitations, the early TTP-based biomarkers that we identified as predicting long-term clinical outcomes, such as relapse for different therapy durations, have sensitivities and specificities that are higher than those of currently employed methods.

AUTHOR CONTRIBUTIONS
GM performed the mathematical modeling, interpreted the data, and wrote the first draft of the manuscript; JGP performed sample size calculations, CART and statistical analyses, and designed the sample size calculator; TG oversaw the conduct of the study and led the data interpretation, the application of criteria for biomarkers, and the clinical meaning. All three authors wrote and revised the manuscript and approved the final submitted version.

medRxiv preprint doi: https://doi.org/10.1101/2020.05.03.20086579
[...] respectively. The gold dots represent the observations, the solid lines are the model predictions, and the shaded regions represent the 95% credible intervals.
[...]-slope that is less than 0.1.
In D), the 4-month failure threshold is predicted to be 0.1 (0.097), while 0.14 predicts cure. Both in C and D, the zone between rates (Figure S4).

2M biomarker relapse (F) vs cure (S) cut-off points