Introduction

Clinical trials are carried out on the human body with the aim of solving certain medical problems. Such particularity of clinical trials determines that they should not only be rigorous but also abide by ethics1. Clinical trials are often conducted and funded by the pharmaceutical company to justify whether a new drug is effective, and there are four phases in the clinical trials of producing new drugs. Phase I trial is the foundation of phase II, III and IV clinical trials, and it normally includes 20–30 people, aiming to identify the MTD, which is the most toxic dose referred to patients, so that if the dose higher than the MTD, patients can not tolerate and may cause the fatal death and if the dose lower than the MTD, patients would suffer ineffective dosage and may miss the best time for treatment. So the therapeutic effect of the drug can be maximized when the patients are treated at the MTD, and then the MTD or dose level less than the MTD is recommended to phase II. Finding the MTD accurately and efficiently is very crucial in phase I which is also called the dose finding trial. The MTD is defined as the dose whose toxicity probability is closest to a target toxicity probability, usually say 33%2. In the process of dose-finding trial, the dose assigned to the next patient is corresponding to responses of former patients3. Many methods are applied to determine the MTD, including algorithm-based, model-assisted, and model-based designs.

Algorithm-based designs include the 3+3 design4 and its extensions, the accelerated titration design5, and the biased coin design6. Model-assisted designs include the bayesian optimal interval design7, the modified toxicity probability interval (mTPI) design8, and the keyboard design9. Model-based designs require some statistical background to model the relationship between dose and toxicity probability, including the CRM10, the dose escalation with overdose control11, and the Bayesian logistic regression model12. The key point of CRM method is to assume a parametric function between dose and toxicity probability, and based on the dose-toxicity curve, we can estimate the true toxicity probability by updating the observed data from each treated patient. The dose with the updated toxicity probability closest to the target rate will be assigned to the next cohort of patients.

In the literature, there are many works on modifications of CRM models. The issue of overdosing is considered by Faries13, Goodman et al.14, and Ishizuka and Ohashi15. Piantadosi et al.16 propose a modified CRM for cytotoxic drugs to fit the limited sample size of the phase I trials. Lee and Cheung17 and Cheung18 investigate the model calibration for the CRM with single skeleton. Yin and Yuan 19 propose the Bayesian model averaging continual reassessment method (BMA-CRM) and suggest to use multiple skeletons when conducting the CRM. A natural problem followed by the present of BMA-CRM is that how to specify multiple skeletons. To overcome this problem, Pan and Yuan20 define the equivalence of multiple skeletons and propose a method to choose skeletons for BMA-CRM models.

To incorporate grade information, Yuan et al.21 further modify the CRM by the quasi-Bernoulli likelihood, and Chia-Wei Hsu et al22 provide a R package named UnifiedDoseFinding for non-binary outcomes dose-finding trials. Since practical resources are scarce to conduct the CRM, Wheeler et al.23 present some comprehensive recommendations and guidelines to support clinicians to implement and report dose-finding trials using the CRM. Wages and Petroni24 develop a web tool for designing and conducting the CRM. Recently, North et al.25 extend the CRM and present a novel two-stage dose assignment procedure which combines the rule-based design with the CRM. Finally, Liu et al.26 develop a bridging CRM to find the MTD in different ethnic populations.

Substantial efforts have been made on the modification of the CRM, however, only few of these modified CRMs consider the late-onset toxicity . When the phase I trials are conducted, we have to specify a certain observation window (normally 28 days) for dose-limiting toxicity (DLT) assessment. In general, the basic method is that we need to classify the toxicity after treatment whether is immediately observed in the window or delayed out of the window \(\tau\). If the observation is censored at time \(\mu _i\), we still calculate the conditional probability of the occurrence toxicity within window, i.e., \(Pr(t_i<\tau |t_i>\mu _i)\) 27. In the paper, we aim to develop a new CRM to deal with the treated patient if experiences DLT in or out of the observation window simultaneously. This goal motivates us to develop a model which can incorporate both the basic binary and the delayed responses. Ideally, we can only observe once in the observation window and no delayed toxicity out of the window. But in fact, the toxicity may be delayed and we ignore delayed responses in the trial using classic CRMs. Our idea is that the model can deal with immediately toxic or late onset toxic response since our model can include all the information of treatment effects. To save time and human cost, we only observe once in the evaluation window, nevertheless we cannot observe delayed response, our model still works well. We develop the Cox model to modify the classic CRMs. The information of delayed response and the initial guess of toxicity probability can be included in the Cox model, which is convenient for the future study of mixture of immediate and non-immediate responses. As we know, the Cox model is commonly used in survival analysis, and its technique may be an important approach in clinical research if we collect the real data28. The transformation of Cox model in clinical trial is used to analyze the patients’ responses if censored or not , what is more, we can not distinguish the delayed response with the normal ones. Simulations by R program show that the proposed new model has desirable operating characteristics and enhances the performance of the CRM.

The rest of this paper is organized as follows, in “Methods” section, we first derive the new toxicity probability function by the Cox model, and then provide the detailed dose-finding algorithm for our new model. In “Simulation studies” section, we carry out a detailed R simulations to compare and evaluate the performance of our new model with the classic CRM models with three pre-specified dose-toxicity curves. Finally, we end with conclusions and a brief discussion.

Method

Modified CRM model based on the Cox model

Let \(( p_1,\dots , p_J )\) be the skeleton of the J doses, \(p_1< \dots < p_J\), and let \(\pi (t, p_i)\) denote the probability that patients experience DLT within a period t. We define the “toxic response” in the dose finding trial as the event in survival analysis. Further, we can use the Cox model to derive the explicit formula for \(\pi (t, p_i)\). Hence, the Cox model introduces a new approach of the CRM model, and the explicit formula for the hazard rate function \(h(t, p_i)\) is as follows,

$$\begin{aligned} h(t,p_i)=h_{0}(t) \exp \left( \alpha p_i\right) =\frac{f(t, p_i)}{S(t, p_i)} \quad \end{aligned}$$
(1)

where \(\alpha\) is the unknown parameter, \(f(t, p_{i})=\pi ^{\prime }(t, p_i)\), and \(S(t, p_i)=P(T> t | p_i)\). Suppose all the patients only been observed within t, and we regard these patients having a toxic response after t as no response. From equation (1), we can derive \(S(t)=e^{-\int _{0}^{t} h(u) d u}\), and obtain the explicit expression for \(\pi ( t,p_{i})\) as:

$$\begin{aligned} \pi (t,p_{i})= 1-e^{-\int _{0}^{t} h_{0}(u) \exp \left( \alpha p_{i}\right) d u} \quad \end{aligned}$$
(2)

Further, let \(\int _{0}^{ t} h_{0}(u) du\)= C, then the probability of a patient can be observed a toxic response within t at dose i is:

$$\begin{aligned} \pi (i)= 1-e^{-exp(\alpha p_{i})C} \quad \end{aligned}$$
(3)

where C is a constant, and \(\alpha\) is assigned to have the prior distribution \(f(\alpha )\).

Let \(D=\left\{ \left( n_{i}, y_{i}\right) , i=1, \ldots , J\right\}\) be the observed data, where \(n_i\) is the number of patients who have been assigned to dose i, and \(y_i\) is the number of DLTs observed at dose i. Then the likelihood function is:

$$\begin{aligned} L(\alpha \mid D)=\prod _{i=1}^{J}\left\{ 1-e^{-\exp \left( \alpha p_{i}\right) C}\right\} ^{y_{i}}\left\{ 1-\left[ 1-e^{-\exp \left( \alpha p_{i}\right) C}\right] \right\} ^{n_{i} - y_{i}} \end{aligned}$$
(4)

Now the posterior mean toxicity probability at dose i is:

$$\begin{aligned} {\hat{\pi }}_{i}=\int \left( 1-e^{-\exp \left( \alpha p_{i}\right) C}\right) \frac{L(D \mid \alpha ) f(\alpha )}{\int L(D \mid \alpha ) f(\alpha ) \textrm{d} \alpha } \textrm{d} \alpha , \quad i=1, \cdots , J \end{aligned}$$
(5)

Similar to the classic CRM designs, we update the posterior mean toxicity probability through the successively entering of patients, which guides the dose transition in the trial. The procedure will not be terminated until the last cohort of the patients. The dose level with final update of the posterior mean toxicity probability closest to the target should be chosen as the MTD.

Dose-finding algorithm

To conduct the modified CRM model, we take the following steps:

  • Step 1: Choose a prior \(f(\alpha )\) distribution for \(\alpha\).

  • Step 2: The first cohort of patients is assigned to the lowest dose, and \(D=\left\{ \left( n_{i}, y_{i}\right) , i=1, \ldots , J\right\}\).

    $$\begin{aligned} Likelihood \quad function: L(D \mid \alpha )=\prod _{i=1}^{J}\left\{ 1-e^{-\exp \left( \alpha p_{i}\right) C}\right\} ^{y_{i}}\left\{ e^{-\exp \left( \alpha p_{i}\right) C}\right\} ^{n_{i}- y_{i}} \end{aligned}$$
    (6)
  • Step 3: Use previous information and the Bayes’ formula, we can estimate the updated toxicity probability \({\hat{\pi }}_{i}\) of each dose.

    $$\begin{aligned} {\hat{\pi }}_{i}=\int (1-e^{-\exp \left( \alpha p_{i}\right) C}) \frac{L(D \mid \alpha ) f(\alpha )}{\int L(D \mid \alpha ) f(\alpha ) \textrm{d} \alpha } \textrm{d} \alpha , \quad i=1, \cdots , J \end{aligned}$$
    (7)
  • Step 4: The selection of dose for next cohort of patients depends on the dose \(i^{*}\), where \(i^{*}=\underset{i \in (1, \cdots , J)}{{\text {argmin}}}\left| {\hat{\pi }}_{i}-p_{\textrm{T}}\right|\), where \(p_{\textrm{T}}\) is the physician-specified toxicity target.

    • If \(d_i^{*}\) \(>d_i\), we escalate the dose level from i to \(i + 1\).

    • If \(d_i^{*}\) \(<d_i\), we deescalate the dose level from i to \(i - 1\).

    • If \(d_i^{*}= d_i\), we stay at the current dose level i.

  • Step 5: Repeat steps 1-4 until all the patients enter the trial. The dose with the posterior mean toxicity probability closest to the target is defined as the MTD.

Simulation studies

Simulation setting

In the simulation, the prior distribution is taken as a normal distribution with mean 0 and deviation \(\sigma =2\) 19. We simulate each trial for 2000 times by R to get more accurate results. We also compared the performance of the new model with the power model \(\pi (p_{i}) = p_{i}^{\exp (\alpha )}\), logistic model \(\pi (p_{i}) =\frac{\exp (-2+\alpha p_{i})}{1+\exp (-2+\alpha p_{i})}\) and hyperbolic tangent model \(\pi (p_{i}) = [(\tanh (p_{i})+1) / 2]^{\alpha } = \left[ \frac{\exp (p_{i})}{\exp (p_{i})+\exp (-p_{i})}\right] ^{\alpha }\) used in the classic CRM designs under EARS criteria proposed by Zhang et al. 29.

We simulate six different toxicity scenarios with eight different dose levels in each trial, these scenarios are shown in Table 1. The target toxicity probability is 30%. The true toxicity probability in each scenario is assumed to be monotonically increasing with the dose level. In each scenario, we list the true toxicity probability in the first row, and the corresponding skeleton in the second row, which is generated from the getprior() function in R. The MTDs of the six different scenarios are marked in bolditalic. These different true toxicity probabilities present some different situations in practice. To be more specific, in scenario 1, the toxicity probability of each dose locates on a relatively low level, only the last dose reaches to 30%. In scenarios 2, 3, 4, the toxicity probabilities increase slightly at the lower doses, however, they increase sharply when at higher dose levels. In scenarios 5 and 6, the toxicity probabilities are relatively high at lower dose levels, for instance, toxicity probability is 0.3 at dose 3 in scenario 5, and 0.2 at dose 1 in scenario 6. In these two cases, over half of the doses are overly toxic, i.e., the patient treated with the dose whose toxicity probability is higher than 30%.

We confirm that all methods were performed in accordance with the relevant guidelines and regulations.

Table 1 Six Scenarios with different MTDs.
Table 2 Probability of each dose being chosen as the final MTD according to different models.

Results evaluation

Final selected MTD

We study the classic CRMs and our model to compare their percentages of the final MTD selection. The higher the percentage, the more patients assigned to the dose. Table 2 shows the percentage of the final chosen MTD in six scenarios. Each row of the table is the percentages of patients assigned to each dose during the trial. For example, in scenario 3, the MTD is at dose 6. The fourth row indicates that 0.7%, 74%, 24.85% and 0.45% of the patients are assigned to the dose 5, 6, 7 and 8, respectively. That is our model allocates most of the patients to the MTD with dose 6.  The simulation results of our new model at MTDs are shown in bold.

In scenario 1, the MTD is located at the highest dose level, i.e., dose 8, and all the four CRM models perform well. All the four models select the right MTD with a percentage higher than 87%. The power model has the best result in this scenario. It is true that the new model is the second-best in this scenario with 99.55%, however, this number is almost as good as the best result 100%. In scenario 2, dose 7 is the MTD, and the logistic model has the best result. Again, the new model is the second best with only 1.75% lower than the best result of 64.55%. In particular, in scenario 3 our new model becomes the best compared with the other three models. It has a percentage of 74% to choose the right MTD. Nevertheless, this number is significantly larger than the worst-case with 51.3%. Similarly, in scenarios 4, 5, and 6, the new model selects the right MTD with the second-best results, just a little bit lower than the best results. In scenarios 4 and 5, the accuracy of our new model is higher than 72%, indicating that our new model is competitively reasonable. Particularly, in scenario 2, 30.85% and 36.80% of patients choose the dose 8 as the MTD in logistic and our models, respectively. In all other scenarios have very similar behaviors. This does not necessarily lead to over toxic dose selection of the MTD at the higher doses, because the logistic model tends to overestimate the toxicity probabilities due to ignoring or discarding the missing data of delayed responses, which results in dose escalation tends to be less aggressive. On the other hand, our design underestimate the dose toxicity during the observation period and thus dose escalation tends to be more aggressive so that the value 36.8% is higher than that in the logistic model, but our design consider the late-onset data which are in fact available before we determine the MTD. As a result, dose escalation in our design is relatively conservative.

Figure 1 show the initial dose-toxicity curves of the new model and three classical CRM models using six different skeletons respectively. The parameter \(\alpha\) in each model is set as 0.5. In general, the more the shape of initial curve is similar to the true dose-toxicity curve, the more correctly identify the MTD.

Figure 1
figure 1

Toxicity probability using 6 different skeletons.

EARS

Table 3 shows the results of EARS in our simulation.

  1. 1.

    E reports for Efficiency, and E1 shows the percentage of simulations in which patients are treated with any dose lower than the MTD. E2 shows the mean and standard deviation of the percentage of patients treated with any dose lower than the MTD. In fact, these assignments lower than the MTD are ineffective. To design a good model, we expect the results as small as possible.

    Clearly, the proposed model is the second-best in all scenarios by the E criterion from the table. Therefore, we can conclude that this new model is efficient and performs well.

  2. 2.

    A represents for Accuracy, and A1 reports the percentage of simulations in which patients are assigned to the right MTD. A2 states the mean and standard deviation of the percentage of patients assigned to the MTD. A3 represents the percentage of simulations in which over half of patients are assigned to the right MTD. To design a good model, we expect the results of A1, A2, and A3 as large as enough.

    Table 4 shows that our proposed model has the best or the second-best performance in the most scenarios.

  3. 3.

    R stands for Reliability, and R1 is the percentage of simulations in which greater than half of the patients are assigned to any dose larger than the MTD. In this case, we measure the aspect of overly toxic. Similarly, R2 shows the percentage of simulations in which lower than 1/6 of the patients are assigned to the MTD. In this case, we measure the aspect of accuracy. A good design will has small results of R1 and R2.

    Although the results of the proposed model are not always the best or the second-best, the results are still small enough. Our new model outperforms and is reliable under the R criterion.

  4. 4.

    S expresses Safety, and S1 calculates the percentage of simulations in which patients are assigned to any dose higher than the MTD. S2 states the mean and standard deviation of the proportion of patients treated with any dose higher than the MTD. To design a good model, we expect the results of S1 and S2 to be small.

    Overall, it is true that the results of the proposed model under the S criterion are not as good as the previous results under the E and A criteria, but we still can conclude that our new model is safe.

Table 3 EARS for the power, logistic, hyperbolic tangent model and the new model.

Compared with TITE-CRM

We compare the final MTD selection with the power model (CRM), TITE-CRM27 and our proposed design. In simulation studies, we use the similar setting introduced by Yin et al.27. In all the CRM-type designs, the prespecified toxicity probabilities are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 for 6 dose levels, the target toxicity probability is 30% and 6 scenarios are listed in Table 4. We also include a scenario where all the available doses are toxic and a safety rule to stop early for the trial ( A safety rule is defined below). We present the percentage of trials which stop early in the “Early Stop” column. In Table 4, the first row of each scenario is the true toxicity probabilities. Row 2, 3, and 4 represent the final MTD selection of the CRM with power model, TITE-CRM, and the proposed new design. In scenario 1, the MTD is at the dose 4, our design performs best, and yields the highest percentage of MTD selection. In scenario 3, the MTD is at the last dose, our new design performs best, which is much better than the performance of the TITE-CRM. In scenario 4, the MTD is at the dose 4, our design performs best, and yields the highest percentage of MTD selection, and the TITE-CRM performs the worst. In scenarios 2 and 5, the percentages of the final MTD selection of our design are not the highest, yielding 52.5% and 60%, but are not much worse than the CRM and TITE-CRM. In scenario 6, all of the doses are very toxic, the CRM and TITE-CRM have the percentages of 24.2% and 25.7%, while our design can stop the trial early. As expected, in most cases, our design outperforms the CRM and TITE-CRM, and improves the selection of the MTD by approximately 7%, especially in scenario 3, our design improves the percentage by 47% over the TITE-CRM designs. Clearly, our design can resolve the delayed toxicity probability issue.

Safety rule Always treat the first patients with dose 1. If the average of the posterior probability of dose 1 and dose 2 given the response of the first patient is greater than the target \(p_T\), then terminate the trial because the doses may be overly toxic. Otherwise, continue the trial until reaching the maximum sample size.

Table 4 Simulation study comparing the results of CRM, TITE-CRM and the new model.

Conclusion

The main goal of phase I clinical trial is to correctly obtain the MTD as well as assign fewer patients to the overly toxic and ineffective dose levels. To achieve this goal and improve the performance of the classic CRM models, we propose a novel CRM design with a new dose-toxicity probability function. Results of the simulations exhibits that our proposed model is either the best or the second-best in all scenarios and is more robust. Then we use the EARS to evaluate the performance of the proposed model, and the results demonstrate that our new dose-finding method is more reliable, particularly we compare the performances of classic CRM, TITE-CRM and our design, and the simulation studies also show that most of the time, our design outperforms others. The contribution of our new model includes the information on both the toxicity and delayed responses of patients, but we do not need to classify the observation whether it is immediate or not. Involving the time, we can study some future works related to the delayed toxicity probability if we have real data. It is very challenge to deal with the missing data of the delayed responses in the real life due to the limited observation period. We often discard this information on the patients who did not have any toxic responses in the observation period but might experience DLTs beyond that period. But in this paper, our model automatically include this information and we can resolve this challenge part in the trial, and the simulation study looks good. However, the delayed toxic response may affect the accuracy of the MTD and will naturally be of interest to consider the delayed toxicity response when identifying the MTD. Our work on this problem is under progress and we hope to report our findings in the future.