The effects of releasing early results from ongoing clinical trials

Most trials do not release interim summaries of the efficacy and toxicity of the experimental treatments being tested; this information is released to the public only after the trial has ended. While early release of clinical trial data to physicians and patients can inform enrollment decisions, it may also affect key operating characteristics of the trial, such as statistical validity and trial duration. We investigate the public release of early efficacy and toxicity results during ongoing clinical studies to better inform patients about their enrollment options. We use simulation models of phase II glioblastoma (GBM) clinical trials in which early efficacy and toxicity estimates are periodically released according to a pre-specified protocol. Patients can use the reported interim efficacy and toxicity information, with the support of physicians, to decide which trial to enroll in. We describe potential effects on various operating characteristics, including study duration, selection bias and power.


December 8, 2020
To investigate the implications of permeability, we consider two models for clinical research. The first, discussed in Section S1, is one in which independent controlled clinical trials follow a detailed protocol to periodically disclose data summaries. The second model, in Section S2, is a platform study with several experimental arms, in which the organization conducting the study periodically releases information, again following strict protocols. In this second setting, patients are allowed to choose from a catalog of randomization distributions: patients could, for example, accept the possibility of being randomized to the control and some of the experimental arms, while denying consent to receive other experimental treatments based on interim summaries.
We conduct simulations, using stylized assumptions, to compare permeable and impermeable research environments. The simulation study is tailored to a specific setting, phase II studies in Glioblastoma, with the goal of evaluating the potential consequences of more permeable designs under realistic simulation parameters derived from a systematic review [5] (Table 1).

S1 Independent trials releasing efficacy information
We provide a detailed description of the models for permeable and impermeable research environments in which independent trials enroll participants from the same pool of patients. We consider a fixed time period of $T$ months and focus on a single disease. Alternatively, the time horizon could terminate with the discovery of the first effective treatment. The trials are assumed to have identical balanced randomized two-arm designs with an overall survival endpoint and a planned enrollment of $n$ patients. At time $t = 0$, a total of $m \geq 1$ clinical trials evaluate treatments. These trials may be either enrolling patients or in the follow-up stage. At random times $0 < t_{m+1} < t_{m+2} < \ldots$, new studies open. The rate $\lambda_P$ models the average number of enrollments per month into active trials. Efficacy is tested using a log-rank test [2] with null hypotheses $H_j : HR_j \geq 1$, where $HR_j$ is the hazard ratio between the experimental arm and the standard of care (SOC) in the $j$-th trial.

Permeable Environment
In the permeable environment, each open trial releases early data summaries at the end of every month $\ell = 1, 2, \ldots, T$. However, the primary hypothesis $H_j$ of each trial $j$ can only be rejected at completion of the trial. For this reason, testing $H_j$ (one of the primary purposes of the trial) does not involve adjustments for multiplicity (e.g. $\alpha$-spending functions) to control the type I error rate. We may consider communicating various summaries of preliminary data, for example point estimates of the hazard ratio $HR_{j,\ell}$, or the posterior probability of a positive treatment effect (PTE) $\pi_{j,\ell} = \Pr(HR_j < 1 \mid \text{Data at month } \ell)$. The first statistic conveys the magnitude of the effect, while the second quantifies the degree of uncertainty about a positive effect. In our simulations, we compute the posterior probabilities $\pi_{j,\ell}$ for each open trial with a normal prior on $\log(HR_j)$ having mean zero and variance 1/2. Complementary summaries could include the number of previously randomized patients and predictions of response or survival for each treatment.
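The posterior probability of a PTE can be sketched as follows. This is a minimal illustration, assuming a normal approximation to the likelihood of the estimated log hazard ratio with variance 4/d after d observed events in a balanced two-arm trial; that approximation and the function name `posterior_pte` are assumptions introduced here, since the text specifies only the N(0, 1/2) prior.

```python
from math import log, sqrt
from statistics import NormalDist


def posterior_pte(hr_hat, n_events, prior_var=0.5):
    """Approximate Pr(HR < 1 | data) under a N(0, prior_var) prior on log(HR).

    Assumption (not from the source): the estimated log(HR) is treated as
    normal with variance 4 / n_events, a common large-sample approximation
    for a balanced two-arm trial.
    """
    if n_events == 0:
        return 0.5  # no data: the posterior equals the prior, symmetric about 0
    like_var = 4.0 / n_events
    post_prec = 1.0 / prior_var + 1.0 / like_var      # precisions add
    post_mean = (log(hr_hat) / like_var) / post_prec  # prior mean is zero
    post_sd = sqrt(1.0 / post_prec)
    # Pr(log HR < 0) under the N(post_mean, post_sd^2) posterior
    return NormalDist(post_mean, post_sd).cdf(0.0)
```

Under this approximation the probability equals 0.5 before any events are observed and moves away from 0.5 as events accumulate with an estimated hazard ratio different from one.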
It is difficult to evaluate which summaries are most interpretable or to predict how patients and physicians would respond to this information. We define a stochastic model for the patients' enrollment decisions, and evaluate the sensitivity of the simulation results with respect to the parameters of the decision model. In our simulations, we assume that if $\pi_{j,\ell}$ is below a threshold $\theta_L \in [0, 1]$, then patients do not enroll in trial $j$. Symmetrically, trials with $\pi_{j,\ell}$ above a threshold $\theta_U \in [\theta_L, 1]$ are selected with identical probabilities. The probability that a patient selects study $j$ during the period $[\ell, \ell + 1)$ is $p_{j,\ell} \propto g_\theta(\pi_{j,\ell})$ (S3), where $g_\theta(\cdot)$ is monotone, $\theta = (\theta_L, \theta_U, \theta_S)$ and $\theta_S \geq 0$. With $\theta_S$ large and $\theta_U$ close to one, patients select the trial with the largest PTE.
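A minimal sketch of such a decision model is given below. The power form of the weight function `g` is an assumption chosen only because it is monotone, vanishes below the lower threshold and is flat above the upper threshold; the exact form of the function used in the simulations is not restated in this passage, and the function names are hypothetical.

```python
def g(pte, theta_l=0.05, theta_u=1.0, theta_s=3.0):
    """Hypothetical monotone weight: zero below theta_l, flat above theta_u."""
    if pte < theta_l:
        return 0.0
    return min(pte, theta_u) ** theta_s


def selection_probs(ptes, **theta):
    """Normalize the weights over open trials; all zeros if no trial is eligible."""
    weights = [g(p, **theta) for p in ptes]
    total = sum(weights)
    return [w / total if total > 0 else 0.0 for w in weights]


# One trial with PTE 0.7 competing with four trials at the prior value 0.5.
probs = selection_probs([0.7, 0.5, 0.5, 0.5, 0.5])
ratio = probs[0] / probs[1]  # (0.7 / 0.5) ** 3, roughly 2.7
```

With the default values (0.05, 1, 3), this assumed form makes a trial with PTE 0.7 a little under three times as likely to be selected as a trial with PTE 0.5, which is in line with the interpretation given in the Scenarios section.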

Impermeable Environment
In the impermeable environment, information on efficacy is released only after the end of the follow-up period of each trial. However, we assume that trials have a degree of adaptivity to early evidence of futility, as in most two-arm trials. Trials with posterior probability $\pi_{j,\ell}$ below $\theta_L$ are stopped for futility. The probability that patient $i$ enrolls in trial $j$ is constant across all open impermeable trials.
To simplify comparisons of permeable and impermeable environments, we assume that permeable trials that do not recruit patients for a pre-specified period of time ($\pi_{j,\ell} < \theta_L$) are closed.

Scenarios
The following simulation study is tailored to Phase II trials in Glioblastoma (GBM) over a period of 120 months. The parameters used in these simulations (see Table 1 in the manuscript) were selected from our systematic literature review of clinical trials in GBM during the last 15 years [5].
Each month, on average 53 GBM patients enroll in one of the open trials. Five studies are open at the beginning of the simulation period and 25 additional studies are opened at random time points during the 120-month period. The median survival for the control and non-effective treatments is 10 months, compared to 14.3 months (HR = 0.7) for effective experimental arms. Each study enrolls up to 224 patients and, following standard protocols for survival analysis [3], final analyses are conducted after 144 events have been observed, approximately 12 months after the last enrollment in the study. With a 10% type I error rate, HR = 0.7 and analyses after 144 events, each trial has approximately 80% power to detect the treatment effect.
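These design figures can be checked with a short calculation. The sketch below assumes exponential survival times (so that medians scale by 1/HR), a one-sided test at the 10% level, and Schoenfeld's normal approximation for the log-rank statistic under 1:1 allocation; the function name `logrank_power` is introduced here for illustration only.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(hr, n_events, alpha=0.10):
    """Approximate one-sided log-rank power (Schoenfeld, 1:1 allocation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    drift = sqrt(n_events) / 2.0 * abs(log(hr))  # mean of the test statistic
    return NormalDist().cdf(drift - z_alpha)


median_control = 10.0
# Assumption: exponential survival, so the median under HR = 0.7 is 10 / 0.7.
median_effective = median_control / 0.7
power = logrank_power(hr=0.7, n_events=144)
```

Under these assumptions the median for effective arms is about 14.3 months and the power is close to 80%, consistent with the values stated above.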
In the permeable environment, the parameters for the patient decision model were set to $(\theta_L, \theta_U, \theta_S) = (0.05, 1, 3)$. To provide some interpretation of this choice, consider Panel (C) of Figure 1 in the manuscript. After 20 months, there are 5 trials open; one of them evaluates a treatment showing early evidence of efficacy, while the others test treatments similar to the SOC. The posterior probability of a PTE in the effective trial is around 0.7 (average across simulations), and for the other trials it is close to the prior probability of 0.5. Under our model parameters, a patient would be around 3 times more likely to enroll in the effective trial than in any given ineffective trial.
Figure 2 in the manuscript and Table S1 summarize the results of the simulation study. Permeable and impermeable environments have similar probabilities of a positive result for the first effective treatment (approximately 80% power) with identical type I error rates. In the permeable environment, 9% of all simulated trials without effective experimental treatments are stopped early. Permeable trials with an effective experimental arm are stopped early in less than 1% of all simulations. Impermeable trials are stopped early for futility with frequencies similar to those of permeable studies.
In our simulation, 3 out of 30 experimental arms are effective, and the first effective treatment is tested in study $j_{eff}$, whose index ranges from 6 to 27 across simulations. Figure 2 in the manuscript shows the completion time of trial $j_{eff}$. Panel (A) shows the average cumulative number of enrollments over time, starting from the onset of the study, and Panel (B) shows the distribution, across simulations, of the number of months to complete the enrollment. The illustrated enrollment period runs from the first randomization until enrollment is stopped, either because enrollment in the study is completed or because the trial is stopped for futility.
Table S2 reports selected operating characteristics of permeable trials, with early data summaries released every 1, 2, 3, 6, 9 or 12 months (columns of Table S2). When we reduce the frequency of the release of data summaries, the average enrollment period of trials with effective experimental arms increases from 15.7 months (monthly release) to 19.6 months (release every 12 months), compared to 28 months for impermeable trials. As expected, type I error rates and power are not affected by the frequency of the release of data summaries, because the null hypothesis (i.e. absence of positive treatment effects) is tested only at completion of the study.

Table S1: Selected operating characteristics of permeable and impermeable environments in GBM. Results are based on 5,000 simulations of a drug-development period of 10 years during which 30 two-arm trials evaluate new experimental treatments in GBM patients. We use $(\theta_L, \theta_U, \theta_S) = (0.05, 1, 3)$ to define the enrollment probabilities in (S3). Study $j_{eff} \in \{6, \ldots, 27\}$ corresponds, in each simulation, to the first trial that evaluates an effective experimental treatment. Study $j_{prev}$ denotes the last trial that opens enrollment before study $j_{eff}$ is activated. Study $j_{next}$ corresponds to the first study that evaluates a non-effective experimental treatment and opens enrollment after $j_{eff}$. Studies $j_{prev}$, $j_{eff}$ and $j_{next}$ are random and vary across simulations. P Stop indicates the probability of stopping the trial early.

Table S2: Selected operating characteristics of permeable environments in GBM when early data summaries are released every 1, 2, 3, 6, 9 or 12 months. Results are based on 1,000 simulations of a drug-development period of 10 years during which 30 two-arm trials evaluate new experimental treatments in GBM patients. We use $(\theta_L, \theta_U, \theta_S) = (0.05, 1, 3)$ to define enrollment probabilities in (S3). P Stop indicates the probability of stopping the trial early. Values in parentheses indicate standard errors. Study $j_{eff} \in \{6, \ldots, 27\}$ corresponds, in each simulation, to the first trial that evaluates an effective experimental treatment. Study $j_{prev}$ denotes the last trial that opens enrollment before study $j_{eff}$ is activated. Study $j_{next}$ corresponds to the first study that evaluates a non-effective experimental treatment and opens enrollment after $j_{eff}$. Studies $j_{prev}$, $j_{eff}$ and $j_{next}$ are random and vary across simulations.

S1.1 Sensitivity Analyses
The stylized assumptions of the previous section do not represent some aspects of a permeable environment, and it is important to understand how departures from the model affect the operating characteristics.

1) Interim results can influence the overall enrollment rate
Many trials test drugs that are already indicated for other diseases and are therefore commercially available. This is common in oncology, where a drug approved for one cancer type may later be approved for other malignancies [4]. In this situation, a patient could obtain the drug off-label instead of enrolling in a randomized study. It would not be surprising if the proportion of patients opting to receive a treatment off-label increased when evidence of efficacy from a trial is released in a permeable environment.
In our sensitivity analysis, we assume that a set of treatments $J$ can be obtained off-label, and we incorporate patients' reactions into our model through a time-varying overall enrollment rate $\lambda_P(\ell)$, the rate describing the enrollment of patients in all trials during the period $[\ell, \ell + 1)$. Here, $\lambda_P$ is the constant rate used in our basic model, $p_{OL} \in [0, 1]$ represents the proportion of patients who obtain off-label treatments when experimental treatments under study show promising results, and $\theta_{OL} \in [0, 1]$ is a threshold for the posterior probability of a positive effect above which these patients opt for the off-label treatment.
Panel (A) in Figure S1 illustrates the sensitivity of the time to complete enrollment with respect to this perturbation. It shows the mean time for three studies ($j_{prev}$, $j_{eff}$ and $j_{next}$) to complete enrollment (y-axis) when a proportion of patients $p_{OL}$ (x-axis) in $[0, 0.6]$ prefers to obtain drugs off-label instead of enrolling in active trials if the posterior probability of a positive effect, for any of the treatments $j$, becomes large ($\pi_{j,\ell} > 0.9$). The mean time to complete enrollment for study $j_{eff}$ increases substantially, from 15.6 months to 22.5, 32.6 and 57.0 months when $p_{OL}$ = 15%, 30% and 50%, respectively. For studies testing ineffective treatments the average length of enrollment increases similarly.
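One form of the time-varying rate that is consistent with this verbal description is sketched below. It is an assumption for illustration, not the definition used in the simulations: the overall rate is reduced by the factor (1 - p_OL) in any month in which at least one off-label-available treatment has a reported PTE above the threshold. The function name `enrollment_rate` is hypothetical.

```python
def enrollment_rate(base_rate, ptes_off_label, p_ol, theta_ol):
    """Hypothetical overall enrollment rate for one month.

    base_rate      -- constant rate of the basic model (patients per month)
    ptes_off_label -- current PTEs of treatments that can be obtained off-label
    p_ol           -- proportion of patients who switch to off-label treatment
    theta_ol       -- PTE threshold above which those patients switch
    """
    if any(p > theta_ol for p in ptes_off_label):
        return base_rate * (1.0 - p_ol)
    return base_rate
```

For example, with a base rate of 53 patients per month, a proportion of 30% and a threshold of 0.9, this assumed form gives about 37 enrollments per month once a treatment crosses the threshold.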

2) Misreported interim results
We consider the impact of misleading information that is inconsistent with the data generated in ongoing trials. This reflects possible incentives of stakeholders to misreport the probability of a positive treatment effect in order to secure a higher enrollment rate. We assume that in one study ($j = 6$), which tests an ineffective treatment, the investigators misreport $\pi_{j,\ell}$ and publish the value $\min(1, \pi_{j,\ell} + \delta)$, with constant misreporting $\delta \geq 0$ during the trial.
Panel (B) in Figure S1 shows the average cumulative number of enrollments in the study with misreporting over time. We compare the model without misreporting (δ = 0) to two different levels of misreporting, δ = 0.15 and 0.3. Even with moderate misreporting (δ = 0.15), the number of patients enrolled in study 6 (testing an ineffective treatment) increases substantially.

3) Interim results determine population trends
We consider a scenario where latent variables define subpopulations. Patients in different subpopulations have different prognostic profiles, and they also react differently to interim results. This could induce distinct population trends in each trial and compromise the validity or generalizability of the trial results. For instance, patients with higher educational levels might be more reactive to interim results, which would lead trials with early promising data to be enriched with this group of patients. If the educational level is also associated with prognosis or with treatment effects, then the effect sizes across trials may not be comparable. It is worth noting that the populations enrolling in distinct trials may also differ in an impermeable setting.
We assume that there are two groups of patients, group 1 and group 2, with good and poor prognostic profiles, respectively. Experimental treatments can have identical or distinct effects in these two groups. Additionally, patients in group 1 select trials according to expression (S3), while patients in group 2 select their trial randomly with identical probabilities across all open studies. Let $p_1$ be the proportion of patients in group 1, and let $(\mu_1, \mu_2)$ be the mean survival times under standard of care in groups 1 and 2. We consider three simulation scenarios, with $p_1$ = 0.2, 0.4 or 0.6, and $\mu_1$ = 40.1, 29.4 or 18.7. In each scenario we set $\mu_2 = 8$. In our simulations, superior treatments have identical effects (within-stratum HR = 0.7) in groups 1 and 2.
Panel (C) in Figure S1 shows the proportion, across simulations, of patients in groups 1 and 2 during the enrollment period of trial $j_{eff}$, from the first enrollment until the last one ($n = 240$). The sensitivity analysis shows that patient subpopulations could be substantially overrepresented or underrepresented in permeable trials. Variations over time in the composition of the enrolled patients can lead to biased treatment effect estimates when the effects differ across groups.

4) Dropouts due to interim results
In the permeable environment, participants in study $j$ might leave the trial when $\pi_{j,\ell}$ becomes small. We again consider two groups (group 1 and group 2) with good and poor prognoses and incorporate distinct drop-out propensities for patients in these two groups. Patients enrolled in trial $j$ have a positive probability of leaving the trial when the posterior probability $\pi_{j,\ell}$ becomes smaller than a fixed threshold. The propensity for drop-out can differ across groups $g = 1, 2$.
Panel (D) of Figure S1 shows, for study $j_{prev}$, the bias in the estimated median survival. The median survival time was estimated using Kaplan-Meier estimates, assuming non-informative censoring and treating drop-out decisions as censoring events. We assumed identical prevalences ($p_1 = 0.5$) of patients with good and poor prognoses (groups 1 and 2), and median survival times of 17 and 3 months in these two groups for the SOC and ineffective treatments. We consider two cases. In the first one (brown curve), if $\pi_{j,\ell}$ becomes smaller than the threshold, then each patient in group 1 drops out with probability equal to 0.05, 0.1, ..., 0.9 (x-axis) and patients in group 2 drop out with probability 0.05. Symmetrically, in the second case (black curve) the drop-out probability equals 0.05 for patients in group 1 and 0.05, 0.1, ..., 0.9 (x-axis) for patients in group 2. When patients in group 1 have a high drop-out propensity, the median survival tends to be underestimated. Conversely, median survival tends to be overestimated when the drop-out probability is larger for patients with poor prognoses than for patients with good prognoses.
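For reference, the Kaplan-Meier median with drop-outs entered as censored observations can be computed as in the sketch below. The function name `km_median` and the toy data are illustrative assumptions; the estimator itself is the standard product-limit estimator, and it makes no correction for drop-out that depends on prognosis, which is the source of the bias discussed above.

```python
def km_median(times, events):
    """Kaplan-Meier median survival time.

    times  -- observed time for each patient (event or censoring time)
    events -- 1 if the event was observed, 0 if censored (e.g. drop-out)
    Returns the smallest event time t with S(t) <= 0.5, or None if the
    estimated survival curve never reaches 0.5.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # product-limit update
            if surv <= 0.5:
                return t
        at_risk -= removed
    return None


# Toy data: the patient observed until month 2 dropped out (censored).
median_with_dropout = km_median([1, 2, 3, 4], [1, 0, 1, 1])
```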

5) Potential misinterpretation of data summaries
Data summaries, such as posterior probabilities, p-values and confidence intervals, can be misinterpreted by patients and physicians. For example, a reported p-value equal to 0.2 may be interpreted as a 20% probability that the experimental treatment does not improve primary outcomes. Moreover, patients might not be familiar with uncertainty summaries (e.g. confidence intervals or probabilities). For some of them, the propensity to enroll in a clinical trial could be greatly reduced after the study releases data summaries. Such a patient could be open to enrolling in the trial in the absence of early data summaries (impermeable trial), but could refuse enrollment if early summaries were released, unless these included strong evidence (say a posterior probability > 90%) of clinically relevant treatment effects.
We consider a scenario where a group of patients considers the experimental treatment in study $j$ to be ineffective if $\pi_{j,\ell} < \pi_{IE}$ (in our simulations $\pi_{IE}$ = 0.7, 0.8 or 0.9). These patients do not enroll in study $j$ when $\pi_{j,\ell} < \pi_{IE}$. Figure S3 shows the average duration of the permeable trial $j_{prev}$ when the size $p_0$ of this group of patients varies between 0 and 0.5.

6) Heterogeneous enrollment rates across trials
In the permeable and impermeable models we assumed identical enrollment rates across trials when efficacy information is absent or when it coincides across trials (S3). In practice, there are substantial differences in enrollment rates across trials for a variety of reasons independent of interim results. We evaluate departures from this assumption through study-specific parameters $(\gamma_j)_{j \geq 1}$ that modify the enrollment rates. The probability that a patient enrolls in trial $j$ during the interval $[\ell, \ell + 1)$ in the permeable environment becomes proportional to $\gamma_j \times g_\theta(\pi_{j,\ell})$. Before each simulation of a permeable environment, we generate the trial-specific parameters $\gamma_j$. The parameter $\gamma_j$ then remains fixed during trial $j$, and the overall enrollment rate $\lambda_P$, considering all open trials, remains constant.
We considered different degrees of variability of these baseline parameters $\gamma_j$ across open trials. In our simulations we did not observe substantial changes in the operating characteristics of the permeable environment beyond the expected correlation between the parameters $\gamma_j$ and the corresponding trial-specific times to complete accrual.
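The effect of the study-specific parameters on enrollment probabilities can be illustrated with a small helper. The name `enrollment_probs` and the example values are assumptions made for illustration; the helper only normalizes the products of the trial-specific parameter and the decision-model weight over the open trials, so that the overall rate is unchanged.

```python
def enrollment_probs(gammas, g_values):
    """Probabilities proportional to gamma_j * g(pi_j) over the open trials.

    gammas   -- trial-specific baseline multipliers, fixed during each trial
    g_values -- current decision-model weights g(pi_j) of the same trials
    """
    weights = [gam * g for gam, g in zip(gammas, g_values)]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)  # no open trial is eligible
    return [w / total for w in weights]


# Two trials with identical interim evidence but different baseline appeal.
probs = enrollment_probs([1.0, 2.0], [0.5, 0.5])  # one third and two thirds
```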

7) Sensitivity to the parameters of the decision model
Simulations of clinical trials in a permeable environment require a model for the effects of early data summaries on the enrollment decisions. In our case, this model is defined by equation (S3). It may be possible to justify qualitative characteristics of the model, such as monotonicity of the probability that a patient selects a specific trial with respect to interpretable summary statistics. However, there is little knowledge about key parameters, for example $\theta_S$ in model (S3), which regulate how patients react to small variations of interim summary statistics, or about the degree of homogeneity of preferences and decisions across patients. Additionally, it is likely that appropriate parameters for a specific disease, say a life-threatening condition like Glioblastoma, would be unrealistic for a different pathology.
Figure S2 illustrates variations in the operating characteristics when we consider different parameterizations $\theta$ in model (S3). Panel (A) illustrates the sensitivity of the power and of the probability of early termination of a study to the choice of $\theta_L$. Recall that in the simulation model trials can be stopped for insufficient accrual. As expected, the probability of early termination of trials increases with $\theta_L$. The panel also illustrates a monotone relation between $\theta_L$ and power (for $j_{eff}$) or type I error probability (for $j_{prev}$). Panel (B) of Figure S2 shows the average trial duration (for trials $j_{eff}$ and $j_{prev}$ as defined earlier) for different values of $\theta_S$. As expected, the higher $\theta_S$, the larger the difference becomes between the expected duration of competing trials testing effective and ineffective treatments. In this panel we assumed that arm 6 tests an ineffective treatment and compare it with $j_{eff}$.

Figure S1: Sensitivity analysis in the permeable environment. Panel (A) shows the mean time to complete enrollment for a range of reduction parameters (x-axis) of the overall enrollment that are activated when $\max_{j \text{ active}} \pi_{j,\ell} > 0.8$. Panel (B) shows the average cumulative number of enrollments for the study $j = 6$ (testing an ineffective treatment) during the accrual period with misreporting of summary statistics, $\pi_{j,\ell} \to \min(1, \pi_{j,\ell} + \delta_{bias})$, for $\delta_{bias}$ = 0, 0.15, 0.3. Panel (C) shows the average proportion of enrolled patients in group 1 (y-axis) in study $j_{eff}$ from the onset of the trial up to the $i$-th enrollment ($i = 1, \ldots, 244$) (x-axis). The proportion of patients in group 1 in the population equals $p_1$ = 0.2, 0.4 or 0.8. Panel (D) illustrates the average bias, i.e. the mean difference between estimated and true median survival time, for the experimental arm in trial $j_{prev}$ when the drop-out probability of patients in group 1 (group 2) equals $p \in [0.05, 0.9]$ (x-axis).

Figure S3: Sensitivity analysis: potential misinterpretation of early data summaries during permeable trials. The panel shows the average study duration (y-axis) of the permeable trial $j_{prev}$ when $0 \leq p_0 \leq 0.5$ (legend: $\pi_{IE}$ = 0.7, 0.8, 0.9).

S2 Permeability in Platform Trials
A platform trial is a multi-arm study that, by design, allows investigators to add and remove experimental arms as the trial progresses [1,6,7]. Arms can be added to the study when new treatments become available, and the number of arms may change over time. Experimental arms within platform designs can be compared during the study based on the available data. Platforms can potentially provide interpretable comparisons to clinicians or patients, which in turn can guide individual decisions.
We consider the following platform design. Patients are randomized to one of the active experimental arms or the control arm. In contrast to conventional platform trials each patient selects, before randomization, a list of arms and is then randomized to one of these arms with identical probabilities. Treatments on these lists are selected based on available information and personal preferences. The only requirements are the inclusion of the control arm and selection of at least one experimental arm.
In the permeable environment the available information influences patient decisions, and hence the individual lists of arms. In our model, during the period $[\ell, \ell + 1)$ each patient $i$ selects each open experimental arm $j$ with probability $p_{j,\ell}$, given by (S2). The arm $j^*$ with the largest PTE is always included in the list, $p_{j^*,\ell} = 1$, whenever its posterior probability of a PTE is larger than $\theta_L$. If all posterior probabilities $\pi_{j,\ell}$ of active experimental arms are smaller than $\theta_L$, so that the patient's list does not include an experimental arm, the patient does not enroll in the platform study. As before, experimental arms that do not recruit patients for a pre-specified number of months are closed.
In the impermeable environment the individual list of treatments selected by patients does not depend on interim information. Patient $i$ includes an active arm $j$ in the individual list with probability $p_{IP} \in [0, 1]$, which is constant across arms. Only patients with lists that include one or more experimental arms in addition to the control are enrolled in the study. For the impermeable platform trial we include an early stopping rule for futility: to simplify comparisons with the permeable setting, if the probability $\pi_{j,\ell}$ for arm $j$ falls below the threshold $\theta_L$, then arm $j$ is closed early.
To avoid selection bias, we apply the same strategy used in [6] and compare the outcome data of the experimental arm $j$ only to those patients on the control arm who included arm $j$ in their list of experimental treatments. These are the patients who could have been randomized to arm $j$ with positive probability. We adopt this approach for both permeable and impermeable platforms. The method (restricted comparisons), combined with balanced randomization (i.e. identical randomization probabilities for the control arm and the experimental treatments that the patient selects), is robust with respect to potential variations of the populations enrolled by the experimental and control arms during the platform study [6].
We evaluate the operating characteristics of permeable platform trials in a simulation study using the parameters of Table 1, discussed previously, which are based on a literature review of Glioblastoma trials. The platform starts with 5 experimental arms and up to 25 arms are added during a period of 10 years. Enrollment to an experimental arm $j$ is stopped once 230 patients who selected arm $j$ on their list have been enrolled in either the experimental arm or the control arm.
For the permeable platform, we used $(\theta_L, \theta_S) = (0.05, 4)$ to define the treatment selection probabilities (S2). With these parameter values, patients on average include 55% of the active experimental arms in their randomization lists. For comparison purposes we assumed, similarly, that patients in the impermeable platform study include each available open arm in the randomization list with fixed probability equal to 0.55. Table S3 summarizes selected operating characteristics of the platform trial designs. Permeable and impermeable platform trials have similar type I and II error rates, and a similar proportion of ineffective experimental treatments are stopped early for futility. Platform trials use a common control arm to evaluate experimental treatments in a single study. Compared to conventional two-arm randomized clinical trial designs, this considerably reduces the average time and the number of patients required for testing treatments [6]. Consequently, compared to two-arm trial environments, both platform environments, permeable and impermeable, reduce the average time to complete enrollment.
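The list-based randomization and the restricted comparison can be sketched as follows. The data layout (a dictionary per patient with a `list` of selected arms and the `assigned` arm) and the function names are assumptions made for illustration and are not taken from [6].

```python
import random

CONTROL = "control"


def randomize(arm_list, rng):
    """Assign a patient uniformly to the control or one of the selected arms."""
    if not arm_list:
        raise ValueError("the list must contain at least one experimental arm")
    return rng.choice([CONTROL] + sorted(arm_list))


def restricted_controls(patients, arm):
    """Control patients usable as comparators for `arm`: those who listed it."""
    return [p for p in patients
            if p["assigned"] == CONTROL and arm in p["list"]]


# Hypothetical example: four patients with different lists of arms.
rng = random.Random(0)
patients = [{"id": pid, "list": arms, "assigned": randomize(arms, rng)}
            for pid, arms in enumerate([{1, 2}, {2}, {1, 3}, {3}])]
comparators_for_arm_1 = restricted_controls(patients, 1)
```

Control patients who did not list a given arm are excluded from its comparison, because they could not have been randomized to that arm.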
The simulation study for platform trials confirmed a relevant result that we obtained for permeable two-arm trials: with permeable platform trials, the time to complete enrollment for the first effective experimental treatment (indicated by $j_{eff}$) is shorter than for ineffective treatments.

Table S3: Permeable and impermeable platform trials in Glioblastoma. Results are based on 5,000 simulations of a drug-development period of 10 years during which 30 experimental treatments are evaluated in a platform trial. We use $(\theta_L, \theta_S) = (0.05, 4)$ to define patients' enrollment decisions (S2). Arm $j_{eff} \in \{6, \ldots, 27\}$ corresponds, in each simulation, to the first effective experimental treatment. Arm $j_{prev}$ denotes the last experimental treatment that was added before arm $j_{eff}$ was activated. Arm $j_{next}$ corresponds to the first ineffective experimental treatment that opens enrollment after arm $j_{eff}$. Arms $j_{prev}$, $j_{eff}$ and $j_{next}$ are random and vary across simulations.

Permeable environment
The probability $p_{j,\ell}$ that a patient selects study $j$ during month $\ell$ is an increasing function of the posterior probability of a positive treatment effect $\pi_{j,\ell} = \Pr(HR_j < 1 \mid \text{Data at month } \ell)$, where $HR_j$ is the hazard ratio between the experimental and control arms in study $j$. We compute the posterior probabilities $\pi_{j,\ell}$ for each open trial with a normal prior distribution $\log(HR_j) \sim N(0, 1/2)$.
In our simulations, we assume that if $\pi_{j,\ell}$ is below a threshold $\theta_L \in [0, 1]$, then patients do not enroll in trial $j$. Symmetrically, trials with $\pi_{j,\ell}$ above a threshold $\theta_U \in [\theta_L, 1]$ are selected with identical probabilities. The probability that a patient selects study $j$ during the period $[\ell, \ell + 1)$ is $p_{j,\ell} \propto g_\theta(\pi_{j,\ell})$, where $g_\theta(\cdot)$ is monotone and $\theta_S \geq 0$. With $\theta_S$ large and $\theta_U$ close to one, patients select the trial with the largest PTE.