A quantitative approach to the spread of variance in translational research using Monte Carlo simulation

The translation of promising preclinical research into successful clinical trials often fails. One contributing factor is the “Princess and the Pea” problem, which refers to how an initially significant effect size dissipates as research transitions to more complex systems. This work aimed to quantify the effects of spreading variability on sample size requirements. Sample size estimates were performed by Monte Carlo simulation. To simulate the progression from preclinical to clinical studies, nested sigmoidal dose–response transformations with modifiable input parameter variability were used. The results demonstrated that adding variability to the dose–response parameters substantially increases sample size requirements compared to standard calculations. Increasing the number of consecutive studies further increases the required sample size. These results quantitatively demonstrate how the spread of variability in translational research, which is not typically accounted for, can produce drastic increases in the sample size required to maintain a desired study power.

www.nature.com/scientificreports/

There are several reasons why variability increases as research transitions from molecular to animal to human studies. For example, a novel therapeutic agent can first be shown to bind its receptor under highly controlled chemical reaction conditions; variability is at its lowest here. The move to cell culture, where conditions remain tightly controlled, introduces more variability, reflecting the many metabolic reactions ongoing within a living cell. Animal studies carry still higher levels of variability. Despite the standard of comparing inbred animals of the same sex and age, many factors add variability to animal studies: genetic variability has been shown to persist even in inbred animals5, experimental animals demonstrate epigenetic differences6,7, and husbandry, housing, interactions with experimenters, pheromones, and the animal's microbiome all contribute to the variability of animal models8,9. As the drug development process moves to humans, researchers encounter even more variability. Human subjects are rarely genetically identical, and epigenetic differences between humans are effectively impossible to avoid. Moreover, clinical trial participants vary in the time they take to become symptomatic for a given condition, when they seek treatment, their compliance with the treatment, the degree of placebo effect, and their previous medical history1.
The vast majority of translational studies rely on determining whether the mean difference between groups, divided by some measure of variability, produces a value that is statistically significant. The observed spread of variance as more complex chemical and biological systems are studied causes this ratio to decrease, essentially reducing the effect relative to the variability. Even if the effect size stays the same, an increased variance makes it increasingly challenging to detect differences between experimental groups. The variance of the sum or difference of independent random variables equals the sum of their variances, so variability grows with every addition or subtraction of such distributions. In other words, a progression of experiments, from simple reactions to more complex biological systems, each with its own variability, will increase the overall variability of a study. This is at the heart of the Princess and the Pea problem.
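The variance-addition rule above can be checked numerically. The following is a minimal Python/NumPy sketch (the paper's simulations were written in MATLAB; Python is used here purely for illustration, and the variances chosen are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent, normally distributed "experimental" variables
x = rng.normal(0.0, 0.3, n)   # Var(X) = 0.09
y = rng.normal(0.0, 0.4, n)   # Var(Y) = 0.16

# For independent X and Y, Var(X + Y) = Var(X) + Var(Y) = 0.25,
# i.e., variability grows as distributions are combined
var_sum = np.var(x + y)
print(round(var_sum, 2))
```

The same total variance is obtained for the difference X − Y, which is why combining experimental stages can only widen, never narrow, the spread.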
To quantify the effects of the Princess and the Pea problem in the biomedical arena, we performed a series of Monte Carlo simulations. Sigmoidal dose-response curves were selected as example transfer functions, each adding variability based on its parameters, and used to calculate study sample size requirements. This approach was used to quantify the impact of introducing different amounts of variability into a single experiment and into a series of experiments, making it possible to translate the effect size observed in an animal model into an estimated effect size for a clinical trial. In some cases, a realistic degree of variability in a series of simulated experiments could make a clinical trial impossible because of the impractically large study size needed to detect a significant difference between groups.

Results
The following results quantify the impact of simulating consecutive experiments (Fig. 1) and/or adding variability to the parameters of each experiment on sample size requirements and, correspondingly, study feasibility. Each Level, described below, represents a study along the pathway from pre-clinical work to late-phase clinical trials.
The simulated experiments are based on nested dose-response transformations, where each level represents an additional dose-response transformation. The following terminology has been used: Level 0: no dose-response transformation; Level 1: one dose-response transformation (i.e., one study); Level 2: two consecutive dose-response transformations (i.e., one study building on the work of a previous study); Level 3: three consecutive dose-response transformations; Level 4: four consecutive dose-response transformations. For the consecutive dose-response transformations, the response output from one transformation was used as the dose input for the subsequent transformation, as outlined in Fig. 1.
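The nesting scheme above can be sketched in a few lines of code. This is a hedged Python illustration, not the authors' MATLAB implementation; the Hill-function parameterization (EC50 = 0.5, slope = 1, responses spanning 0 to 1) follows the baseline values reported in the Results:

```python
import numpy as np

def hill(dose, ec50=0.5, slope=1.0, max_resp=1.0, min_resp=0.0):
    """Sigmoidal (Hill) dose-response transformation."""
    dose = np.clip(dose, 1e-9, None)  # the Hill curve needs positive doses
    return min_resp + (max_resp - min_resp) / (1.0 + (ec50 / dose) ** slope)

rng = np.random.default_rng(1)
dose = rng.normal(0.5, 0.1, 10_000)  # Level 0: untransformed "doses"

# Levels 1-4: each level feeds the previous response back in as the dose
levels = [dose]
for _ in range(4):
    levels.append(hill(levels[-1]))
```

Each entry of `levels` corresponds to one Level of the simulation, with `levels[0]` being the raw input distribution and `levels[4]` the output after four consecutive transformations.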
Power to detect differences in two normal distributions transformed with a dose-response curve. We first established the validity of the Monte Carlo simulation for comparison of two samples by assessing the power calculations from the simulation against those calculated by standard methods based on the non-central t distribution. To do this, two vectors of n normally distributed values differing in their means were generated, simulating a single experiment. An unpaired equal-variance t-test was performed between the two vectors, and the trial was recorded as significant if the p value was less than the predetermined alpha (0.05). This process was repeated 10,000 times, based on initial simulations demonstrating substantially more variable responses with 1000 runs but low variability at 10,000 or 100,000 runs. The n required to obtain a specific power (i.e., the proportion of trials that showed a significant difference) for the predetermined difference of means was determined by adjusting n and repeating the simulation. Finally, the simulated power was compared to the power calculated using the MATLAB sampsizepwr function for a t-test. These comparisons demonstrated that the simulation replicated the same relationship among study power, sample size, and group mean difference as the standard MATLAB functions.
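This validation step can be reproduced outside MATLAB. Below is a hedged Python/SciPy sketch that compares simulated power against the analytic value from the non-central t distribution (standing in for sampsizepwr); the particular means, sigma, and n are illustrative choices, not the paper's exact inputs:

```python
import numpy as np
from scipy import stats

def simulated_power(n, mu_diff, sigma=0.2, alpha=0.05, runs=10_000, seed=0):
    """Monte Carlo power of an unpaired equal-variance t-test."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma, (runs, n))
    b = rng.normal(mu_diff, sigma, (runs, n))
    _, p = stats.ttest_ind(a, b, axis=1)  # one t-test per simulated trial
    return np.mean(p < alpha)

def analytic_power(n, mu_diff, sigma=0.2, alpha=0.05):
    """Power from the non-central t distribution (two-sided, two-sample)."""
    df = 2 * n - 2
    ncp = mu_diff / (sigma * np.sqrt(2.0 / n))  # non-centrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return 1 - stats.nct.cdf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

sim = simulated_power(20, 0.2)
theo = analytic_power(20, 0.2)
print(round(sim, 3), round(theo, 3))
```

With 10,000 runs the two estimates typically agree to within roughly one percentage point, mirroring the convergence behavior described above.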
The process of transforming data through dose-response functions, even without additional variability in the parameters, increases the noisiness or spread of the data (Fig. 2). When an arbitrary, normally distributed drug dose concentration is run through 1, 2, 3, or 4 dose-response transformations, there is a gradual widening of the initially normal distribution.
On the basis of this observation, a dose-response curve transformation was added to the simulation and the effect on power and sample size assessed. This constituted Level 1 (equivalent to the blue distribution in Fig. 2). To do this, two new vectors for comparison of groups were produced by taking the dose parameter input vectors (equivalent to "doses"), each drawn from normal distributions differing in means, and applying a Hill dose-response function. The simulations were run as above, again using various parameters to assess their effect on power. For example, in Fig. 3A the standard deviation (sigma) was set to 0.2, the EC50 to 0.5, the slope to 1, the maximal response to 1, and the minimal response to 0. Using these parameters, there was minimal difference between the power analysis based on the transformed (dose-response) data and that based on the untransformed data. For all data, theoretical power and theoretical sample size refer to values calculated using standard MATLAB methods, as opposed to the Monte Carlo simulation. The largest discrepancy for these simulations occurred where n ranged from 38 to 67. At an n of 100 the simulation power closely approximated the theoretical power, with both estimates in keeping with the desired study power (0.933 vs. 0.941, respectively). These data demonstrate that the simulation replicates MATLAB t-test power calculations when the input data are transformed by the dose-response relationship (Level 1). Above an n of 67, the greater the sample size, the closer the simulation comes to the power estimates of the standard MATLAB power function. Next, the effect of a second level of dose-response transformation was simulated, i.e., the output of one dose-response transformation was used as input for a second transformation (Fig. 1, Level 2).
As with Level 1, there was no substantial change in power and sample size calculations for Level 2 when compared to power and sample size calculated by sampsizepwr in MATLAB when no variability was added to the dose-response parameters. For example, at a group mean difference of 0.11, a SD of 0.1 and a slope variance of 0, the simulated sample size and the theoretical sample size are both 15 (Table 1; Fig. 3B, varSlope = 0 curve). Only when the group mean difference is very small (e.g., 0.01) does stepping up a Level (adding an additional dose-response transformation) without adding variability result in an appreciable difference between the two estimates, with a simulated sample size of 1643 and a theoretical sample size of 1571 (Table 1).
Adding variability to Level 2 dose-response curve parameters reduces power and increases the required sample size. The above simulations took into account the variability of the input distributions for each parameter but did not add variability to the actual transformations. It is therefore not surprising that a fixed transformation based on dose-response curves might not greatly affect power, because the inputs still map directly to the outputs and the t-test is well known to be robust to several types of non-normal distributions10. However, the rationale for this study is that translational research proceeds through a series of experiments, each adding variability. We hypothesized that power calculations based on the non-central t distribution, typically used to guide investigators in sample size selection, do not adequately account for the variability added by each of a sequence of experimental studies. To test this hypothesis, we performed simulations in which variability was added to the transformations by introducing variability to the parameters of the dose-response curve.
Slope in dose-response curves. The slope parameter was varied by adding values from a normal distribution centered on 0 with a SD of 0.1, multiplied by the slope variability parameter. Introducing a small degree of variability to the slope parameter (e.g., 0.1) in the Level 2 simulation resulted in little appreciable change to the curve as compared to the baseline curve without any variability (Table 1). At a small effect size (difference between group means of 0.01 and sigma of 0.1), adding a variability of 0.1 to the slope parameter only increases the required sample size from 1571 to 1666, a 1.06x increase. Adding a variability of 0.5 to the slope parameter, also at a group mean difference of 0.01, increases the sample size from 1571 to 2298, a 1.46x increase. Further increasing the slope variability to 1.0 raises the required sample size to 5743, a 3.65x increase. Similarly, when the group mean difference is increased to 0.11, a slope variability of 0.5 resulted in a 1.46x increase in sample size from 15 to 22, while a slope variability of 1.0 resulted in a 4.4x increase in sample size to 66. These results demonstrate that as variability is added to the slope parameter of the dose-response function, the sample size requirements increase at a given power, alpha, group mean difference, and sigma. Changing sigma has a similar but reciprocal impact on sample size, as compared to changing the group mean difference (Fig. 3C), because the effect size is determined by the ratio of the two. At a sigma of 0.11, a slope variability of 1 increases the required sample size from 21 to 87, a 4.14x increase. With broader distributions in the simulated groups, e.g., a sigma of 0.51, a slope variability of 0.5 increases the sample size from 411 to 604, and a slope variability of 1 increases the sample size to 1838, a 4.47x increase.
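A sketch of how slope variability might enter the Level 2 simulation follows, again in Python rather than the original MATLAB. The per-transformation perturbation var_slope * N(0, 0.1) follows the description above, but the exact placement of the perturbation (here, one draw per transformation per run, shared between groups) is an assumption:

```python
import numpy as np
from scipy import stats

def hill(dose, ec50=0.5, slope=1.0, max_resp=1.0, min_resp=0.0):
    """Sigmoidal (Hill) dose-response transformation."""
    dose = np.clip(dose, 1e-9, None)
    return min_resp + (max_resp - min_resp) / (1.0 + (ec50 / dose) ** slope)

def level2_power(n, var_slope, mu_diff=0.11, sigma=0.1,
                 alpha=0.05, runs=2_000, seed=0):
    """Simulated power for two nested Hill transforms (Level 2), with the
    slope parameter perturbed by var_slope * N(0, 0.1) per transformation."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(runs):
        s1 = 1.0 + var_slope * rng.normal(0.0, 0.1)
        s2 = 1.0 + var_slope * rng.normal(0.0, 0.1)
        a = hill(hill(rng.normal(0.5, sigma, n), slope=s1), slope=s2)
        b = hill(hill(rng.normal(0.5 + mu_diff, sigma, n), slope=s1), slope=s2)
        hits += stats.ttest_ind(a, b).pvalue < alpha
    return hits / runs

p_fixed = level2_power(15, 0.0)   # no parameter variability
p_noisy = level2_power(15, 1.0)   # large slope variability
```

Sweeping var_slope while holding n fixed, or increasing n until a target power is reached, reproduces the kind of comparison tabulated in Table 1.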
Increases in either slope variability or population sigma result in increased sample size requirements, and the larger the slope variability, the more apparent the effect of changing sigma on sample size.
EC50 in dose-response curves. When variability was added to the EC50 parameter in the Level 2 simulation, there was a similar relationship as when variability was added to the slope parameter (Table 1; Fig. 4A). That is, the smaller the difference between group means (e.g., 0.01), the more substantial the increase in the required sample size as a result of adding variability to the dose-response relationship. Of the four parameters, increasing EC50 variability had the largest impact on sample size, with more than double the theoretical sample size required at a variability of 0.1 and > 10,000 at a variability of 0.5 or greater (Table 1).

Maximal and minimal responses in dose-response curves. The effect on sample size of adding variance to the minimal response and maximal response parameters yielded largely the same results. As with the EC50 parameter, at a small group mean difference (0.01), the n required to detect an effect was > 10,000 with a maximal or minimal response variability of 0.5 or greater (Table 1; Fig. 4B).
Increasing the number of dose-response transformations from Level 1 to Level 3 amplifies the effect of variability on the relationship between power and sample size. Adding a third dose-response transformation (Level 3) further increases the required sample size for a given power (Table 1). Figure 4C depicts the effects of adding variability to the maximal response parameter, but similar results were found with all four dose-response curve parameters, again with the EC50 parameter having the largest impact on sample size requirements.
Maximal response in dose-response curves. At a group mean difference of 0.01 and a SD of 0.1, adding a maximal response variability of 0.1 to all dose-response functions included in Level 3 resulted in a 1.38x increase in the sample size calculation, from 1571 to 2181 (Fig. 4C). When the difference between group means is increased to 0.11, changing the maximal response parameter variability to 0.1, 0.5, or 1 resulted in sample sizes of 23, 267, and 2159, respectively (Table 1). This corresponds to sample sizes that are 1.5x, 18x, and 144x greater than the theoretical n of 15. Compared to the Level 2 simulation, the most notable increase in sample size requirements at Level 3 is seen when comparing n at the maximum single-parameter variability of 1 (Table 1; varMaxResp = 1, Fig. 3B,C).
Introducing variability to multiple dose-response curve parameters for Levels 1 through 4 leads to considerably higher sample size requirements. The final step in the simulation was to assess the combined effects of adding variability to all four parameters for Levels 1, 2, 3, and 4 (Fig. 5A). When minor (e.g., 0.1) variability is added to all four parameters for the dose-response transformations included in Level 4, the effects on sample size are comparable to adding a ten-fold higher degree of variability (e.g., 1.0) to a single curve parameter over fewer transformations (Levels 1-3). For example, for Level 4 at a group mean difference of 0.19, adding a variability of 0.1 to all of the parameters changes the required sample size from 7 to 32. In comparison, for the same group mean difference (0.19) but at Level 2, the variability of the slope parameter must be set 10x higher, to 1.0, for the sample size to increase from 7 to 37 (Fig. 3B). These data support the previous conclusion that as dose-response transformations are added to the simulation, the sample size requirements increase to maintain the same power and alpha. At a group mean difference of 0.11, the theoretical sample size is 15. As the simulation complexity increases progressively from Level 1 to Level 2 to Level 3 to Level 4, with a variability of 0.1 applied to all four parameters, the sample sizes increase from 31 to 46 to 59 to 73. These values correspond to 2.1x, 3.1x, 3.9x, and 4.9x increases in the required sample size compared to the theoretical sample size. Similar results were seen when changing sigma (Fig. 5B).
Increasing the variability in any one dose-response curve parameter, against minimal background variability in the other parameters, was also shown to substantially impact sample size requirements. When there is an existing baseline variability of 0.1 in the EC50, maximal response, and minimal response parameters, the impact of adding variability to the slope parameter is compounded. For Level 2, when a variability of 0.5 is added to the slope parameter alone, at a group mean difference of 0.11 there is a 1.5x increase from the theoretical sample size of 15 to 22 (Table 1). In comparison, for Level 2 with a background variability of 0.1 in the other three parameters (Fig. 5C), increasing the slope variability to 0.5 results in a 4.3x increase in the required sample size (n = 73) at a comparable group mean difference of 0.1. Adding another two dose-response transformations to the simulation (i.e., Level 4) at a group mean difference of 0.1 further increases the sample size requirement to 152, an 8.9x increase compared to the theoretical n. Finally, a counterintuitive finding depicted in Fig. 5 is that the effect of nesting on the spreading of experimental data becomes smaller as the depth of nesting increases. For example, with a variability of 0.1 added to all four parameters, there is a 112% increase in sample size requirements moving from the theoretical sample size calculation to the Level 1 calculation, at a group mean difference of 0.1. With subsequent levels of nesting, even with the propagation of parameter variability, the relative increase becomes smaller as each higher Level is reached, e.g.
there is a 50% increase moving from Level 1 to Level 2, a 28% increase moving from Level 2 to Level 3, and 25% increase moving from Level 3 to Level 4.

Discussion
The Princess and the Pea problem gives a name to a specific challenge of translational research which, while well understood, has not previously been quantified. The pathway from preclinical research to clinical trials includes an inherent spread of experimental variability. Despite a conceptual recognition of this accumulation of variability, the standard methods used by investigators to predict an appropriate sample size do not incorporate this spread of experimental data. Moreover, purely analytical or theoretical approaches to the Princess and the Pea problem are limited by the fact that most statistical approaches, from which a theoretical analysis would be generated, are almost inevitably based on data that violate necessary assumptions11. A Monte Carlo simulation was therefore used to take advantage of a typical sigmoidal dose-response relationship while also accounting for potential assumption violations, to best quantify the effects of experimental variability on power and sample size calculations.
As seen in Fig. 2, the process of transforming study data through nested dose-response functions, akin to transitioning through sequential studies in a drug development program, increases the spread of variance in the experimental data, making it harder to identify a clinical effect.
When small amounts of "real-world" variability are added to the input parameters of the simulated experiments, there is even greater widening of the distributions. In terms of study design and planning, these results suggest that accounting for the spread of variability in translational research necessitates a drastic increase in sample sizes in order to maintain a desired power and alpha. At Level 2, used to simulate two consecutive studies, the variability for a single parameter must be set high (e.g., an increase in variability from 0 to 1) to produce an appreciable change in sample size requirements. However, as more dose-response curve transformations are introduced into the simulation (modeling more steps between preclinical research and clinical trials), less dose-response parameter variability is needed to see similar increases in sample size requirements. When minor variability is added to all four curve parameters in the Level 4 simulation (approximating the variability encountered in the drug development process), there is a drastic increase in sample size requirements, e.g., a 4.9x increase in sample size with a variability of 0.1 applied to the four parameters at a group mean difference of 0.11. When a larger variability is added to a single parameter, against a background of minor variability in the other parameters, there is again a substantial jump in the required sample size compared to standard theoretical power and sample size calculations. Among the four parameters, variability in EC50 appears to have the greatest impact on sample size, followed by the maximal and minimal responses, and then the slope. This simulation quantifies just how large sample size requirements become, for a desired power, when the accumulation of variability that occurs during translational research programs is taken into account.
For example, even at Level 2 of 4, with a group mean difference of 0.01 and SD of 0.1, introducing a variability of 0.5 to the EC50 parameter alone increases the sample size requirement from a reasonable target of 1571 subjects to a potentially prohibitive sample size greater than 10,000 (Table 1). Even with larger effect sizes and correspondingly smaller initial sample size requirements, small changes in dose-response curve input and output variability substantially impact sample size and the ability to detect an effect. Power calculations based on the non-central t distribution are often used to guide investigators in sample size selection, but they do not adequately account for the variability accumulated over a sequence of experimental studies. If the sample size used for a trial based on standard power calculations is 15 while this simulation finds a minimum sample size of 73 at Level 4 (Fig. 5A), it should come as no surprise that researchers frequently fail to detect a significant effect at the clinical trial level. In reality, trials are typically limited to hundreds (e.g., Phase 2) to a few thousand (e.g., Phase 3) participants. Working within the bounds of a feasible clinical trial, these results suggest that the ability to detect a significant difference between experimental groups is reduced by the inevitable spread of variance in translational research. Clinical trials for rare diseases exemplify the importance of selecting adequate and feasible sample size targets. Given the inherent rarity of these diseases, setting recruitment targets that are sufficient to detect a difference between experimental groups while remaining feasible is of central importance12. When hundreds, rather than thousands, of patients are recruited for Phase 3 trials, the aforementioned spread of variance can easily mask the effect of a therapy and result in a failed trial.
In this context, increasing sample sizes to overcome variability is frequently not feasible. Even in larger trials, where thousands of patients can be recruited, this same challenge can lead to the dissipation of a clinically meaningful effect within the noise of study variability. When planning a clinical trial, investigators can typically predict the variability of study measurements but not the effect size. With this simulation, the measured effect size in an animal model can effectively be translated to a predicted effect size for a clinical trial by taking into account the variability accumulated with sequential experimental steps. For instance, if a study drug is found to protect 50% of retinal ganglion cells (RGCs) from dying in a rat model of glaucoma, investigators might assume that it will also protect 50% of human RGCs from dying. The effect size, however, will almost inevitably be reduced at the human trial level due to the greater variability encountered in human populations. Rather than relying on the assumption that a preclinical effect size (i.e., the effect size in an animal model) will remain the same throughout the translational process, investigators can use tools derived from the simulations outlined in this study to better predict the effect size of a given drug at the clinical trial stage.
The results of these simulations also suggest some approaches to prevent translational failure as a result of the Princess and the Pea problem. We propose that in order to design successful translational studies of novel therapeutic classes, investigators should aim for more robust preclinical effect sizes than routine power analyses might suggest. Smaller preclinical effect sizes may not warrant progression to larger trials once investigators account for the variability inherent to the study design and to the therapeutic effect. Given that effect size depends on the ratio between the group mean difference and the population standard deviation, investigators should simultaneously seek larger differences between experimental groups and smaller variation in their data to better design translational studies.
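The ratio argument can be made concrete with a standard sample size approximation. This Python sketch uses the common normal-approximation formula, not a method from the paper, to show how halving sigma (which doubles the effect size d) cuts the required n roughly fourfold:

```python
import numpy as np
from scipy import stats

def two_sample_n(d, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided two-sample t-test,
    using the normal approximation n ~= 2 * ((z_a + z_b) / d)^2."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return int(np.ceil(2 * ((z_a + z_b) / d) ** 2))

# Effect size d = (group mean difference) / sigma: halving sigma doubles d
print(two_sample_n(0.5), two_sample_n(1.0))
```

Because n scales with 1/d², shrinking experimental variability pays off quadratically in recruitment targets, which is why controlling variance is emphasized alongside maximizing the group mean difference.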
Scrupulous control of sources of variability throughout the translational research process can also help mitigate some of these effects. From the way investigators collect data and record measurements to the way they culture cell lines and interact with experimental animals, investigators can actively seek ways of minimizing variability within their experimental systems, recognizing how this variability will propagate, as simulated in this study by carrying forward variability through each Level. The actual effect of a therapy will itself have a degree of variability that cannot be directly controlled by the careful planning and execution of studies. Some spreading of variance is therefore an unavoidable consequence of developing new therapies and of transitioning between experimental systems.
Given that each sequential dose-response transformation, or Level, (i.e. "mattress") resulted in the spreading of a given effect size (i.e. "pea") due to variability, we also suggest designing translational drug development programs with fewer steps, where possible. If the number of steps between initial preclinical biochemical studies and eventual human clinical trials can be reduced without compromising study integrity, it may be possible to more accurately predict the propagation of the true effect size and calculate accurate sample size requirements.
Another helpful tool to mitigate the effect of spreading experimental variability is to use biomarkers throughout the translational research process. Biomarkers play an important role in bridging laboratory and human studies. For instance, biomarkers can provide subject-specific biologic data on variations that influence a given drug's efficacy and toxicity 13 . Assessing the same biomarker at each experimental level can help to control for the loss of the ability to detect an effect size given that the measured outcome often varies at each stage (e.g., fraction bound in a receptor-binding assay versus visual acuity letter score in a clinical trial).
The modeling used in the present simulation made certain assumptions. When multiple dose-response curve transformations were included in the simulation, the variability added to a given curve parameter was the same for each experimental level. In reality, we are unlikely to see an exact propagation of the same variance for an experimental parameter; we might in fact expect the variability to increase for a given parameter as the study progresses to increasingly complex biological systems. This study was designed using a sequence of sigmoidal dose-response curves as an abstraction for the steps involved in the development of a new therapy. The choice of a dose-response curve has a number of implications for our results. For one, what we measure as a response outcome may change as we move between experimental systems, which was not accounted for in our simulation. Our conclusions are also limited to studies dealing with data that follow a dose-response relationship. The dose-response curve has the features of asymptotes to a maximum and a minimum, a half-maximal response, and a slope. Other relationships grow exponentially over time, as in a typical exponential curve, or with multiple phases and rates of growth, as in biological growth curves. Future work can investigate whether similar effects are seen with other relationships between experimental data.
The simulation also assumed normal distributions for all datasets. Given that many biological functions do not have a normal distribution, this assumption could result in under- or overestimation of the spread of variance. The selection of an appropriate test for a given dataset is of critical importance for all research14. Monte Carlo simulations, as used here, are particularly useful when analyzing data that do not conform to the assumptions required for a given statistical test15. The probability of rejecting a false null hypothesis and power estimates, for instance, can be appropriately assessed using Monte Carlo methods15. In summary, the Princess and the Pea problem is a challenge inherent to translational research, in which an effect seen early in the development process becomes difficult to detect at the clinical trial level due to a gradual accumulation of variance with each experimental step. Consequently, large sample sizes are needed to detect an effect that may have appeared promising early in the development process. This study provides a quantitative rationale for the frequent failure of translational research, namely the fairy tale-like assumption that an effect size is equally detectable at all levels of translation, despite the spread of variability. The failure to account for the increase in variability during the translational process can lead to an underestimation of sample sizes for later clinical studies. The simulations in this study can be used as tools to better account for the spread of experimental variability when making power and sample size calculations and, ultimately, to design translational studies that are both feasible and successful.

Methods
Monte Carlo simulation. Monte Carlo simulations were used to quantify the effects of variability on individual experiments and on a series of consecutive experiments. For this study, we selected a sigmoidal dose-response curve as the basis of the simulated experiments owing to its relevance to a variety of biomedical processes. These simulations were designed to better account for the propagation of variability in translational research, which is not typically considered in standard power and sample size calculations.
Functions to simulate power and sample size calculations were written in MATLAB (MathWorks, Natick, MA), based on an unpaired equal-variance t-test ("simulation n"). The MATLAB randn function was used to generate random numbers from a normal distribution, and two distributions were generated (Group A, Group B) with a defined difference between group means (mu0; mu0 = 0.5 for all simulations where mu0 was held constant). The randn function is robust and is based on the Mersenne Twister algorithm, with a period of 2^19937 − 1. The ratio of significant to non-significant t-statistics for 10,000 runs of the simulation with various sample sizes determined the power for the simulation (see Supplementary Material).
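The power-estimation procedure described above can be sketched in Python (the original functions were written in MATLAB and are not reproduced here); the function name simulate_power and its default values are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy import stats

def simulate_power(n, mu0=0.5, sigma=1.0, runs=10_000, alpha=0.05, seed=0):
    """Estimate power as the fraction of significant unpaired,
    equal-variance t-tests across `runs` simulated experiments."""
    rng = np.random.default_rng(seed)
    significant = 0
    for _ in range(runs):
        a = rng.normal(0.0, sigma, n)   # Group A
        b = rng.normal(mu0, sigma, n)   # Group B, mean shifted by mu0
        _, p = stats.ttest_ind(a, b, equal_var=True)
        significant += p < alpha
    return significant / runs
```

With mu0 = 0.5 and sigma = 1, roughly 64 subjects per group give a power near 0.80, in line with standard analytic calculations such as MATLAB's sampsizepwr.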
To confirm that in the absence of variability our simulation replicated standard functions, we compared the sample size requirements for a given power using our simulation to the standalone MATLAB sampsizepwr function. After confirming that our simulation produced reliable sample size estimates, a dose-response relationship was introduced into the model. The simulation was designed to include a maximum of four dose-response transformations, or levels (Fig. 1). The Level 0 simulation replicates a typical unpaired equal-variance t-test with a given difference between the two experimental group means, i.e. no transformation of the data. The Level 1 simulation transforms drug doses for both groups by a dose-response function and again runs a t-test. Level 2 uses the response output values from Level 1 as the input for a second dose-response transformation of both groups. Level 3, similarly, uses the response output from Level 2 and transforms these data again by the same dose-response function. Finally, the Level 4 simulation uses the Level 3 response output as the dose input for another dose-response transformation. In other words, each level reflects progressively greater numbers of nested dose-response curves, using the output of one level as input to the next, thus modeling a series of biological steps in the action of a drug at biochemical, biological, and clinical levels.
The model was designed to allow for variability in four dose-response curve input parameters: the half-maximal effective concentration (EC50), the slope of the dose-response curve (slope), the maximal response (MaxResp), and the minimal response (MinResp).
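A minimal Python sketch of the four-parameter sigmoid and the nesting of levels described above (function names and default parameter values are illustrative assumptions, not taken from the original MATLAB code):

```python
import numpy as np

def sigmoid(dose, ec50=1.0, slope=1.0, max_resp=1.0, min_resp=0.0):
    """Four-parameter sigmoidal dose-response curve (assumes dose > 0)."""
    dose = np.asarray(dose, dtype=float)
    return min_resp + (max_resp - min_resp) / (1.0 + (ec50 / dose) ** slope)

def nested_response(dose, levels):
    """Apply the dose-response transformation `levels` times, feeding each
    level's response back in as the next level's dose (Level 0 = no transform)."""
    x = np.asarray(dose, dtype=float)
    for _ in range(levels):
        x = sigmoid(x)
    return x
```

At the EC50 the response is half-maximal, and each additional level compresses the signal further; in the full simulation, an unpaired t-test is run on the transformed group values at each level.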
The power and sample size calculations in this study are based on the ability of the simulation to detect a clinical effect in the simulated experiment(s). The study was designed to simulate up to a maximum of 10,000 trials, generating a t-statistic for each run, as mentioned above. If a run was significant, based on an alpha of 0.05, it was coded as a 1. Conversely, if the t-statistic for the simulated trial was nonsignificant, the run was coded as a 0. The ratio of significant (1) to non-significant (0) runs then determined the power of the simulation to detect an effect for the parameters and variability of a given hypothetical clinical trial.

Source of model parameters.
In order to populate the main parameters of the simulation, namely EC50, slope, maximal response, minimal response, group A drug dose, and group B drug dose, a vector of random inputs was generated for each. For example, the input for group B drug dose was generated as follows:

b = σ · randn(n, 1) + μ + Δ

where b = vector for adding variability to the drug dose input for group B, σ = standard deviation (SD) of drug dose, randn(n, 1) = n-by-1 column vector of normally distributed random numbers with mean 0 and variance 1, μ = arithmetic mean of group A drug dose, Δ = difference between group means.
Variability was added to the dose-response curve parameters using a similar function to the one above, with a modifiable variability parameter. For example, the slope for the dose-response curve was modeled as follows:

simSlope = varSlope · randn(n, 1) + Slope

where simSlope = slope parameter used in the dose-response function, Slope = slope parameter before adding variability, randn(n, 1) = n-by-1 column vector of normally distributed random numbers with mean 0 and variance 1, varSlope = amount of variability to add to the slope.
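The two constructions above can be sketched in Python with NumPy; RandomState uses the Mersenne Twister, mirroring MATLAB's randn, and the numeric values below are illustrative assumptions:

```python
import numpy as np

n = 50
rng = np.random.RandomState(0)  # Mersenne Twister generator, as in MATLAB's randn

# Group B drug dose vector: b = sigma * randn(n, 1) + mu + delta
sigma, mu, delta = 0.1, 1.0, 0.5
b = sigma * rng.randn(n, 1) + mu + delta

# Slope parameter with added variability: simSlope = varSlope * randn(n, 1) + Slope
Slope, varSlope = 1.0, 0.2
simSlope = varSlope * rng.randn(n, 1) + Slope
```

Each simulated run thus draws its own dose vector and curve parameters, so the perturbed slope (and, analogously, EC50, MaxResp, and MinResp) varies from run to run around its nominal value.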