Simultaneous examination of multiple factors at two levels can reveal which have an effect.
Two-level factorial experiments, in which all combinations of multiple factor levels are used, efficiently estimate factor effects and detect interactions, desirable statistical qualities that can provide deep insight into a system. This gives them an edge over the widely used one-factor-at-a-time experimental approach, which is statistically inefficient and unable to detect interactions because it sequentially varies each factor individually while all the others are held constant.
Suppose that we would like to determine which of three candidate compounds (factors) have an effect on cell differentiation (response) and also estimate their interactions. In this case, two levels for each compound suffice: low (or zero) and high concentration, giving 2^{3} = 8 factor-level combinations (treatments). The levels for each compound should be as far apart as possible so that the effect size will be as large as possible. However, the common assumption that the response and the factor level are linearly related might not be true when the distance between factor levels is large. Thus, for accuracy, complicated designs may call for levels that are closer together. If in doubt, increase the chance of detecting factor effects by choosing levels that are too far apart, rather than too close.
Let’s name our factors A, B and C, and use –1 and +1 for the low and high levels, respectively (Table 1). Even though there is no replication, this 2^{3} full factorial design can detect factor effects, if some sensible assumptions are made.
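As a minimal sketch, the eight treatments can be enumerated in a few lines of Python. The helper name full_factorial and the standard (Yates) run order, with A varying fastest, are assumptions made for illustration; Table 1 is not reproduced here, so its run order may differ.

```python
from itertools import product

def full_factorial(k):
    """Return the 2^k runs of a two-level design, first factor varying fastest."""
    # product() varies the LAST position fastest, so reverse each tuple
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=k)]

design = full_factorial(3)
for run, (a, b, c) in enumerate(design, start=1):
    print(f"run {run}: A={a:+d} B={b:+d} C={c:+d}")
```

Coding the levels as –1 and +1 rather than as concentrations is what makes the later column arithmetic (products and inner products) work.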
A key quantity to estimate is the main effect, which is the average difference in response between the high and low levels of a factor. For example, we compute the main effect for A as –1.2 by taking the average of the responses when A = +1 (–0.063) and subtracting from it the average of the responses when A = –1 (+1.1). Equivalently, one can compute a main effect estimate by taking the inner product of the factor's column and the response column, and then dividing that value by n/2, where n is the number of runs. Note that this effect estimate measures the change in response mean when the factor changes by two units (from –1 to +1), whereas a regression parameter estimate measures the change in response when the factor changes by one unit^{1}. For instance, the true regression coefficient in our model for A is –0.5, and the regression parameter estimate is –1.2/2 = –0.6.
The products of the main effect columns yield interaction columns (Table 1), whose effects can be calculated in the same way as the main effects. For example, though the true effect of the AB interaction is 0, its estimate is 0.36, which is the difference between the average response when the constituent factors have the same sign (that is, AB = +1) and the average when their signs differ (AB = –1). All main effect and interaction estimates are uncorrelated with one another, which is evident from the fact that their columns in Table 1 are all pairwise orthogonal (the inner product of any pair is zero).
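The two equivalent calculations can be sketched as follows. The response values here are made up for illustration and are not the values in Table 1; the function names are likewise hypothetical.

```python
from itertools import product

# 2^3 design in standard order (A varies fastest); run order is an assumption
design = [tuple(reversed(r)) for r in product((-1, +1), repeat=3)]

# Hypothetical responses, invented for this sketch (NOT the data in Table 1)
y = [2.0, 0.5, 1.5, 0.3, 2.4, 0.9, 2.1, 0.7]

def main_effect_by_means(column, y):
    """Average response at the high level minus average at the low level."""
    high = [yi for x, yi in zip(column, y) if x == +1]
    low = [yi for x, yi in zip(column, y) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def main_effect_by_inner_product(column, y):
    """Inner product of the factor column with the responses, divided by n/2."""
    n = len(y)
    return sum(x * yi for x, yi in zip(column, y)) / (n / 2)

col_a = [run[0] for run in design]
effect_a = main_effect_by_means(col_a, y)
effect_a_ip = main_effect_by_inner_product(col_a, y)

print(f"main effect of A (difference of means): {effect_a:+.3f}")
print(f"main effect of A (inner product):       {effect_a_ip:+.3f}")
print(f"regression coefficient for A:           {effect_a / 2:+.3f}")
```

Halving the effect gives the regression coefficient because the coded factor moves two units between its low and high levels.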
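A short sketch of how the interaction columns arise and why the estimates are uncorrelated, assuming a 2^{3} design in standard order (the variable names are illustrative):

```python
from itertools import combinations, product
from math import prod

design = [tuple(reversed(r)) for r in product((-1, +1), repeat=3)]
names = "ABC"

# One column per factorial effect: A, B, C, AB, AC, BC, ABC.
# An interaction column is the elementwise product of its factors' columns.
columns = {}
for order in range(1, len(names) + 1):
    for idx in combinations(range(len(names)), order):
        label = "".join(names[i] for i in idx)
        columns[label] = [prod(run[i] for i in idx) for run in design]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

# Every pair of distinct effect columns should have inner product zero
orthogonal = all(
    inner(columns[p], columns[q]) == 0 for p, q in combinations(columns, 2)
)
print("AB column:", columns["AB"])
print("all pairs orthogonal:", orthogonal)
```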
Once the factorial effects have been computed, the natural question is whether they are large enough to be of statistical and scientific interest. A model can be fit using linear regression^{1,2}, but because in a 2^{k} full factorial experiment there are as many runs (2^{k}) as factorial terms (in our 2^{3} example, there are 3 main effects, 3 two-factor interactions, 1 three-factor interaction and the intercept), the fitted values are just the observed data. Thus, if all factorial terms are included in the model, traditional regression-based inferences cannot be made because there is no estimate of residual error. In a three-factor experiment, this issue can be addressed by replication, but for larger studies this might be infeasible owing to the large number of treatments.
Various methods exist to address inference in factorial experiments. Simple graphical examination (e.g., using a Pareto plot, which shows both absolute and cumulative effect sizes) can provide considerable information about important effects. A more formal method is to model only some of the factorial effects; this approach depends on the reasonable and empirically validated assumptions of effect sparsity and effect hierarchy^{3}. Effect sparsity tells us that in factorial experiments, most of the factorial effects are likely to be unimportant. Effect hierarchy tells us that low-order terms (e.g., main effects) tend to be larger than higher-order terms (interactions). Application of these assumptions yields a reasonable analysis strategy: fit only the main effects and two-factor interactions, and use the degrees of freedom from the unmodeled higher-order interactions to estimate residual error.
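The saturated fit can be checked directly. The sketch below uses hypothetical responses (not the data in Table 1) and relies on the orthogonality of the ±1 columns, under which each least-squares coefficient is simply the inner product of its column with the responses divided by n.

```python
from itertools import combinations, product
from math import prod

design = [tuple(reversed(r)) for r in product((-1, +1), repeat=3)]
y = [2.0, 0.5, 1.5, 0.3, 2.4, 0.9, 2.1, 0.7]  # hypothetical responses
n = len(y)

# Intercept (empty index tuple) plus all 7 factorial effects: 8 terms for 8 runs
terms = [idx for order in (0, 1, 2, 3) for idx in combinations(range(3), order)]
X = [[prod(run[i] for i in idx) for idx in terms] for run in design]

# Orthogonal +/-1 columns: least-squares coefficient = <column, y> / n
coef = [sum(X[r][j] * y[r] for r in range(n)) / n for j in range(len(terms))]
fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]

residual_df = n - len(terms)
max_abs_residual = max(abs(f - yi) for f, yi in zip(fitted, y))
print("residual degrees of freedom:", residual_df)
print("largest residual:", max_abs_residual)
```

With zero residual degrees of freedom the residuals vanish (up to rounding), so there is nothing left from which to estimate the error variance.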
We illustrate this by simulating a 2^{6} full factorial design (64 runs) with the model y = 1.5 – 0.5A + 0.15C + 0.65F + 0.2AB – 0.5AF + ε, where ε is the same as in our 2^{3} model (Table 1). Note that we have simulated only three factors (A, C and F) and two interactions (AB and AF) to have an effect. The fit to all factorial effects provides strong visual evidence that F, A, AF and AB are important (Fig. 1a); the effect of C is uncertain, as its magnitude is similar to that of many inert effects.
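A simulation along these lines might look like the sketch below. The error distribution is not stated in this passage, so normal noise with standard deviation NOISE_SD = 0.3 is an assumption, as are the seed and all identifier names; the numbers it prints will therefore not match Fig. 1.

```python
import random
from itertools import combinations, product
from math import prod

NAMES = "ABCDEF"
NOISE_SD = 0.3  # assumption: the error distribution is not given in the text

def simulate(design, rng):
    """Responses from y = 1.5 - 0.5A + 0.15C + 0.65F + 0.2AB - 0.5AF + noise."""
    ys = []
    for a, b, c, d, e, f in design:
        mean = 1.5 - 0.5 * a + 0.15 * c + 0.65 * f + 0.2 * a * b - 0.5 * a * f
        ys.append(mean + rng.gauss(0.0, NOISE_SD))
    return ys

design = [tuple(reversed(r)) for r in product((-1, +1), repeat=6)]
y = simulate(design, random.Random(1))
n = len(y)

# Regression coefficient (half the effect) for each of the 63 factorial effects
coef = {}
for order in range(1, 7):
    for idx in combinations(range(6), order):
        label = "".join(NAMES[i] for i in idx)
        column = [prod(run[i] for i in idx) for run in design]
        coef[label] = sum(x * yi for x, yi in zip(column, y)) / n

# Rank by absolute size, as one would when drawing a Pareto plot
ranked = sorted(coef, key=lambda term: abs(coef[term]), reverse=True)
for term in ranked[:8]:
    print(f"{term:>6}  {coef[term]:+.3f}")
```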
If we apply the strategy of modeling only the main effects and two-factor interactions, we get 64 – (1 + 6 + 15) = 42 degrees of freedom for error that can be used for inference (Fig. 1b). Obviously, we will not be able to detect any interactions of three or more factors. If these interactions are large, our error estimate will be inflated and our inferences will be conservative. However, on the basis of effect hierarchy, we are willing to assume that these higher-order terms are not important. This model shows that the C regression parameter estimate of 0.11 is significant (P = 0.01), but also incorrectly identifies the BF estimate (0.09) as significant (P = 0.04). Whether considered visually or more formally via regression, the most important effects are identified, with ambiguities for a few smaller estimated effects.
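The reduced-model strategy can be sketched as follows, again under assumed normal noise (NOISE_SD = 0.3) and an arbitrary seed, so the estimates will differ from those quoted above. Only t statistics are computed; turning them into P values would need a t distribution with 42 degrees of freedom, which the standard library does not provide.

```python
import random
from itertools import combinations, product
from math import prod, sqrt

NAMES = "ABCDEF"
NOISE_SD = 0.3  # assumption: not specified in the text

design = [tuple(reversed(r)) for r in product((-1, +1), repeat=6)]
rng = random.Random(1)
y = [
    1.5 - 0.5 * a + 0.15 * c + 0.65 * f + 0.2 * a * b - 0.5 * a * f
    + rng.gauss(0.0, NOISE_SD)
    for a, b, c, d, e, f in design
]
n = len(y)

# Intercept, 6 main effects and 15 two-factor interactions: 22 terms
terms = [idx for order in (0, 1, 2) for idx in combinations(range(6), order)]
labels = {idx: "".join(NAMES[i] for i in idx) or "1" for idx in terms}
columns = {idx: [prod(run[i] for i in idx) for run in design] for idx in terms}

# Orthogonal +/-1 columns: least-squares coefficient = <column, y> / n
coef = {idx: sum(x * yi for x, yi in zip(columns[idx], y)) / n for idx in terms}

fitted = [sum(coef[idx] * columns[idx][r] for idx in terms) for r in range(n)]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

error_df = n - len(terms)   # 64 - 22
sigma2 = rss / error_df     # residual variance from the unmodeled interactions
se = sqrt(sigma2 / n)       # identical for every coefficient in this design
t_stats = {labels[idx]: coef[idx] / se for idx in terms}

print("error degrees of freedom:", error_df)
for label, t in t_stats.items():
    if label != "1" and abs(t) > 2:  # rough screening threshold, an assumption
        print(f"{label:>3}  t = {t:+.2f}")
```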
For interpretations of interaction effects—how factors influence the effects of other factors—interaction plots are useful (Fig. 2). For example, the large AF interaction of –0.38 (Fig. 1a,b) tells us that the level of A has an important effect on the effect of F. Given that the regression main effect estimates of A and F are –0.56 and 0.70, respectively, if A = –1, then the estimated change in the mean response for a unit change in F is 0.70 + 0.38 = 1.08, whereas when A = +1, the change due to F is just 0.70 – 0.38 = 0.32.
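The arithmetic behind this reading of the interaction is small enough to write out. The sketch uses the regression estimates quoted above for F (0.70) and AF (–0.38); the function name is illustrative.

```python
# Regression estimates quoted in the text
COEF_F = 0.70
COEF_AF = -0.38

def slope_of_f(a_level):
    """Estimated change in mean response per unit change in F at a given level of A."""
    # In the fitted model, F enters as COEF_F * F + COEF_AF * A * F
    return COEF_F + COEF_AF * a_level

print(f"A = -1: {slope_of_f(-1):.2f}")
print(f"A = +1: {slope_of_f(+1):.2f}")
```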
Full factorial designs grow large as the number of factors increases, but we can use fractional factorial designs to reduce the number of runs required by considering only a fraction of the full factorial runs (e.g., half as many in a 2^{6–1} design). These runs are chosen carefully so that under the reasonable assumptions of effect sparsity and hierarchy, the terms of interest (e.g., main effects and two-factor interactions) can be estimated.
For example, consider runs 2, 3, 5 and 8 in Table 1, which have ABC = +1. If we have only these four runs, we cannot distinguish the intercept from the ABC interaction (they are completely confounded) because they have the same factor levels (their inner products with the response are identical). Within these runs, A is completely confounded with BC, B with AC, and C with AB. Thus, if we found that the A = BC effect was important, we would be unsure of whether this was due to a significant effect of A or of the BC interaction. However, the effect-hierarchy principle would suggest that A is probably driving the result rather than BC.
We can apply the same reasoning in a 2^{6} experiment to remove half the runs. In the 32-run 2^{6–1} fractional factorial design there are 32 confounding relations (e.g., ABCDEF with the intercept, A with BCDEF, etc.), and, importantly, all of the main effects and two-factor interactions are confounded with four- and five-factor interactions. Given our assumption that these high-order effects are unlikely to be important, we have little worry that they will contaminate our estimate of the main effects and two-factor interactions.
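The confounding in this half fraction can be verified by comparing columns. The sketch assumes the 2^{3} design is listed in standard order with A varying fastest, which is consistent with the run numbers given above.

```python
from itertools import product

design = [tuple(reversed(r)) for r in product((-1, +1), repeat=3)]

# Keep the runs with ABC = +1 (1-based run numbers, standard order assumed)
half = [
    (i, run)
    for i, run in enumerate(design, start=1)
    if run[0] * run[1] * run[2] == +1
]
run_numbers = [i for i, _ in half]
runs = [run for _, run in half]

columns = {
    "I": [1 for _ in runs],
    "A": [a for a, b, c in runs],
    "B": [b for a, b, c in runs],
    "C": [c for a, b, c in runs],
    "AB": [a * b for a, b, c in runs],
    "AC": [a * c for a, b, c in runs],
    "BC": [b * c for a, b, c in runs],
    "ABC": [a * b * c for a, b, c in runs],
}

# Two effects are completely confounded when their columns are identical
alias_pairs = [("I", "ABC"), ("A", "BC"), ("B", "AC"), ("C", "AB")]
confounded = {pair: columns[pair[0]] == columns[pair[1]] for pair in alias_pairs}

print("runs kept:", run_numbers)
for (left, right), same in confounded.items():
    print(f"{left} = {right}: {same}")
```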
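The alias structure can be enumerated without building the design. In the sketch below each effect is treated as a set of factor letters, and its alias under the relation I = ABCDEF is the symmetric difference with the full set of letters (equivalently, multiplying the effect by ABCDEF and cancelling squared letters). The identifiers are illustrative.

```python
from itertools import combinations

FACTORS = "ABCDEF"
DEFINING_WORD = frozenset(FACTORS)  # half fraction with I = ABCDEF

def alias(effect):
    """Effect confounded with `effect` when I = ABCDEF."""
    return frozenset(effect) ^ DEFINING_WORD

def label(effect):
    return "".join(sorted(effect)) or "I"

# All 64 factorial terms, including the intercept (the empty set)
effects = [frozenset(c) for k in range(7) for c in combinations(FACTORS, k)]

# Each confounding relation is an unordered pair {effect, alias}
pairs = {frozenset({e, alias(e)}) for e in effects}

# Order of the alias for the intercept, main effects and two-factor interactions
alias_order = {len(e): len(alias(e)) for e in effects if len(e) <= 2}

print("number of confounding relations:", len(pairs))
print("A is aliased with", label(alias("A")))
print("AB is aliased with", label(alias("AB")))
print("term order -> alias order:", alias_order)
```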
Even if we fit the intercept, all main effects and all 15 two-factor interactions, we’re still left with 32 – 22 = 10 degrees of freedom for inference on these factorial effects (Fig. 1c), similar to the process for the full set of 64 runs (Fig. 1a,b), but with half the number of runs. With further assumptions about the model hierarchy, even smaller fractions of the full factorial experiment can provide useful information about the main effects and some interactions.
Two-level fractional factorial designs provide efficient experiments to screen a moderate number of factors when many of the factorial effects are assumed to be unimportant (sparsity) and when an effect hierarchy can be assumed. They are simple to design and analyze, while providing information that can be used to inform more detailed follow-up experiments using only the factors found to be important. More details on full and fractional factorial designs can be found in ref. ^{4}.
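A quick check of this bookkeeping, assuming the half fraction is the set of runs with ABCDEF = +1: the 22 model columns remain mutually orthogonal within the 32 runs, so all of them can be estimated separately, and 10 degrees of freedom are left over.

```python
from itertools import combinations, product
from math import prod

full = [tuple(reversed(r)) for r in product((-1, +1), repeat=6)]
half = [run for run in full if prod(run) == +1]  # runs with ABCDEF = +1

# Intercept (empty tuple), 6 main effects, 15 two-factor interactions
terms = [idx for order in (0, 1, 2) for idx in combinations(range(6), order)]
columns = [[prod(run[i] for i in idx) for run in half] for idx in terms]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

# Pairwise orthogonality means no two modeled terms are confounded with each other
estimable = all(inner(u, v) == 0 for u, v in combinations(columns, 2))
error_df = len(half) - len(terms)

print("runs:", len(half), "terms:", len(terms))
print("modeled terms mutually orthogonal:", estimable)
print("error degrees of freedom:", error_df)
```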
Change history
09 April 2019
The initially published paper contained an error in Table 1: in the rightmost column (y), “0.09” should have been “–0.09.” This error has been corrected in the PDF and HTML versions of the article.
References
1. Altman, N. & Krzywinski, M. Nat. Methods 12, 999–1000 (2015).
2. Krzywinski, M. & Altman, N. Nat. Methods 12, 1103–1104 (2015).
3. Li, X., Sudarsanam, N. & Frey, D. Complexity 11, 32–45 (2006).
4. Mee, R. A Comprehensive Guide to Factorial Two-Level Experimentation (Springer-Verlag, New York, 2009).
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Smucker, B., Krzywinski, M. & Altman, N. Two-level factorial experiments. Nat Methods 16, 211–212 (2019). https://doi.org/10.1038/s41592-019-0335-9