For studies with hierarchical noise sources, use a nested analysis of variance approach.
Many studies are affected by random-noise sources that naturally fall into a hierarchy, such as the biological variation among animals, tissues and cells, or technical variation such as measurement error. With a nested approach, the variation introduced at each hierarchy layer is assessed relative to the layer below it. We can use the relative noise contribution of each layer to optimally allocate experimental resources using nested analysis of variance (ANOVA), which generally addresses replication and blocking, previously discussed ad hoc1,2.
Recall that factors are independent variables whose values we control and wish to study3 and which have systematic effects on the response. Noise limits our ability to detect effects, but known noise sources (e.g., cell culture) can be mitigated if used as blocking factors2. We can model the contribution of each blocking factor to the overall variability, isolate it and increase power2. Statisticians distinguish between fixed factors, typically treatments, and random factors, such as blocks.
The impact of fixed and random factors in the presence of experimental error is shown in Figure 1. For a fixed factor (Fig. 1a), each of its levels (for example, a specific drug) has the same effect in all experiments and an unmodeled uncertainty due to experimental error. The levels of a fixed factor can be exactly duplicated (level A1 in Fig. 1a is identical for each experiment) and are of specific interest, usually the effect on the population mean.
In contrast, when we repeat an experiment, the levels of a random factor are sampled from a population of all possible levels of the factor (replicates) and are different across all the experiments, emphasized by unique level labels (B1–B9; Fig. 1b). Because the levels cannot be exactly duplicated, their effect is random and they are not of specific interest. Instead, we use the sample of levels to model the uncertainty added by the random factor (for example, all mice).
Fixed and random factors may be crossed or nested (Fig. 2). When crossed, all combinations of factors are used to study the main effects and interactions of two or more factors (Fig. 2a). In contrast, nested designs apply a hierarchy—some level combinations are not studied because the levels cannot be duplicated or reused (Fig. 2b). Random factors (for example, mouse and cell) are nested within the fixed factor (drug) to measure noise due to individual mice and cells and to generalize the effects of the fixed factor on all mice and cells. If mice can be reused, we can cross them with the drug and use them as a random blocking factor2 (Fig. 2c).
We will use the design in Figure 2b to illustrate the analysis of nested fixed and random factors using nested ANOVA, similar to the ANOVA discussed previously2. Now nesting is taken into account and the calculations have different interpretations because some of the factors are random. The fixed factor may have an effect on the mean, and the two random factors will add uncertainty. We will be able to estimate the amount of variance for each random factor and use it to better plan our replication strategy. We can maximize power (for example, within cost constraints) to detect a difference in means due to the top-level fixed factor or to detect variability due to random factors. The latter is biologically interesting when increased variance in cell response may be due to increased heterogeneity in the genotypes and implicated in drug resistance.
We will simulate the nested design in Figure 2b using three factors: A (a = 2 levels: control and treatment), B (mice, b = 5 levels, σB2 = 1), C (cells, c = 5 levels, σC2 = 2). Expression for each cell will be measured using three technical replicates (σε2 = 0.5, n = 3). The raw sample data of the simulation are shown in Figure 3a.
Nested ANOVA calculations begin with the sum of squared deviations (SS) to partition the variance among the factors, exactly as in regular ANOVA. For example, the first blue arrow in Figure 3a represents the difference between the averages of all points from mouse B4 (X14..) and all points from the control (X1...). Factor C has the largest deviations (Fig. 3b) because it was modeled to be the largest source of noise (σC2 = 2). The distinction between regular and nested ANOVA is how the mean squares (MS) enter into the calculation of the F-ratio for each factor. The F-ratio is a ratio of MS values, and the denominator corresponds to the MS of the next nested factor (for example, MSB/MSC) and not MSE (see Supplementary Table 1 for nested ANOVA formulas and calculated values; see Supplementary Table 2 for expected values of MS). The Ft-est uses the ratio of between-group sample variance (estimate of population variance from sample means) and within-group variance (estimate of population variance from sample variances) to test whether group means differ (for fixed factors). In the case of random factors, the interpretation is whether the factor contributes noise in addition to the noise due to the factor nested within it (for example, is there more mouse-to-mouse variability than would be expected from cell-to-cell variability?).
At the bottom of the nested hierarchy (n = 3 technical replicates per cell), we find MSE = 0.55, which is an estimate of σε2 = 0.5 in our simulation. We find statistically significant (at α = 0.05) contributions to noise from both mice (factor B) and cells (factor C) with estimated variance contributions of 0.84 and 2.1, respectively, which matches our inputs σB2 = 1 and σC2 = 2. Because the top-layer factor is fixed and not considered a source of noise, its variance component is not a useful quantity—of interest is its effect on the mean. Unfortunately, we were unable to detect a difference in means for A (P = 0.25) because of poor power due to our allocation of replicates. It is useful to relate the F-test for factor A to a two-sample t-test to understand the statistical quantities involved and calculate power.
The F-test for the top-layer factor A (F = MSA/MSB) tests the difference between the variances of treatment and mouse means. Any treatment effect on the mean will show up as additional variance, which we stand a chance to detect. Because we have only two levels of factor A, the F-test, which has degrees of freedom (d.f.) of a – 1 = 1 and a(b – 1) = 8, is equivalent to the two-sample t-test for samples of size b, 2(b – 1) d.f. and with t = √F. This t-test is applied to the control and treatment samples formed using b = 5 averages Xij.. (Fig. 3a) whose expected variance is E[Var(Xij..)] = σB2 + σC2/c + σε2/(cn) = 1.43 (ref. 1). This quantity is estimated by MSB/(cn) = 1.28, which is exactly the average variance of the two sample variances 1.73 and 0.83 (Supplementary Table 3). These samples yield the control and treatment means of 10.1 and 11.0 (Xi...; Fig. 3a) and a t-statistic of 0.9/√(2MSB/(bcn)) = 1.24, which yields the same P value of 0.25 as from the F-test.
We can now calculate the t-test power for our scenario. For a difference in means of d = 1, the power using samples of size b = 5 is 0.21, using the expected variance 1.43. In practice, we might run a trial experiment to determine this value using MSB/(cn). Clearly, our initial choice of b, c and n was an inadequate design—we should aim for a power of at least 0.8. If variance is kept at 1.43 (c = 5, n = 5), this power can be achieved for a sample size b = 24. With 24 mice, the expected variance of the average across mice would be E[Var(Xi...)] = 1.43/24. Dividing this into the total variance due to replication (σB2 + σC2 + se2 = 3.5), we can calculate the effective sample size, 57 (ref. 1). As we've previously seen, this can be achieved with the fewest number of measurements if we have b = 57 mice and c = n = 1. If we assume the cost of mice, cells and technical replicates to be 100, 10 and 1, respectively, these designs would cost 3,960 (b = 24, c = 5, n = 3) and 6,327 (b = 57, c = 1, n = 1). Let's see if we can use fewer mice and increase replication to obtain the same power at a lower cost.
The nested analysis provides a general framework for these cost and power calculations. The optimum number of replicates at each level can be calculated on the basis of the cost of replication and the variance at the level of the factor. We want to minimize Var(Xi...) = σB2/b + σC2/(bc)+ σε2/(bcn) within the cost constraint K = bCB + bcCD + bcnCD (CX is cost per replicate at factor X) with the goal of finding values of b, c and n that provide the largest decrease in the variance per unit cost. The optimum number of technical replicates is n2 = CC/CD × σε2/σC2. In other words, subreplicates are preferred to replicates when they are cheaper and their factor is a source of greater noise. With the costs as given above (CC/CN = 10) we find n2 = 10 × 0.5/2 = 2.5 and n = 2. We can apply the same equation for the number of cells, c2 = CB/CC × σC2/σB2, where CB is the cost of a mouse. Using the same tenfold cost ratio, c2 = 10 × 2/1 = 20 and c = 5. For c = 5 and n = 2, Var(Xij..) is 1.45, and we would reach a power of 0.8 if we had b = 24 mice. This experiment is slightly cheaper than the one with n = 3 (3,840 vs. 3,960).
Two components affect power in detecting differences in means. Subreplication at the cell and technical layer helps increase power by decreasing the variance of mouse averages, Var(Xij..), used for t-test samples. The number of mice also increases power because it decreases the standard error of Xij.. (the precision of mouse averages) because sample size is increased. To obtain the largest power to detect a treatment effect with the fewest number of measurements, it is always best to pick as many mice as possible: effective sample size is largest and variance of sample averages is lowest.
The number of replicates also affects our ability to detect the noise contribution from each random factor. If detecting and estimating variability in mice and cells is of interest, we should aim to increase the power of the associated F-tests (Supplementary Table 1). For example, under the alternative hypothesis of a nonzero contribution of cells to noise (σC2), the F-statistic will be distributed as a multiple of the null hypothesis F-statistic, Fu,v × (nσC2 + σε2)/σε2. The multiplication factor is the ratio of expected MS values (Supplementary Table 2). For our simulation values, the multiple is 13 and the d.f. are u = 40 and v = 100. The critical F-value is 1.52, and our power is the P value for 1.52/13, which is essentially 1 (this is why the P value for factor C in Fig. 2b is very low). For level B we have u = 8, v = 40, a multiple of 3.3 (21.5/6.5) and a power of 0.72. The power of our design to detect noise within mice and cells was much higher than that for detecting an effect of the treatment on the means.
Nested designs are useful for understanding sources of variability in the hierarchy of the subsamples and can reduce the cost of the experiment when costs vary across the hierarchy. Statistical conclusions can be made only about the layers actually replicated—technical replication cannot replace biological replication for biological inference.
Blainey, P., Krzywinski, M. & Altman, N. Nat. Methods 11, 879–880 (2014).
Krzywinski, M. & Altman, N. Nat. Methods 11, 699–700 (2014).
Krzywinski, M. & Altman, N. Nat. Methods 11, 597–598 (2014).
The authors declare no competing financial interests.
About this article
Cite this article
Krzywinski, M., Altman, N. & Blainey, P. Nested designs. Nat Methods 11, 977–978 (2014). https://doi.org/10.1038/nmeth.3137
This article is cited by
Scientific Reports (2022)
Protozoa populations are ecosystem engineers that shape prokaryotic community structure and function of the rumen microbial ecosystem
The ISME Journal (2022)
Early Life Social Stress Causes Sex- and Region-Dependent Dopaminergic Changes that Are Prevented by Minocycline
Molecular Neurobiology (2022)
Soil Enzyme Activity as Affected by Land-Use, Salinity, and Groundwater Fluctuations in Wetland Soils of the Prairie Pothole Region
Elevated salinity and water table drawdown significantly affect greenhouse gas emissions in soils from contrasting land-use practices in the prairie pothole region