Main

Investigators traditionally use randomized trials, or experiments, and corresponding analyses to make causal inferences about the effects of interventions, assuming that an individual’s outcome is independent of the treatment assignments and outcomes of other individuals in the study. In aging research, however, this assumption of independence is not always valid. Examples of interdependency include interference1, group composition effects2 and clustering and nesting3. These issues require attention because they may violate the assumptions of causal inference and of independence made when using traditional hypothesis tests. These terms and others are often not defined uniformly, however, which can lead to confusion. For the purpose of this report, we have defined a set of terms in Box 1.

Interdependency has begun to be addressed in the scientific literature4,5,6,7 but has received little attention in aging research. Yet, the interdependence of subjects within subject-clusters can be observed in the designs and analyses of aging studies. Although the field acknowledges that it is difficult to disentangle how the nine recognized hallmarks of aging are connected8, these undoubtedly impact one another and may themselves be sources of or characterized by interdependency.

These study design challenges underscore the importance of the National Institute on Aging’s effort to ‘develop innovative changes in the design, planning and implementation of clinical trials’9. Indeed, aging research requires researchers to address interdependency through proper study design, analysis and interpretation (Table 1 and Fig. 1). In this Perspective, we highlight the use and importance of randomization and summarize examples of interdependency and related methodologic issues to call attention to interference, clustering and independence and significance levels in aging research (Box 2).

Table 1: An overview of the six study designs discussed
Fig. 1: A visual representation of the study designs.

a, Group composition experiments. b, Single-stage randomized trials. c, cRCTs. d, Pseudo-cluster-randomized trials. e, IRGT. f, Two-stage randomized designs.

Examples of interdependency in aging research

Statistical interdependence in animal models

A ubiquitous issue in experimental paradigms using the three main animal models in aging research—Caenorhabditis elegans (hereafter ‘worms’), Drosophila melanogaster (hereafter ‘flies’) and mice—is housing animals in multiple separate enclosures but combining results as if the animals formed a single population. In worms, survival studies generally combine data from subpopulations maintained on multiple agar plates or multiple wells for liquid culture. For instance, the C. elegans Interventions Testing Program, which has extensively explored the replicability of lifespan studies among laboratories10, uses at least three agar plates each containing 35 to 40 animals to complete a single survival assay. Other studies use as few as 20 to 30 individuals per plate and combine the results of several plates11,12. Surprisingly, the number of plates or vials involved in survival analysis is often not specified. In any case, individual plates have a separate history and microenvironment, varying density over time as animals die, and possibly different personnel transferring animals to fresh plates. The important impact of precise transfer technique on longevity has been established by the C. elegans Interventions Testing Program.

Similarly, fly researchers use a wide variety of housing conditions (for example, cages, bottles and vials) but most typically combine survival results from 5 to 10 vials each containing 20 to 30 flies13 nearly always separated by sex, because mixed-sex housing is known to shorten the lives of both sexes14,15. Some studies use substantially larger samples and cages, for instance 125 flies in 3 to 5 replicates, but typically combine replicates for the demographic analyses16. As with worms, each fly vial will have its individual history and microenvironment and possibly different personnel transferring flies to fresh enclosures periodically.

Mouse studies, in which the phenotype of individuals is more easily studied than in worms or flies, typically house four mice or fewer per cage with sexes separated at the beginning of survival experiments, although some research suggests that short-term health is not compromised by higher densities17. Male mice are often from the same litter to minimize fighting, but fighting among males is a recurring issue, resulting in individual males, or even whole cages, being removed from studies18. The number of animals housed in a cage alters thermal and social environments, affecting organ weight, heart rate and multiple aspects of behavior, including food consumption and torpor (particularly important because torpor may be associated with the longevity benefit of food restriction)19,20. Nearly all animal facilities are maintained at temperatures markedly below rodent thermoneutrality21. Group-housed animals somewhat compensate for this by huddling. The impact of the thermal environment can easily be seen when mice or rats are housed singly. In one study, singly housed mice ate 40% more than mice housed in groups of four while maintaining similar body weights22. The thermal environment also affects body composition, the ratio of brown-to-white fat23, activity, and, over time, pathology24,25. Group-housed mice also display greater phenotypic variability than singly housed mice26. For aging studies, these issues are particularly germane because density will change over time as animals begin to die.

Human trials and group effects

Groups exert substantial influence on the behaviors and outcomes of individuals. A classic example of group effects is the Asch conformity experiments, which demonstrated individuals have a tendency to ‘conform’ to an erroneous group consensus27, and have been studied for differential patterns with aging. Specifically, older people demonstrate lower rates of social conformity compared with younger individuals28. Another example is of socially induced stress, which can negatively affect longevity in various social species, including humans2,29,30. Despite the intuitive influence of group effects, the rigorous identification of group effects per se, also called peer effects or contagion effects, is difficult31,32. This is particularly relevant in aging-related research involving older persons in congregate settings. Such circumstances by their nature tend to involve interdependency and examples of studies involving cluster-randomized trials33,34, pseudo-cluster randomization35,36, group composition designs37 and individually randomized but group-delivered trials38,39 exist. For example, herd immunity can affect the analysis of vaccine efficacy40, as discussed in ‘Cluster-randomized controlled trials’. The ACTonHEART intervention41 is another example of potential group effects. In that study, individuals (not clusters) were randomly assigned but received the intervention in group-therapy sessions (that is, post-randomization clustering)41. Another more subtle example of a potential group effect occurs when individuals share an interventionist. For example, the Dutch Geriatric Intermediate Care Program was designed to assess the effect of home visits by geriatric nurses on the function of older adults compared with usual care. Older adults shared their general practitioners. Thus, the general practitioner’s exposure to those in the intervention group could affect the care provided to the usual care group. 
In trials in which a treatment is administered in a group setting or a single interventionist administers a study intervention to multiple participants, we can observe both interference and within-group correlation of outcomes because of group composition effects.

In observational studies of contagion effects, the challenges are compounded because of homophily and shared environment42. Confounding due to homophily occurs when the same factor that influences an individual’s outcome of interest also influences that individual’s propensity to form ties (and the strength and duration of ties) with others characterized by the exposure of interest. Environmental confounding occurs when an individual and a group share an environmental factor associated with the outcome of interest. In either homophily or environmental confounding, it is difficult to disentangle the causal effect of one’s peers from shared peer characteristics and environmental characteristics shared with one’s peers. One area in which this may occur is when studying centenarians, who are often studied for insight into long, healthy lives. If a study design focuses on identifying ‘longevity genes’ within certain families, for example, issues of interdependence associated with shared environments are raised43,44. Other issues associated with exceptional longevity are age-cohort effects, for instance, among those born before, during or after major environmental or political events (for example, war or pandemic)45,46.

Group formation experiments, in which individuals are randomly assigned to groups of varying compositions and an outcome of interest is observed, can overcome some of the limitations inherent to observational studies47. The goal of randomized group formation experiments is to isolate the causal effect of a group characteristic on individual outcomes. However, the random assignment of individuals to groups does not resolve the problem of confounding due to shared environments48. Nor does random assignment to a peer group guarantee the random formation of network ties. Given the challenges of isolating peer effects on individual outcomes, statistical methods for the estimation of peer effects, in both randomized and nonrandomized designs, are an active area of development and discussion. A common method for estimating peer effects is the linear-in-means model, in which the outcome of interest is regressed on an individual’s characteristics and the average peer outcomes and characteristics47,49. Sacerdote50 provides a thorough review of a peer-effects linear-in-means model, including its limitations, and other approaches to estimate and identify peer effects in group composition experiments.
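As an illustration of the linear-in-means specification described above, one common way to write it is sketched below. The notation is illustrative and not taken from the cited references: y_{ig} is the outcome of individual i in group g, x_{ig} denotes that individual's characteristics, and the barred terms are averages over the other members of group g.

```latex
y_{ig} = \alpha + \beta\, x_{ig} + \gamma\, \bar{x}_{-i,g} + \delta\, \bar{y}_{-i,g} + \varepsilon_{ig}
```

In this sketch, \delta relates the outcome to average peer outcomes and \gamma relates it to average peer characteristics.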

Real-world constraints and recommendations

Trial recruitment in naturalistic settings is subject to the challenges described throughout this Perspective. This is especially true in pragmatic trials with human participants. A well-known difficulty in clinical trials involves whether people comply with their assigned treatment or remain in the study until its completion. If the person does not comply or leaves the trial, the study contains missing data, and much has been written on this issue51. For instance, trials of technology interventions suffer from systematic and cumulative nonadherence and attrition in the treatment arm52, a phenomenon that may be more common in subgroups affected by a ‘digital divide’, such as rural participants53 and older adults54. In these trials, nonadherence, differential attrition or missing data, unintended exposure to multiple treatments, and other practical realities occur probabilistically but not inevitably; certain study designs and best practices can reduce the risk and consequence of these effects.

The intention-to-treat effect can still be estimated to evaluate the effect of being randomized to a given condition even if participants do not complete the study55. While sometimes criticized, the intention-to-treat analysis serves a valuable purpose from a public health perspective: it estimates the effect of random assignment on the population. In this way, investigators can assess whether use of a guideline, policy or other intervention has a significant effect compared with not implementing it or implementing a different one. Although effectiveness from the public health perspective does not properly estimate efficacy, or even effectiveness from the patient perspective, it does inform policy, public health and clinical decision-making, which are particularly important in aging research.

A related but different issue is assessing the utility of using a pragmatic design for a given research question. The answer to this question relates to, in large part, whether the intervention dose is sufficiently different in the intervention arm versus control arm. For instance, if the pragmatic study is assessing whether care facilitated by physician alerts affects health, the physician alerts must reach a sufficiently larger percentage of participants in the intervention arm to even assess the intervention effect. Otherwise, results are likely to be nonsignificant even if the intervention itself is effective. Further, overlap between the arms may be greatly affected by the experimental unit and other interdependencies. By contrast, a pragmatic trial may be necessary when the results of a traditional randomized controlled trial (RCT) are not generalizable. For instance, if persons of lower socioeconomic status are highly underrepresented in the trial, that sampling procedure will greatly affect the utility of the findings.

Although nonadherence, differential attrition or missing data, unintended exposure to multiple treatments, and other practical realities frequently occur, they are not inevitable. Careful planning, proper study designs and best practices can reduce the risk and consequence of these occurrences. Research teams can perform a risk assessment of any potential threats to valid inference at the outset of the study and have clear and detailed protocols in place to mitigate anticipated challenges. When unforeseen issues arise, resultant contamination, nesting and other interdependencies can often be measured and accounted for in analysis. If nothing else, deviations from protocol should be documented clearly to allow for accurate and transparent reporting.

Available study designs

Single-stage individually randomized trials

In a single-stage individually randomized trial, a control group is expected, in probability, to be identical to the intervention group at baseline. That is, the average attributes of the two groups are assumed to be the same. Therefore, statistically significant differences in the outcome can be attributed to the intervention. When baseline covariates are suspected to influence outcomes in a systematic way (for example, participant age in a survival analysis, disease severity, offspring of animal models being measured from successive progeny (for example, F1, F2, F3 and F4) versus from different parity56), covariate considerations and adjustments may be useful at the design (for example, stratified randomization57) and analysis (for example, randomization-based58 and model-based analysis59) stages, respectively.

In parallel-group efficacy RCTs, the power to detect statistical interactions between treatment and baseline strata is often low compared with the power to evaluate an average treatment effect. For example, the lack of evidence for treatment efficacy among women and men based on separate analyses does not address the question of whether treatment differences vary depending on sex60. Moreover, multiple subgroup analyses involving baseline strata like age or disease stage or multiplicity involving analysis of several endpoints can increase type I error rates. Conversely, correction for such errors (that is, multiple comparisons adjustment or multiplicity adjustment) may increase type II error rates. Thus, tests of exploratory or confirmatory interaction hypotheses should precede within-subgroup analysis.
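To make the multiplicity trade-off concrete, the following minimal sketch computes the chance of at least one false positive across several tests and the corresponding Bonferroni per-test threshold. It assumes the tests are independent, which subgroup and multi-endpoint analyses often are not; the function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests tests.

    Assumes the tests are independent (an illustrative simplification).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Familywise error grows quickly as more subgroup or endpoint tests are run.
    for k in (1, 5, 10):
        print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_threshold(0.05, k))
```

The stricter per-test threshold is what reduces power, which is the type II error cost noted above.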

In existing aging-related trials, most intention-to-treat analyses rely exclusively on comparison of baseline treatment assignment to determine treatment effectiveness and ignore potential time-varying covariate issues61,62. But time-varying covariates, in other words, prognostic factors that change, can result in changes in the treatment or intervention over time, which in turn affect treatment efficacy measures. Identifying potential time-varying covariates is important to understand the causal effects of investigated treatments or interventions63.

Potential time-varying moderators must also be considered64,65. These include factors that may change over time (including measuring the outcome66) and modify the treatment effect on outcomes of interest, including breeding strategies or ‘cohort’ effects. Additional factors that may change over time include secondary mutations resulting from genetic drift.

Cluster-randomized controlled trials

A cluster-randomized controlled trial (cRCT) is a trial in which the randomization units are clusters or groups of individuals (for example, clinics, hospitals, classes and families) instead of individuals themselves, although outcomes are measured at the individual level. In this case, the outcomes are likely to be correlated within the cluster and are not independent observations as is the assumption of standard statistical analyses such as t-tests, analysis of variance or regression as typically used.

There are two important issues with this design: clustering and nesting. Clustering means that individuals are grouped together (for example, patients within a clinic or mice within a litter). Nesting means that clusters or groups are situated within a treatment regimen such that all individuals in the same cluster receive the same treatment. For example, in the study by List et al.67, mice were clustered within the cage, and cages were nested within the treatment because all mice in the same cage received the same diet. Clustering is measured by the intraclass correlation (ICC), which describes the proportion of the variation in the data explained by the unit of randomization (that is, the cluster)68; in other words, how similar outcomes are within clusters relative to outcomes across clusters. Ignoring clustering and nesting during analyses can lead to an inflated type I error rate3,69,70,71,72. There are additional issues, such as census recruitment or enrolling via cluster random sampling (a two-stage process in which the population is divided into clusters and a subset of the clusters is randomly selected) as opposed to investigator-led selection of clusters, which can be argued to induce bias; we refer the reader elsewhere for detailed discussions73,74,75.
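As a rough illustration of how an ICC can be estimated, the sketch below uses the one-way analysis-of-variance estimator for equally sized clusters. This is only one of several estimators and is not necessarily the approach used in the cited references; the function name and the example data are hypothetical.

```python
from statistics import mean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equally sized clusters.

    clusters: list of lists, one inner list of outcomes per cluster
    (for example, one list per cage). Balanced data are assumed.
    """
    k = len(clusters)
    m = len(clusters[0])
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")
    grand = mean(v for c in clusters for v in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster and within-cluster mean squares.
    msb = m * sum((cm - grand) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (v - cm) ** 2 for c, cm in zip(clusters, cluster_means) for v in c
    ) / (k * (m - 1))
    denom = msb + (m - 1) * msw
    if denom == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    # The estimate can be negative when clusters are more alike than chance predicts.
    return (msb - msw) / denom


if __name__ == "__main__":
    cages = [[10.0, 11.0, 10.5], [14.0, 13.5, 14.5], [9.0, 9.5, 8.5]]
    print(anova_icc(cages))
```

Values near 1 indicate that most variation lies between clusters; values near 0 indicate that cluster membership explains little.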

Additionally, because clusters are the independent unit of analyses, the analysis needs to account for the number of clusters, the ICC, and the number of individuals per cluster. When the number of clusters is small, and the coefficient of variation is even moderately large76, statistical power to detect treatment effects will be limited regardless of the sample size within clusters70,71. It is important to correctly specify the degrees of freedom according to the independent units of randomization.
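The dependence of power on the number of clusters, the ICC and the cluster size can be sketched with the standard design effect, 1 + (m - 1) x ICC, for clusters of equal size m. The functions below are a minimal illustration under that equal-size assumption (so they ignore the coefficient of variation of cluster sizes mentioned above); the names are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation for equally sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("cluster_size must be >= 1 and icc in [0, 1]")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


def clusters_needed(n_independent: int, cluster_size: float, icc: float) -> int:
    """Clusters per arm needed to match n_independent individually randomized units."""
    return math.ceil(n_independent * design_effect(cluster_size, icc) / cluster_size)


if __name__ == "__main__":
    # Ten cages of four mice with ICC = 0.2 carry less information than 40 independent mice.
    print(effective_sample_size(10, 4, 0.2))
    # Adding mice per cage helps little once the ICC is non-trivial.
    print(effective_sample_size(10, 40, 0.2))
```

As the cluster size grows with the number of clusters held fixed, the effective sample size approaches the number of clusters divided by the ICC, which is why adding individuals within clusters cannot rescue a design with few clusters.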

Even when clustering is carefully considered, individuals in the same cluster may interfere with each other, such that the estimated (direct) effect may be biased (we use the word ‘bias’ several times; whether a procedure is biased depends in part on the estimand77). For example, when a cRCT is used to estimate a vaccine’s effect (where clusters are assigned to vaccine or placebo), vaccine efficacy tends to be overestimated when using a typical approach for analyzing cRCT data. This occurs because the estimated vaccine efficacy reflects the vaccine’s direct and indirect effects, and those two effects cannot be distinguished by comparing vaccinated and unvaccinated individuals. Indirect effects appear as the result of herd immunity, where individuals in the vaccinated group are exposed to fewer pathogens because others in the community are also vaccinated40. Thus, the magnitude of exposure to a pathogen is correlated within clusters. To identify an effective vaccine, such overestimation may erroneously appear to be beneficial due to the high power. A simulation study demonstrated that disease contagiousness creates a high ICC; thus, any perceived benefit of overestimating the vaccine efficacy in power is diminished78. Ultimately, when performing and analyzing a cRCT it is important to collect and analyze the data with a study design and statistical model that accounts for both the ICC (to adjust the denominator degrees of freedom to account for the independent unit of analyses) and the problem of interference. Information on how to analyze this design68,69,71,79; guidelines to follow when describing, analyzing and performing a cRCT70; and information to help guide the editorial and peer review process when reviewing cRCTs80 can be found in the cited literature.

Pseudo-cluster-randomized trials

As described above, in some studies an individual’s initially random treatment assignment may be influenced by the treatment status of other units within a cluster, resulting in a possibly inflated type I error rate. One approach to avoid such contamination (that is, spillover effects) is a cRCT. However, when cRCTs are not possible, or may introduce bias, pseudo-cluster randomization can be considered. Pseudo-cluster randomization is a compromise between cRCT and individual randomization and may be used when there is risk for contamination with randomizing individuals and concern regarding selection bias with randomizing clusters81.

Pseudo-cluster randomization is a specific type of two-stage randomization82 (detailed later in the paper), in which clusters are first randomized to groups labeled H (intervention majority) and L (control majority; more than two groups could be used). In the second step, a fraction f (0.5 ≤ f ≤ 1) of the individuals within H clusters are randomly assigned to treatment and the rest to control. In L clusters, the same fraction f of individuals in each cluster are randomized to control and the rest to treatment82. Compared with cluster randomization, selection bias is less likely to arise in pseudo-cluster randomization because the study personnel do not know to which type of cluster (that is, H or L) individuals have been assigned nor do they know (as opposed to cluster randomization) to which treatment a participant will be assigned. However, predictability of treatment assignment would still be an issue with pseudo-cluster-randomized designs. Study personnel might be able to guess the treatment assignments over time with increasing precision, which reintroduces the risk for selection bias. Smaller f fractions will result in lower predictability35.
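The two-step procedure described above can be sketched as follows. This is a minimal illustration, not the algorithm of the cited references: it assumes clusters are split evenly between H and L, and it rounds f times the cluster size to the nearest whole participant. All names are hypothetical.

```python
import random


def pseudo_cluster_randomize(clusters, f, rng):
    """Two-step pseudo-cluster randomization (illustrative sketch).

    clusters: dict mapping cluster id -> list of participant ids
    f: fraction (0.5 <= f <= 1) assigned to the majority condition
    rng: a random.Random instance, passed in for reproducibility
    Returns (cluster_type, assignment) dictionaries.
    """
    if not 0.5 <= f <= 1.0:
        raise ValueError("f must satisfy 0.5 <= f <= 1")
    # Step 1: randomize clusters to H (intervention majority) or L (control majority).
    # An even split between H and L is an assumption of this sketch.
    ids = list(clusters)
    rng.shuffle(ids)
    half = len(ids) // 2
    cluster_type = {cid: ("H" if i < half else "L") for i, cid in enumerate(ids)}
    # Step 2: within each cluster, randomize a fraction f to the majority condition.
    assignment = {}
    for cid, members in clusters.items():
        if cluster_type[cid] == "H":
            majority, minority = "treatment", "control"
        else:
            majority, minority = "control", "treatment"
        shuffled = list(members)
        rng.shuffle(shuffled)
        n_majority = round(f * len(shuffled))
        for j, pid in enumerate(shuffled):
            assignment[pid] = majority if j < n_majority else minority
    return cluster_type, assignment


if __name__ == "__main__":
    practices = {c: [f"{c}-{i}" for i in range(10)] for c in "ABCD"}
    types, arms = pseudo_cluster_randomize(practices, 0.8, random.Random(2024))
    print(types)
```

Setting f = 1 reproduces cluster randomization and f = 0.5 reproduces individual randomization within clusters, which is why the design is described as a compromise between the two.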

Reducing contamination in pseudo-cluster randomization (as opposed to individual randomization) is predicated on two underlying assumptions. First, limiting cross-exposure to the other condition reduces contamination. The closer f is to 1, the less the majority condition in each cluster is contaminated by the minority condition. Second, contamination of the majority condition by the minority condition in the same cluster is smaller than vice versa. Whether these assumptions hold depends on the cluster size and the nature of the intervention.

An indirect approach to assessing the extent of contamination in a pseudo-cluster-randomized design is to compare the treatment effect among minority control, majority control, and intervention individuals (minority and majority inclusive). The assumption is that if contamination is small, the treatment effect would be similar in the minority control and the majority control, and substantially smaller in both control groups compared with the intervention group83. While pseudo-cluster randomization is tagged as a design to reduce contamination, selection bias and recruitment issues of individual and cluster randomizations, there is no feasible approach to quantify the reduction of contamination by this design compared with individual and cluster randomizations.

Individually randomized group treatment

In individually randomized group treatment (IRGT) trials, individuals are randomly assigned to study conditions. However, unlike in single-stage individually randomized trials, individuals in IRGT trials receive whole or part of their intervention in a group setting. IRGT trials are also in contrast to group randomized trials, which randomly assign clusters and not individuals to study conditions. IRGT trials could involve at least one of the following: (1) individuals in one arm only (typically the intervention) receive treatment in a group setting; (2) individuals in all study arms are administered treatment in a group setting; (3) part of the intervention is administered in a group format; and (4) the intervention is provided by a common interventionist. IRGT trials in which participants in one arm are administered a group intervention are also referred to as partially clustered or partially nested designs84,85. These situations often occur in studies with behavioral components such as exercise or weight loss interventions, which may be delivered in group settings38. For example, the ‘Calorie Restriction in Overweight SeniorS: Response of Older Adults to a Dieting’ (CROSSROADS) trial used a prospective randomized controlled design to compare the effects of changes in diet composition alone or combined with weight loss with an exercise-only control intervention on body composition and adipose tissue deposition in older adults38. The trial included three arms that met weekly for the first 24 weeks of the intervention, then every 2 weeks for the remainder of the 12-month intervention. The study protocol included 30 min of group discussion related to a dietary, exercise or behavioral topic, followed by 30 min of supervised exercise using prescribed resistance-band exercises. 
Similarly, the ‘Comprehensive Assessment of Long-Term Effects of Reducing Intake of Energy’ (CALERIE) trial studied the effects of 2 years of calorie restriction on biomarkers of longevity among people who are not obese86. Part of the CALERIE intervention included group sessions to help the participants to adhere to 25% calorie restrictions. These trials further demonstrate group dynamics.

Similar to cRCTs, IRGT trials also have nonindependence in observations that needs to be accounted for during design, analysis and interpretation. Less attention has, however, been paid to the unique design and related analytic methods needed for IRGT trials. Correlations (indexed by the ICC) may develop over time in IRGT trials as group members share the treatment environment, violating the assumption that model residuals are independent within conditions. Regarding design, there is a need to account for the cluster effect. Variance inflation factors based on estimates of ICC are an important part of sample size estimation and require sample sizes to be increased compared with individual RCTs. Not accounting for this would lead to an underpowered trial. Estimating the variance inflation factor is further complicated compared with cRCTs because each arm or condition may have a different ICC. Further, the design may not have the same hierarchical structure in all conditions, which would imply a heterogeneous variance-covariance structure, allowing for ICC in the intervention condition but not in the control condition. Regarding analyses, standard linear regression assuming independence would lead to inflated type I error rates. This may prompt researchers to overestimate the significance of their findings, or to deem interventions effective when they appear so only because of statistical artifacts.
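A simple way to see how arm-specific ICCs enter sample size planning is to inflate each arm by its own variance inflation factor. The sketch below assumes equal group sizes and equal total outcome variance in both arms, and it sets the control-arm ICC to zero by default to mimic a partially nested design; more refined formulas exist, and the function name is hypothetical.

```python
import math


def irgt_arm_sizes(n_per_arm_independent, group_size, icc_intervention, icc_control=0.0):
    """Inflate per-arm sample sizes by arm-specific variance inflation factors.

    n_per_arm_independent: per-arm size required if all observations were independent
    group_size: number of participants treated together in one group
    icc_intervention, icc_control: arm-specific ICCs (control defaults to 0,
        that is, no group delivery in the control arm)
    Returns (n_intervention, n_control).
    """
    def inflation(icc):
        # Variance inflation factor for equally sized groups.
        return 1.0 + (group_size - 1) * icc

    n_intervention = math.ceil(n_per_arm_independent * inflation(icc_intervention))
    n_control = math.ceil(n_per_arm_independent * inflation(icc_control))
    return n_intervention, n_control


if __name__ == "__main__":
    # Groups of 6 with ICC = 0.1 in the intervention arm only.
    print(irgt_arm_sizes(50, 6, 0.1))
```

Even a modest ICC in one arm changes the required enrollment noticeably, which is one reason reporting ICC estimates from completed trials helps the design of future ones.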

Solutions to some of these concerns can be gleaned from a simulation study. In 2018, Candlish and colleagues compared the following techniques to assess the bias, coverage and type I error: a standard linear regression model that assumes independence; a fully clustered mixed-effects model with singleton clusters (that is, clusters containing one individual) in the control arm; a fully clustered mixed-effects model with one large cluster in the control arm; a fully clustered mixed-effects model with pseudo-clusters in the control arm; a partially nested homoscedastic mixed-effects model; and a partially nested heteroscedastic mixed-effects model85. The simulation study found that ignoring even small ICCs results in inflated type I error rates and under-coverage of confidence intervals85. Accounting for heteroscedasticity in mixed-effects models allowed for appropriate control of type I error rates and unbiased ICC estimates and maintained the statistical efficiency in terms of power. Wider adoption of these analytic approaches is necessary, and the simulation article provides code to implement these different variations of mixed-effects models85. Aging-related trials such as CALERIE and CROSSROADS should in the future be analyzed using mixed-effects models that account for heteroscedasticity. IRGT trials may also present scenarios where a treatment is administered to participants through multiple groups. We refer readers to simulation studies with recommendations87. Finally, consider presenting estimates of ICC when using IRGT trials. This would help in sample size determination and design of future trials and with the interpretation of intervention group effects.

Two-stage randomized design

The assumption that one study participant’s treatment assignment has no effect on another study participant breaks down in settings where study participants cannot be isolated. It is almost impossible to limit the effect of an intervention (for example, vaccines in aging populations or assisted-living interventions to reduce falls) on other group members (see additional examples in ref. 88). Interference can result in a severe understatement of treatment impacts if it is ignored. In some settings, two-stage randomized designs can address and estimate interference.

When interference is likely, two-stage randomized designs can estimate not only the average direct causal effects, but the average indirect effects (that is, interference effects), total causal effects and overall causal effects under certain assumptions. In a two-stage nested randomized design, these effects can be isolated when groups (community) are first randomized to treatments, and then at the second stage, units in the group (family) are randomly assigned at varying probabilities to the treatment levels6,88,89.

For example, Halloran and Hudgens88 consider a vaccine efficacy study whereby geographically separate groups (residential areas/clusters) are randomized to two assignment regimens (vaccine coverage). In one group, 30% of individuals are randomly assigned to receive a vaccine, and in the other, 50% of individuals are assigned to receive a vaccine6,90. The random assignment of residential clusters to vaccine coverage represents the first stage of the two-stage randomization (for example, A or B). The second stage is done by randomly selecting who will get the vaccine in varying probabilities within the assignments at the first stage (for example, 30% of individuals are assigned to receive the vaccine in A, and 50% of individuals are assigned to receive the vaccine in B). This design permits estimation of both the direct causal effect of the vaccine program (difference in disease incidence between vaccinated and unvaccinated) and, because vaccine coverage is not equal in A and B, the indirect effect of the vaccine in reducing the community spread of the infectious agent to unvaccinated individuals. The example illustrates that the vaccination effect would be underestimated when only direct effects could be estimated (that is, if all participants were randomly assigned at 50% probability). The estimation of effects from this design requires the assumptions of mixed assignment being used at each randomization stage, and stratified interference (for example, an individual’s outcome from an intervention within a geriatric rehabilitation unit will be the same regardless of which other individuals receive the intervention6).
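The effects described in this example can be sketched as simple contrasts of group-averaged outcomes. The code below is an illustrative simplification, not the estimators of the cited references: it assumes equally weighted groups, takes the lower-coverage regimen as the reference, and uses an "unvaccinated minus vaccinated" sign convention so that positive values indicate fewer events. All names are hypothetical.

```python
from statistics import mean


def group_means(group):
    """group: list of (vaccinated: bool, outcome: float) pairs for one cluster.

    Each cluster must contain at least one vaccinated and one unvaccinated person.
    """
    treated = [y for z, y in group if z]
    untreated = [y for z, y in group if not z]
    return mean(treated), mean(untreated), mean(y for _, y in group)


def two_stage_effects(groups_low, groups_high):
    """Contrast clusters randomized to low versus high vaccine coverage."""
    def averaged(groups):
        t, u, overall = zip(*(group_means(g) for g in groups))
        return mean(t), mean(u), mean(overall)

    t_lo, u_lo, all_lo = averaged(groups_low)
    t_hi, u_hi, all_hi = averaged(groups_high)
    return {
        "direct_low": u_lo - t_lo,    # vaccinated vs unvaccinated, low coverage
        "direct_high": u_hi - t_hi,   # vaccinated vs unvaccinated, high coverage
        "indirect": u_lo - u_hi,      # unvaccinated in low vs unvaccinated in high
        "total": u_lo - t_hi,         # unvaccinated in low vs vaccinated in high
        "overall": all_lo - all_hi,   # everyone in low vs everyone in high
    }


if __name__ == "__main__":
    # Outcome: 1 = infected, 0 = not infected (toy data, not from any study).
    low = [[(True, 0), (True, 0), (True, 1)] + [(False, 1)] * 4 + [(False, 0)] * 3]
    high = [[(True, 0)] * 5 + [(False, 1)] * 2 + [(False, 0)] * 3]
    print(two_stage_effects(low, high))
```

With these definitions the total effect equals the indirect effect plus the direct effect under the higher-coverage regimen, which shows what is missed when only a direct effect is estimated.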

There are some considerations to implementing two-stage randomization under various scenarios and work is actively ongoing to address them. One such scenario is when the sizes of the randomized groups differ. In this case, the causal estimands proposed in Halloran and Hudgens may be biased. To overcome this issue, Basse and Feller proposed additional estimators for unequal group sizes91. In their example, the second stage of randomization assigns only within those units assigned to ‘treatment’ in the first stage; those in the control group are not randomized again. Also, the assumption of partial interference or no interference across groups holds if the groups are separated enough in both time and space. This may not occur if they share a geographical location, for example, resulting in an added complexity for the estimation of interference effect. This topic is an active area of methodologic research with potentially vast application in the analysis of complex aging research data. For more about these methodologic developments, we direct the reader to Tchetgen et al.1.

A different form of staged randomization similarly provides utility under conditions that carry expectation effects. Whereas traditional RCTs isolate the effect of treatment assignment, under ‘real-world’ conditions, expectations may modify the total effect. For instance, although participants can be masked to drug assignment in a trial, their prescription of the drug by a physician is not, and the expectation of knowing that a participant is not receiving a placebo may add to or subtract from outcomes. To estimate the effect of treatment assignment under ‘actual conditions of use’ without the use of deception, George et al. proposed ‘randomization to randomization probabilities’, whereby study participants are first randomized to a probability between 0 and 1 from a distribution defined on the unit interval92. Then, the participants are told their probability of being assigned a treatment (but not the actual assignment), and therefore their expectations of receiving the treatment are manipulated. To estimate expectation effects, the statistical model includes terms for treatment assignment, randomization probability and the randomization probability-by-treatment interaction. This design is limited to treatments that can be masked from participants and entails a reduction in statistical power that needs to be considered in sample size planning.
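The assignment step of this design can be sketched as below. This is a minimal illustration only: it assumes the randomization probabilities are drawn from a uniform distribution on the unit interval, which is just one possible choice, and the function and field names are hypothetical. The three returned covariates correspond to the model terms named above.

```python
import random


def r2r_assign(participant_ids, rng):
    """Randomize each participant to a probability, then to treatment.

    rng: a random.Random instance, passed in for reproducibility.
    Returns one record per participant with the covariates for the
    treatment, probability and interaction terms of the analysis model.
    """
    records = []
    for pid in participant_ids:
        # Stage 1: draw the probability that is disclosed to the participant.
        # A uniform draw on [0, 1) is an assumption of this sketch.
        probability = rng.random()
        # Stage 2: masked treatment assignment with that probability.
        treated = int(rng.random() < probability)
        records.append(
            {
                "id": pid,
                "probability": probability,
                "treated": treated,
                "interaction": probability * treated,
            }
        )
    return records


if __name__ == "__main__":
    for row in r2r_assign(["p1", "p2", "p3"], random.Random(7)):
        print(row)
```

Because participants at every disclosed probability can end up in either arm, the expectation (probability) term and the treatment term can be estimated separately in the model.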

Conclusions

Our purpose was to bring attention to the presence of interdependency in aging research studies and to present possible strategies for addressing such interdependency. Research requires tradeoffs between laboratory, clinical and real-world conditions and an understanding of ecologically valid experiments relative to the laboratory. If interdependency is suspected, investigators should account for it in the analytic model and provide proper reporting. Single-stage randomization is not always the most appropriate design, so other possible design strategies can be considered, including cRCTs (analyze as randomized), pseudo-cluster-randomized studies (enroll enough clusters guided by proper power analyses), or two-stage randomization. In addition, investigators should consider reporting ICCs for any clusters (for example, agar plates, vials, cages and housing facilities). It is easy to overlook the intersection of these issues in the clinical setting, especially because addressing them can be so challenging in a real-world setting. Every research question requires an appropriate research design; thus, interdependency does not have a single solution and may itself be the topic of interest.