Introduction

A wide variety of studies are undertaken with the aim of understanding and improving paediatric health, and in particular of identifying the causal processes that lead to the development of health outcomes or disease. To achieve this, it is helpful to define a number of variables (for the benefit of the reader unacquainted with the technical language that follows, we have created a Supplemental Glossary available online). For example, in a study looking at the relationship between screen time (time spent watching television, using computers or games consoles) and childhood obesity,1 the authors hypothesised that more screen time (the exposure) may lead to an increased risk of childhood obesity (the outcome). The exposure may cause the outcome not directly but through an intermediate process: a reduction in physical activity (the intermediate variable, or mediator). Finally, we want to identify factors that influence both the exposure and the outcome, but do not lie on the causal pathway. An example is low parental education, which is a cause of both increased screen time and an increased risk of obesity/overweight.2,3 In this situation, low parental education acts as what is called a confounder. If not recognised and controlled for, a confounder can distort the apparent relationship between the two variables, for example by falsely attributing obesity solely to increased screen time.

We will not attempt to summarise the history, philosophy and applications of causal inference, but instead focus in this review on the use of a graphical tool, causal directed acyclic graphs (DAGs). DAGs provide a simple way of graphically representing, communicating and understanding key concepts of relevance to practising clinicians and researchers, and are particularly helpful in delineating and understanding confounders and potential sources of bias in exposure–outcome relationships. A bias is a systematic error that distorts our interpretation of the true relationship between the exposure and the outcome. Biases differ from random error in that they are systematic: repeating a study, or increasing the sample size, will not eliminate bias. Confounders and biases may distort our interpretations in a variety of ways. If the researcher is not aware of confounders, and does not appropriately control for them, a variable may erroneously appear to cause the outcome where there is no causal relationship, or the magnitude of this relationship may be distorted. Conversely, if a researcher treats variables associated with both exposure and outcome as confounders when in fact they are not (see below), and inappropriately controls for them, this too may introduce bias.

We first discuss how to create and interpret DAGs, using paediatric examples to demonstrate how they can identify, and appropriately correct for, confounders and biases in observational studies that can affect our ability to draw correct conclusions about causal relationships. We then outline how they can be helpful in interpreting interventional studies and in understanding potential threats to their validity. After outlining some of the limitations of DAGs, we conclude with some thoughts on how they might prove useful for researchers and clinicians.

Creating and interpreting a DAG

Diagrams have been used to represent causal relationships for many years, in a variety of fields ranging from genetics to sociology.4,5,6,7 However, in recent years an epidemiological literature outlining a standard terminology and set of rules8 has grown up around DAGs. In a DAG, causal relationships are represented by arrows between the variables, pointing from cause to effect. As seen in Fig. 1a, an arrow from screen time to obesity means that we hypothesise that a change in screen time causes a change in adiposity. DAGs must obey two rules. First, they must be acyclic, meaning that it is impossible to start at any variable in the DAG, follow the directed arrows forward, and end up at the same variable. In other words, a DAG must not contain a feedback loop in which a variable causes itself. A corollary of this is that a causal relationship between two variables must be unidirectional: they cannot cause each other. Bidirectional arrows, often used to depict feedback loops, are in fact a graphical expedient that shows as a single variable what in reality is a sequence of variables over time.9 Second, for a DAG to be complete, any cause shared by two variables in the DAG must be included.10
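These rules are easy to check mechanically. Below is a minimal sketch in Python, using the networkx library, that encodes the hypothesised relationships of Fig. 1 and verifies acyclicity; the variable names are illustrative choices of ours, not part of any published analysis.

```python
# A minimal sketch: encode the hypothesised DAG of Fig. 1 and check it is acyclic.
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("parental_education", "screen_time"),   # confounder -> exposure
    ("parental_education", "obesity"),       # confounder -> outcome
    ("screen_time", "physical_activity"),    # exposure -> mediator
    ("physical_activity", "obesity"),        # mediator -> outcome
    ("screen_time", "self_harm"),            # exposure -> collider
    ("obesity", "self_harm"),                # outcome -> collider
])

# A valid DAG must contain no directed cycles (rule 1 in the text).
assert nx.is_directed_acyclic_graph(dag)
```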

Fig. 1

a Screen time (the exposure) causes obesity (the outcome). b Screen time acts on obesity through the mediator of physical activity. c Low parental education increases both screen time and obesity, and is therefore a confounder. d Self-harm is a collider in the path from screen time to obesity.

In a DAG, two variables can be connected by what is called a “path” between them. Open “paths” represent statistical associations between two variables; closed “paths” represent the absence of such associations (the correspondence between path “openness” and statistical association is governed by the mathematical rules of d-separation).8 Variables and arrows can be combined into three main types of paths as follows:

  1.

    Directed paths: all arrows point in the same direction, and the association between these variables reflects a causal relationship. Such a path is open. Increased screen time leads directly to reduced physical activity, which in turn leads to an increased risk of obesity (Fig. 1b).

  2.

    Backdoor paths: this is where two variables share the same cause. Looking at Fig. 1c, increased screen time and childhood obesity are both influenced by low parental education. In the terminology of DAGs, screen time and childhood adiposity are said to be connected by a backdoor path through low parental education. This path (one that connects exposure and outcome through a third variable, with an arrow entering rather than emanating from the exposure) is open, and depicts a statistical association between screen time and adiposity through low parental education. However, the association transmitted by this backdoor path is non-causal, and represents the basic structure of confounding.

  3.

    Closed (or blocked) paths: this is when two variables share the same effect, called a collider (Fig. 1d). For example, both screen time and obesity have been found to increase the risk of low self-esteem, self-harm and suicidal ideation in adolescents;11,12,13 that is, self-harm is a collider between screen time and obesity. In this situation, unlike directed and backdoor paths, the path is closed: no association between screen time and adiposity is transmitted through self-harm.

Researchers can change the status of a path from open to closed, or vice versa, by acting on (conditioning on, or controlling for) a variable; this can occur through study design or through statistical adjustments such as restriction, stratification, matching, standardisation or multivariable regression. Controlling for parental education (a confounder) in our example will close this backdoor path, and lead to a less biased estimate of the directed path between screen time and adiposity. Conversely, conditioning on a variable in a directed path between two variables (a mediator), physical activity in our example, closes this path, and could lead to an incorrect estimate of the true overall association between the variables. Finally, conditioning on a variable in a closed path (a collider) opens this path and leads to the transmission of a non-causal association. If we were to mistakenly identify self-harm as a confounder, and condition on it, this would distort the true relationship between the exposure and the outcome. Equally, if a study examined the relationship between screen time and obesity in a group of adolescents selected because they had a history of self-harm, this would again represent conditioning on a collider, in this case by restriction (see “selection bias” below).
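These effects of conditioning can be made concrete with a toy simulation. The sketch below (Python with numpy; entirely synthetic, with effect sizes that are arbitrary choices of ours) shows that adjusting for a common cause closes a backdoor path, while restricting on a common effect opens a collider path.

```python
# Synthetic illustration of how conditioning changes path status.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Backdoor path: C causes both X and Y; X does not cause Y.
C = rng.normal(size=n)
X = C + rng.normal(size=n)
Y = C + rng.normal(size=n)
print(np.corrcoef(X, Y)[0, 1])          # ~0.5: open backdoor path

# Condition on C by regressing it out of X and Y (an adjustment analogue).
rX = X - C * np.cov(X, C)[0, 1] / np.var(C)
rY = Y - C * np.cov(Y, C)[0, 1] / np.var(C)
print(np.corrcoef(rX, rY)[0, 1])        # ~0: backdoor path closed

# Collider: X2 and Y2 are independent causes of S.
X2 = rng.normal(size=n)
Y2 = rng.normal(size=n)
S = X2 + Y2 + rng.normal(size=n)
print(np.corrcoef(X2, Y2)[0, 1])        # ~0: path through S is closed

# Condition on S by restriction (e.g. selecting high-S individuals).
sel = S > 1
print(np.corrcoef(X2[sel], Y2[sel])[0, 1])  # negative: collider path opened
```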

Using DAGs to understand confounding: paracetamol and wheeze

In the language of DAGs, a confounder is defined as a common cause of the exposure and the outcome. This situation occurs in nature; it is not created by the researcher. Confounders, if not identified and appropriately adjusted for (conditioned on), can distort the true causal relationship between an exposure and an outcome. As an example, a number of observational studies14,15 found an association between receiving paracetamol in the first year of life and later risk of developing wheeze/asthma. DAGs have proven useful in examining this relationship.16

In fact, the increased risk of later wheezing may not be due to paracetamol, but to confounding. Viral respiratory tract infections—for which paracetamol is prescribed—are common in children, trigger wheezing and might increase the risk of later wheeze.17,18 That is, viral infections act as a confounder in the relationship between paracetamol usage and wheezing (Fig. 2a) (so-called confounding by indication).19,20 Therefore we might expect these two variables (paracetamol use and wheeze) to be statistically associated through a common cause, even if there is no direct causal association. To identify the true causal relationship between paracetamol use and later wheeze, one should condition on viral infections. Conditioning in a DAG is generally shown as a box around the variable, and as described previously changes an open path (in this case a backdoor path) to a closed path (Fig. 2).

Fig. 2

a Viral infections cause both paracetamol use and wheeze, acting as a confounder. b This bias can be controlled by conditioning on the confounder (shown by a box around viral infections).

Once we have closed this backdoor path by making the appropriate statistical adjustments, and assuming there are no other confounders, we should be able to identify the true magnitude, if any, of the relationship between paracetamol use and wheeze. Supporting this hypothesis, studies which have conditioned on respiratory tract infections in early life find a diminished relationship between paracetamol use and later wheeze, suggesting that part of this apparent relationship may be due to confounding.16,21,22
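The logic of this adjustment can be illustrated with synthetic data. In the hedged sketch below (Python/numpy), the probabilities are invented for demonstration and are not estimates from the cited studies: paracetamol has no true effect on wheeze, yet a crude comparison suggests harm, and stratifying on viral infection removes the spurious association.

```python
# Confounding by indication with invented binary data.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

viral = rng.random(n) < 0.30                        # viral infection (confounder)
paracetamol = rng.random(n) < np.where(viral, 0.70, 0.20)
# Wheeze depends on viral infection only: paracetamol has no true effect.
wheeze = rng.random(n) < np.where(viral, 0.30, 0.10)

def risk(mask):
    return wheeze[mask].mean()

# Crude comparison: an apparently harmful "effect" of paracetamol.
print(risk(paracetamol) / risk(~paracetamol))        # > 1, spurious

# Conditioning on the confounder by stratification closes the backdoor path.
for v in (False, True):
    stratum = viral == v
    print(risk(stratum & paracetamol) / risk(stratum & ~paracetamol))  # ~1
```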

Risks of inappropriate adjustment: overadjustment

Whilst failing to identify confounders can threaten the validity of findings, the converse, inappropriately identifying other variables as confounders, can also be problematic.23 Take the relationship between the administration of antenatal steroids (the exposure) and the outcome of bronchopulmonary dysplasia (BPD) (Fig. 3a). A number of studies found that administration of antenatal steroids was not associated with a decreased risk of BPD,24,25,26 despite reducing several known risk factors for BPD. These studies adjusted for variables such as severity of neonatal disease,24 and need for mechanical ventilation,24,25,26 the rationale being that these factors are associated with both the administration of antenatal steroids and an increased risk of BPD.

Fig. 3

a Antenatal steroids affect the risk of bronchopulmonary dysplasia (BPD). b Antenatal steroids also have indirect effects. c By controlling for disease severity and mechanical ventilation, we underestimate the true overall effect of the antenatal steroids.

However, these variables do not fulfil the definition of a confounder (they are not causes of both exposure and outcome), but act as mediators between the exposure (antenatal steroids) and the outcome (BPD) (Fig. 3b). Conditioning on a mediator closes one of the causal paths between antenatal steroids and BPD and distorts the overall relationship between the two. This is represented in Fig. 3c by the boxes surrounding the two intermediate variables. Adjusting for (conditioning on) an intermediate results in overadjustment. With this process we remove the part of the association between antenatal steroids and BPD mediated through the reduction of severe illness, or the reduced need for mechanical ventilation.27 This adjustment can attenuate the true causal effect of the exposure or even reverse it, leading to counterintuitive results.

Supporting the interpretation that overadjustment might explain the apparent lack of effect of antenatal steroids on the development of BPD, a cohort study28 found a negative (protective) association between antenatal steroid administration and the mediators (severity of neonatal disease and the need for mechanical ventilation), and a positive association between the mediators and the risk of BPD. The authors found a protective effect of steroids on BPD when the intermediate factors were not adjusted for, but not when they adjusted for these intermediate variables (Fig. 3c). That is, inappropriately conditioning on mediators led to a distortion of the true (likely protective) relationship between antenatal steroids and the risk of developing BPD. Of note, whilst conditioning on mediators distorts the overall relationship between an exposure and an outcome, Fig. 3c also shows that this should reveal the direct effect of steroids on BPD (under the highly simplified assumption that there are no other common causes of steroid administration, BPD, or the mediators); this concept underlies the field of mediation analysis.29,30
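A toy simulation makes the arithmetic of overadjustment visible. In the sketch below (Python/numpy; all coefficients are arbitrary, illustrative values, not estimates from the cohort study discussed), the total effect of the exposure is the sum of its direct effect and the effect transmitted through the mediator, and adjusting for the mediator recovers only the direct component.

```python
# Overadjustment: conditioning on a mediator removes the indirect effect.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

steroids = rng.random(n) < 0.5                           # exposure
severity = 1.0 - 0.8 * steroids + rng.normal(size=n)     # mediator: reduced by steroids
bpd = 0.5 * severity - 0.2 * steroids + rng.normal(size=n)  # outcome score

def ols(y, covariates):
    # Ordinary least squares with an intercept; returns coefficients.
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Total effect of steroids: direct (-0.2) plus indirect (-0.8 * 0.5) = -0.6.
print(ols(bpd, [steroids])[1])              # ~ -0.6 (protective)
# Adjusting for the mediator leaves only the direct effect.
print(ols(bpd, [steroids, severity])[1])    # ~ -0.2 (attenuated)
```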

Selection bias in the language of DAGs

In observational or interventional studies, selection bias occurs when both the exposure and the outcome affect whether an individual is included in the analyses. In the language of DAGs, selection bias occurs through inappropriate conditioning on a collider. An example is provided by studies that examined HLA subtypes (the exposure) as risk factors for the development of acute lymphoblastic leukaemia (ALL, the outcome). Initial cross-sectional studies using prevalent (i.e. already diagnosed) cases and matched controls31,32,33 found an increased risk of developing ALL in individuals with the HLA-A2 serotype (Fig. 4a).

Fig. 4

a A possible relationship: HLA subtypes affect the risk of acute lymphoblastic leukaemia (ALL). b The true causal structure, showing selection bias: both HLA subtype and ALL influence survival, and the study is conducted in survivors.

However, a subsequent study34 examined patients presenting with a new diagnosis of ALL (incident cases), and found that the frequency of HLA-A2 in these individuals matched that of the general population. When examining survivors at the time of typing, they found that the frequency of HLA-A2 was higher than in the general population, and that length of survival appeared to be associated with the HLA-A2 serotype. That is, HLA-A2 was not associated with an increased risk of developing ALL, but rather with an increased chance of survival. Because previous studies had examined a prevalent population (patients with a previous diagnosis of ALL) rather than an incident one (those presenting with a new diagnosis), they had examined the relationship between HLA-A2 and leukaemia in a sample restricted to survivors, and an incorrect association was inferred between the exposure and the outcome.

Presented as a DAG, the source of this bias, also called “incidence-prevalence” or Neyman’s bias,35,36 can be seen to be due to conditioning on a collider. In this case, both the exposure and the outcome influence a third variable, survival, which acted as a collider (Fig. 4b). If we consider all patients—surviving or not—by including newly diagnosed patients, the two variables are not associated (the path is closed). Conditioning on survival (by restricting the inclusion to patients surviving a certain time, as shown by the box surrounding survival in the figure), opens the path through the collider, and we enrich the sample for individuals with HLA-A2 among ALL patients, creating a (spurious) association.
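The following rough simulation (Python/numpy; serotype frequencies and survival probabilities are invented for illustration) reproduces the structure of this bias: HLA-A2 is unrelated to developing ALL, but because it improves survival, restricting the sample to survivors makes it look like a risk factor.

```python
# Incidence-prevalence (Neyman) bias as conditioning on a collider.
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

hla_a2 = rng.random(n) < 0.45          # serotype frequency in the population
all_dx = rng.random(n) < 0.001         # incident ALL, independent of HLA-A2

# Among incident cases, HLA-A2 frequency matches the population: no association.
print(hla_a2[all_dx].mean())           # ~0.45

# Survival after diagnosis is better for HLA-A2 carriers (the collider path).
survived = all_dx & (rng.random(n) < np.where(hla_a2, 0.8, 0.4))

# Conditioning on survival (prevalent cases) enriches for HLA-A2.
print(hla_a2[survived].mean())         # > 0.45: a spurious "risk factor"
```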

Overadjustment and selection bias

Overadjustment and selection bias can also coexist. Take the relationship between maternal pre-eclampsia (the exposure) and subsequent cerebral palsy (the outcome): pre-eclampsia is hypothesised to be directly causative of cerebral palsy. However, pre-eclampsia is also associated with a higher risk of medically indicated preterm birth, which in turn is associated with a higher risk of cerebral palsy (Fig. 5a). We might think to examine the effect of pre-eclampsia after adjusting for preterm birth or gestational age (as if this represented confounding) (Fig. 5b), shown by the box around this variable. However, here preterm birth is an intermediate between pre-eclampsia and cerebral palsy, and not a common cause of both.

Fig. 5

a Pre-eclampsia increases the risk of cerebral palsy directly, and indirectly by increasing preterm birth. b By adjusting for preterm birth, we underestimate the overall effect of pre-eclampsia on cerebral palsy. c Adjusting for preterm birth causes the estimated effect of pre-eclampsia on cerebral palsy to suffer from both overadjustment and selection bias.

By (over)adjusting, we remove part of the detrimental effect of pre-eclampsia: the part mediated through preterm birth. This adjustment can attenuate the true effect of the exposure and even reverse it. An early study found that maternal pre-eclampsia was protective in very preterm infants, but detrimental to those born at a later gestation.37 This was a surprising result: as a pathologic condition, pre-eclampsia would be expected to be detrimental across the entire spectrum of gestations.38,39 Visualised as a DAG, this finding could be explained by conditioning on gestational age at birth. In this case, conditioning takes place not through statistical adjustment but by stratification (performing separate analyses in two groups) based on gestational age at birth (preterm birth). This closes the causal path from pre-eclampsia to cerebral palsy via preterm birth, and could lead to bias.

However, the true situation is probably more complex. Examining a more realistic DAG, to which chorioamnionitis (another cause of both preterm birth and cerebral palsy) has been added (Fig. 5c), one can see that gestational age, as the shared effect of both pre-eclampsia and chorioamnionitis, also acts as a collider. Critically, closing one path between two variables may change the status of other potential paths between them. Conditioning on gestational age opens a previously closed path, from pre-eclampsia to cerebral palsy through preterm birth and chorioamnionitis. This creates a new source of bias, and another reason for the counterintuitive association found in our example. If we analyse the relationship between pre-eclampsia and the outcome within the group of preterm infants, a faulty comparison group and a spurious association will be created. If a preterm baby is born to a mother who has pre-eclampsia, the baby will be less likely to have chorioamnionitis, and vice versa. Among preterm infants the effect of pre-eclampsia on cerebral palsy will be compared with the effect of another significant cause of cerebral palsy, chorioamnionitis, and pre-eclampsia will falsely appear to be protective. Thus, the estimated direct causal effect of pre-eclampsia on the outcome will be biased (through the effect of chorioamnionitis). Although widely used, conditioning on gestational age at birth in studies of prenatal exposures and their relationship to postnatal outcomes may not reduce bias but actually introduce it, through overadjustment and faulty comparisons as illustrated above,40,41,42,43 generating counterintuitive results and apparent changes of effect in different groups of patients.
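A simplified simulation can reproduce this counterintuitive reversal. In the sketch below (Python/numpy; all probabilities invented), pre-eclampsia and chorioamnionitis are independent in the whole population, both cause preterm birth, and chorioamnionitis is the stronger cause of cerebral palsy; restricting to preterm infants then makes pre-eclampsia appear protective.

```python
# Collider-stratification bias from conditioning on preterm birth.
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

pe = rng.random(n) < 0.05              # pre-eclampsia
ca = rng.random(n) < 0.05              # chorioamnionitis, independent of PE

# Preterm birth is a shared effect (collider) of both conditions
# (probabilities above 1 simply mean the event always occurs).
preterm = rng.random(n) < (0.05 + 0.50 * pe + 0.50 * ca)

# Cerebral palsy: both exposures are harmful; chorioamnionitis more so here.
cp = rng.random(n) < (0.001 + 0.002 * pe + 0.010 * ca)

# Whole population: pre-eclampsia is (correctly) associated with more CP.
print(cp[pe].mean() / cp[~pe].mean())                  # > 1

# Restricted to preterm infants, PE babies are less likely to have CA ...
print(ca[preterm & pe].mean(), ca[preterm & ~pe].mean())
# ... so pre-eclampsia can appear protective within this stratum.
print(cp[preterm & pe].mean() / cp[preterm & ~pe].mean())  # < 1, spurious
```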

Disentangling confounders from mediators and colliders can prove challenging. Statistical tests reveal only the strength of an association between two variables, not the causal relationship between them; here the researcher must rely on causal reasoning.44 DAGs, supported by subject-matter knowledge, can be helpful as they embody a modern definition of confounding:45,46 a common cause of both the exposure and the outcome under study. Older definitions,47,48 which focused on factors associated with the exposure, related to the risk of disease in the unexposed, and not intermediate on the causal pathway (i.e. defined by associations rather than presumed causal relationships), may lead to biased statistical estimates through inappropriate adjustment for a common effect of two variables (conditioning on a collider).

The language of DAGs as applied to interventional studies

The DAG in Fig. 6a shows the causal structure of a randomised controlled trial (RCT) randomising women to an intervention promoting breastfeeding, the Baby Friendly Hospital Initiative (BFHI),49 to examine cognitive development in childhood.50 Random assignment determines the exposure (BFHI), which in turn influences the outcome (cognitive development) via mediators such as breastfeeding (and probably others, not shown). Even if there are confounders that influence both the chance of breastfeeding and the outcome, they do not bias the causal effect of the random assignment on the outcome: breastfeeding is a collider in the path between random assignment and cognitive development via potential confounders, and blocks this path. Therefore, as regards confounding, an intention-to-treat analysis (according to how a mother was randomised) is likely to be unbiased, and DAGs demonstrate the critical value of randomisation in inferring unbiased causal relationships. A per-protocol analysis of whether a mother actually breastfed is not immune to confounding, as it resembles an observational study in which a backdoor path exists between breastfeeding and the outcome via any confounders.51 Of course, the effect of treatment actually received may be of interest, and a per-protocol analysis, carefully controlled for confounders, may be justified to extract the maximum information from clinical trials.52
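The contrast between the two analyses can be sketched with synthetic data. In the illustration below (Python/numpy; effect sizes are arbitrary assumptions of ours, not estimates from the trial discussed), an unmeasured confounder influences both actual breastfeeding and cognition: the intention-to-treat contrast remains unconfounded (though diluted by imperfect adherence), while a naive as-treated comparison is inflated by the open backdoor path.

```python
# Intention-to-treat versus a naive as-treated comparison.
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

z = rng.random(n) < 0.5                 # random assignment (BFHI)
u = rng.normal(size=n)                  # unmeasured confounder

# Actual breastfeeding depends on assignment and on the confounder.
breastfed = (0.2 + 0.4 * z + 0.3 * u + rng.normal(size=n)) > 0.5
# Cognitive score: true effect of breastfeeding is +2; U also acts directly.
cognition = 100 + 2.0 * breastfed + 3.0 * u + rng.normal(size=n)

# Intention-to-treat: compare by assignment; unconfounded but diluted.
print(cognition[z].mean() - cognition[~z].mean())
# As-treated: compare by actual breastfeeding; inflated by the backdoor path.
print(cognition[breastfed].mean() - cognition[~breastfed].mean())  # >> 2
```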

Fig. 6

a The structure of a randomised controlled trial (RCT); BFHI refers to the Baby Friendly Hospital Initiative. b Loss to follow-up in an RCT creates selection bias.

Whilst RCTs and intention-to-treat analyses minimise threats to validity posed by confounding, they are not immune to other biases, including information bias (see glossary) and bias due to differential loss to follow-up. It is plausible that the BFHI might lead to differences in health awareness in the intervention group, leading to a different likelihood of follow-up clinic attendance. In addition, mothers with a child exhibiting signs of impaired cognitive development could also be more likely to attend follow-up, to find out whether their child was developing normally. Seen as a DAG (Fig. 6b), both the BFHI and the outcome have a causal effect on the chance of follow-up. Examining data only on children who attend follow-up (conditioning on follow-up, represented by the box around clinic attendance) introduces bias into the relationship between the intervention and cognitive development via a faulty comparison, opening an otherwise closed path. Conditioning on a collider leads to what is called collider-stratification bias.53,54 This example illustrates that whilst RCTs minimise confounding, they remain susceptible to biases such as that introduced by loss to follow-up.
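This collider-stratification bias is easy to demonstrate with invented numbers. In the sketch below (Python/numpy; probabilities are ours, and the intervention is deliberately given no true effect on the outcome), both the intervention and the outcome raise clinic attendance, so an analysis restricted to attenders shows a spurious association.

```python
# Selection bias from differential loss to follow-up.
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

bfhi = rng.random(n) < 0.5             # randomised intervention
delay = rng.random(n) < 0.10           # cognitive delay, independent of BFHI

# Attendance (the collider) is raised both by the intervention and by delay.
attend = rng.random(n) < (0.40 + 0.20 * bfhi + 0.30 * delay)

# Full cohort: no association between intervention and outcome.
print(delay[bfhi].mean(), delay[~bfhi].mean())        # ~0.10 in both arms

# Among attenders only, the collider path is open and a spurious
# (here inverse) association appears.
print(delay[attend & bfhi].mean(), delay[attend & ~bfhi].mean())
```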

Limitations of DAGs and some caveats on their use

One limitation of DAGs is their non-parametric nature: they neither specify the form of the causal relationships nor depict the size of the associations, and remain qualitative. A DAG shows that uncontrolled confounding might bias the results, but does not give a quantitative measure of this bias.10,55 Another is that a DAG can only be as good as the background information used to create it;56 a DAG is complete, and therefore has a causal interpretation, only if it contains all common causes of any two variables in it (all confounders), whether measured or unmeasured. A further limitation is the inability of DAGs to depict random, as opposed to systematic, error. For instance, randomisation allocates known and unknown confounders equally between groups only in the “average” ideal case. This does not always happen in real-world RCTs, where confounding due to random differences at baseline can, and indeed often does, occur, but this is not shown by DAGs. Ignoring random error also means that, when examining misclassification (information) bias, concepts such as non-differential measurement error (where error is randomly distributed across the groups being studied) cannot be incorporated into a DAG.

More subtly, and of relevance not only to DAGs but to any analytical approach, the research question influences how we consider variables and therefore how we analyse the data. The same variable may act as a mediator, a collider or a confounder depending on the research question, even within the same dataset, and each role dictates a different analytical strategy. For example, in the study looking at the relationship between antenatal steroids and BPD, one could ask about the effect of steroids (the exposure) on the outcome. Here the need for mechanical ventilation is a mediator and should not be conditioned on. However, if one were to investigate the effect of mechanical ventilation (treating it as the exposure) on the risk of BPD, antenatal steroids are a confounder in the backdoor path between mechanical ventilation and the outcome, and should be conditioned on. In addition, it is possible that two researchers might ask the same research question, using the same variables in their analyses, but choose to condition on different variables because they hold different opinions regarding the underlying causal relationships. Representing their analyses as DAGs allows an explicit comparison between the two approaches should their findings differ.

Finally, throughout this article we have, of necessity, presented simple examples to illustrate our key points. Partly, this is inherent to the approach: any graphical method is likely to over-simplify the complex biological reality being investigated. DAGs have for this reason attracted criticism for encouraging oversimplification in the field of causal inference.57,58 DAGs, however, do not per se lead to oversimplified analyses; they merely make the assumptions underlying an analysis explicit. There are also a number of theoretical points, such as the exact distinction between selection bias and confounding, that remain contested.59,60 We therefore direct interested readers to more in-depth reviews of the theory and limitations of DAGs.8,10,61,62

Conclusion

The aim of much clinical research is to elucidate and test causal relationships. In epidemiological terms, we want to identify exposures that might be amenable to modification, and to test whether interventions acting on these exposures improve health outcomes. Critical to a correct interpretation of causal relationships is correctly identifying, and appropriately adjusting for, confounders and potential sources of bias. In this review we have shown that DAGs can illustrate threats to validity found to a greater or lesser extent in virtually all clinical research: confounding, selection (or collider-stratification) bias and overadjustment. We believe that DAGs are useful to practising clinicians in interpreting research that deals with proposed causal relationships, by allowing them to frame research questions and findings using the concepts of exposures, outcomes, intermediates, confounders and colliders. They remind those planning observational studies to collect sufficient data to condition on possible confounders, and to adjust for these appropriately in analyses whilst refraining from inappropriate adjustments. Finally, they show that whilst randomisation does minimise the risk of confounding in interventional studies, possibilities for bias remain, for example through loss to follow-up.