Introduction

Aerosols are airborne particles that modify the planetary radiative budget, either directly, by absorbing or scattering radiation, or indirectly, by acting as cloud condensation nuclei (CCN) for the formation of cloud droplets and subsequently modifying the albedo of clouds1. Between pre-industrial times and 2019, anthropogenic aerosol emissions caused a negative effective radiative forcing (ERF) of −1.1 [−1.7; −0.4] W m−2 (ref. 2), thereby offsetting part of the global warming induced by greenhouse gases. The exact magnitude of the aerosol-induced cooling is hard to quantify, with large uncertainties originating from the understanding of aerosol-cloud interactions (ACI). Liquid stratocumulus clouds particularly contribute to this uncertainty because of their moderately high albedo (≈40%) and extensive coverage in oceanic regions with high insolation3,4. In liquid clouds, \({{{{\rm{ERF}}}}}_{{{{\rm{ACI}}}}}\) is a combination of the instantaneous Twomey effect and rapid adjustments of cloud macrophysical properties, namely liquid water path (LWP) and cloud fraction (CF)1. The Twomey effect describes how, at an initially constant LWP, an increase in the cloud droplet number concentration (Nd, used as a proxy for aerosol concentrations) will cause a decrease in the effective cloud droplet radius (reff5) but an increase in the total cloud droplet surface area and therefore an increase in cloud albedo6. This initial change in Nd can later trigger LWP adjustments (\(\frac{\partial {{{\rm{ln}}\,{\rm{LWP}}}}}{\partial \,{{{{\ln }}}}\,{N}_{{{{\rm{d}}}}}}\)), which are hard to constrain because they result from the superposition of two counteracting physical processes. On the one hand, a decrease in reff leads to precipitation suppression and subsequent increases in LWP and cloud albedo7. 
On the other hand, increased droplet concentrations and smaller radii will lead to suppressed droplet sedimentation and enhanced radiative cooling at the top of the cloud, thereby driving turbulence and entrainment of warm (and dry) air from the free troposphere into the cloud. This leads to evaporation of the smaller droplets, enhanced evaporative cooling and even stronger entrainment, resulting in decreases in LWP and cloud albedo8,9,10,11,12. Precipitation suppression and cloud-top entrainment enhancement are difficult to disentangle as these two processes occur simultaneously and they both involve a feedback loop between cloud microphysical properties and dynamical processes. This is illustrated in Fig. 1a.

Fig. 1: Application of the causal inference method to LWP adjustments.

a Is a schematic of the physical interactions between aerosols, clouds, and their environment. Aerosols can shift the cloud droplet size distribution to smaller radii, with consequences on cloud-top entrainment we (enhancement) and precipitation RR (suppression). Both we and RR influence the size distribution (feedbacks), thus they influence LWP, which is integrated from the cloud droplet mass distribution over the cloud depth. b Is an expert-knowledge causal graph (graph A), which encodes the physical knowledge from a. Straight arrows indicate contemporaneous (i.e. lag 0) effects, while curved arrows indicate lagged (lag 1 = 15 min) effects. Autodependencies are not shown here for simplicity (Supplementary Fig. 1). c Is a Nd-LWP joint histogram plot made with the standardized timeseries of the Namibian stratocumulus deck used in this study (see methods). It shows the probability density function (PDF) of the data points26, with yellow dots indicating the median LWP in each Nd bin. It can be noted that, contrary to other studies26,30, there is no positive slope at low Nd, which is a consequence of spatial averaging (Supplementary Fig. 2). b Can be used for causal inference to go beyond the negative correlation observed in c and to quantify causal effects of Nd on LWP.

The sign and magnitude of LWP adjustments depend on temporal and spatial scales, cloud regimes and environmental conditions, making it hard to interpret mere correlations as causal effects of aerosols on clouds. The main difficulty lies in removing confounding, i.e. the effect that a variable Z (e.g. an environmental factor) has on both a cause-variable X (e.g. aerosol properties) and an effect-variable Y (e.g. cloud properties), thereby inducing a spurious correlation between X and Y. Ideally, causality is inferred from randomized controlled trials. Unfortunately, this is rarely feasible in the Earth sciences. Good alternatives are so-called opportunistic experiments (e.g. ship tracks or volcanic eruptions)13,14,15, or the use of climate models to simulate the response of the climate system to a given forcing while keeping all other forcings constant (i.e. evaluating counterfactual climate scenarios)16. However, opportunistic experiments and model simulations have drawbacks, especially regarding representativity and, for simulations, computational cost. For this reason, many studies use non-opportunistic observational data, and in particular satellite observations, which have representative coverage in both time and space.
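The definition of confounding given above can be illustrated with a minimal synthetic sketch. All coefficients, sample sizes and variable names below are hypothetical and are not taken from the study; the example only assumes a linear system in which Z drives both X and Y while X has no effect on Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical linear system: Z confounds X and Y; X has NO effect on Y.
z = rng.normal(size=n)                # environmental factor (confounder)
x = 0.8 * z + rng.normal(size=n)      # cause-variable, driven by Z
y = -0.5 * z + rng.normal(size=n)     # effect-variable, driven by Z only

# Naive slope of Y on X: biased by the open path X <- Z -> Y.
naive_slope = np.polyfit(x, y, 1)[0]

# Adjusted slope: regress Y on X and Z jointly and read off the X coefficient.
design = np.column_stack([x, z, np.ones(n)])
adjusted_slope = np.linalg.lstsq(design, y, rcond=None)[0][0]

print(f"naive slope:    {naive_slope:+.3f}")
print(f"adjusted slope: {adjusted_slope:+.3f}")
```

In this toy setting the naive slope is clearly negative although X has no causal effect on Y, whereas the slope adjusted for Z is close to zero.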

Previous ACI studies have tried to explicitly identify and remove sources of confounding in observational data, such as the confounding effect of relative humidity (RH) on CF adjustments17, or the confounding effect of rainfall on convective cloud invigoration18. Methods from causal inference19,20,21,22 have also been applied to satellite studies of LWP adjustments23,24,25,26,27,28,29. These studies use multivariable regressions of the effect-variable on the cause-variable as well as the environmental covariates, or data binning as a function of a given covariate in order to remove spurious confounding effects. Most of these satellite studies find a negative Nd-LWP sensitivity, i.e. a decrease in LWP following an increase in Nd, causing a warming effect that can compensate part of the cooling due to the Twomey effect. The suggested reason is the prevalence of cloud-top entrainment enhancement over precipitation suppression. However, the strength of these two physical processes is modulated by environmental conditions. The above studies (ibid.) have shown that LWP adjustments can become less negative (or even reverse to positive) when conditioning the analyses on given environmental variables: large lower tropospheric stability (LTS) and high free tropospheric RH (RHFT) conditions, precipitating cloud regimes (characterized by low droplet number concentrations, large droplet sizes and deep clouds), i.e. conditions where entrainment enhancement becomes weaker and/or precipitation suppression becomes stronger. This illustrates how accounting for environmental conditions can change the magnitude and sign of a correlation and yield causal effects that can be different from the original correlation. However, environmental variables are often treated equally as variables that can impact the causal effect of Nd on LWP, without necessarily specifying whether they act as confounders, mediators, colliders, etc. (Supplementary Fig. 3). A causal graph (Fig. 1b) can help to explicitly describe the relationships between cloud and environmental variables. This allows us to know which environmental factors should be conditioned on, and which factors should not. In fact, Simpson’s paradox describes how, depending on the causal structure underlying the data, one can sometimes draw false conclusions when conditioning on the wrong variable20 (Supplementary Fig. 3).
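Why the role of a covariate matters can be sketched with a collider, one of the cases listed above. The following synthetic example is purely illustrative (hypothetical coefficients, not the study's data): X and Y are independent, but both influence a third variable C, and conditioning on C creates an association that does not exist causally.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical system: X and Y are independent, C is a common effect (collider).
x = rng.normal(size=n)
y = rng.normal(size=n)
c = x + y + rng.normal(size=n)

# Without conditioning on C, the slope of Y on X is close to zero (correct).
unconditioned_slope = np.polyfit(x, y, 1)[0]

# Conditioning on the collider C opens the path X -> C <- Y and biases the slope.
design = np.column_stack([x, c, np.ones(n)])
collider_biased_slope = np.linalg.lstsq(design, y, rcond=None)[0][0]

print(f"slope without C: {unconditioned_slope:+.3f}")
print(f"slope with C:    {collider_biased_slope:+.3f}")
```

For these particular coefficients the biased slope approaches −0.5, which shows that adding a covariate to a regression is not always harmless.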

Recent studies have also investigated the temporal development of LWP adjustments in order to use the precedence of cause with respect to effect30,31,32,33. Geostationary satellite data are a promising resource to resolve causality for ACI as their temporal resolution (Δt ≈ 10–15 min) matches the process timescale of macrophysical cloud adjustments to aerosol perturbations. For a stratocumulus with a typical geometrical height H of 300 m and a typical updraft speed of 0.5 m s−1 (ref. 34), the expected circulation time of an air parcel through the cloud height is \(\frac{300\,{{{\rm{m}}}}}{0.5\,{{{\rm{m}}}}\,{{{{\rm{s}}}}}^{-1}}=600\,{{{\rm{s}}}}\approx 10\,\min\). At this resolution, it becomes possible to resolve feedback loops35 (Supplementary Fig. 4) involved in LWP adjustments. It should be noted that the choice of spatial scale can also introduce confounding. In particular, spatial aggregation could lead to spurious correlations resulting from Simpson’s paradox if performed over an area encompassing different cloud types (Supplementary Fig. 5). The impact of spatial aggregation on ACI has already been addressed in other studies36,37,38. However, Bender et al.39 observed that stratocumulus albedo variability is more related to temporal than to spatial variability (using monthly satellite data on a 1° × 1° grid). For this reason, and because less attention has been paid to temporal developments, we chose to focus on temporal developments of domain-averaged cloud properties in this study.

In this study, we apply a transparent causal methodology to investigate LWP adjustments in stratocumulus clouds. We propose the causal graph in Fig. 1b, which encodes physical knowledge about cloud processes (Fig. 1a) and which we apply to geostationary satellite data of the Namibian stratocumulus deck. We then showcase a method to derive causal effects, i.e. causality-grounded sensitivities that go beyond simple correlations (Fig. 1c) and can shed some light on the conflicting estimates found in the literature by focusing on physical processes rather than state variables. Instead of focusing on precipitation- and entrainment-dominated regimes (low vs. high Nd) separately, we disentangle LWP adjustments that are simultaneously mediated by rain rates (RR) and entrainment rates (approximated by the entrainment velocity we). We do not explicitly include further environmental covariates (e.g. LTS, RHFT) as variables of the causal graph to keep it (relatively) interpretable, but instead we investigate the LTS/RHFT-specific effects, i.e. how LTS/RHFT modulate the influences of RR and we on the causal effect of Nd on LWP, denoted \({\beta }_{{N}_{{{{\rm{d}}}}},\,{{{\rm{LWP}}}}}\,=\,\frac{\partial {{{\rm{ln}}\,{\rm{LWP}}}}}{\partial \,{{{{\ln }}}}\,{N}_{{{{\rm{d}}}}}}\).
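As a reminder of what the sensitivity \(\partial \ln {\rm{LWP}}/\partial \ln {N}_{{\rm{d}}}\) means in practice, the sketch below estimates it as a log-log regression slope on synthetic data. The power-law exponent, the units and the noise levels are hypothetical, and the sketch deliberately ignores confounding, which is the issue the causal graph is meant to address.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
true_beta = -0.3  # hypothetical sensitivity, not a result of the study

# Synthetic droplet concentrations (cm^-3) and LWP (g m^-2) following a power law.
nd = rng.lognormal(mean=np.log(100.0), sigma=0.5, size=n)
lwp = 80.0 * (nd / 100.0) ** true_beta * rng.lognormal(mean=0.0, sigma=0.2, size=n)

# d ln(LWP) / d ln(Nd) estimated as the slope of ln(LWP) against ln(Nd).
beta_hat = np.polyfit(np.log(nd), np.log(lwp), 1)[0]

print(f"estimated sensitivity: {beta_hat:+.3f}")
```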

Results

Physical description of the causal graph

Figure 2 shows linear direct causal effects \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\), computed using Wright’s approach40,41 applied to timeseries of the Namibian stratocumulus deck (see methods). \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) represents the direct effect of a variable Xi on another variable Xj on the lij-lagged arrow linking Xi and Xj in the graph. The \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) are similar to linear regression slopes between Xi and Xj except that the graph is used to detect and remove any source of confounding prior to the regression. The results confirm the physical plausibility of the proposed causal graph, as the (statistically significant) signs of the direct causal effects agree well with the physical processes expected to underlie each arrow (marked as “correctly detected” in Table 1). A complete result table is provided in the Supplementary material (Supplementary Table 1). It should be noted that, because the data are adjusted for seasonal and diurnal cycles and standardized (see methods), the absolute magnitude of causal effects derived here cannot be directly compared to other studies that use non-standardized data. However, one can still comment on the physical relevance of the sign and relative magnitude of causal effects.
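One common way to estimate link coefficients in a linear causal graph is to regress each variable on its graph parents. The sketch below applies this idea to a synthetic two-variable system with one lag-0 link and two autodependencies; whether it matches the exact estimator used in the study is an assumption, and the coefficients, the helper `direct_effects` and the omission of standardization are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 20_000
eps_nd = rng.normal(size=T)
eps_reff = rng.normal(size=T)

# Hypothetical generating process (coefficients are not from the study):
#   Nd(t)   = 0.7 * Nd(t-1)                    + noise
#   reff(t) = -0.5 * Nd(t) + 0.4 * reff(t-1)   + noise
nd = np.zeros(T)
reff = np.zeros(T)
for t in range(1, T):
    nd[t] = 0.7 * nd[t - 1] + eps_nd[t]
    reff[t] = -0.5 * nd[t] + 0.4 * reff[t - 1] + eps_reff[t]


def direct_effects(child, parents):
    """OLS coefficients of `child` on its graph parents (intercept dropped)."""
    design = np.column_stack(parents + [np.ones(len(child))])
    coef = np.linalg.lstsq(design, child, rcond=None)[0]
    return coef[:-1]


# Parents of reff(t) in this toy graph: Nd(t) at lag 0 and reff(t-1) at lag 1.
alpha_nd_reff_0, alpha_reff_reff_1 = direct_effects(reff[1:], [nd[1:], reff[:-1]])

print(f"Nd(t) -> reff(t):     {alpha_nd_reff_0:+.3f}")
print(f"reff(t-1) -> reff(t): {alpha_reff_reff_1:+.3f}")
```

Standardization is left out here so that the estimates can be compared directly with the generating coefficients; with standardized series the magnitudes would change but the signs would not.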

Fig. 2: Magnitude and sign of the direct causal effects.

The direct causal effects \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) were calculated for graph A (Fig. 1b) for the Namibian stratocumulus deck, over the 2-year time period of the study. Roman capital letters (A–H) indicate arrows that describe specific physical effects, and the agreement of the computed direct effects with the physical effects is indicated in Table 1. Straight arrows indicate contemporaneous (i.e. lag 0) effects, while curved arrows indicate lagged (lag 1) effects. All nodes shown in gray were hypothesized to be autodependent (lag-1 arrow from X(t − 1) to X(t)).

Table 1 Association of the direct causal effects \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) with the corresponding cloud physical processes.

Lag-0 positive arrows from Nd, reff and H to LWP (arrow A in Fig. 2) simply correspond to the definition of LWP as a vertical integral of the liquid water content5. The negative arrow from Nd to reff (arrow B) describes how an increase in Nd causes a decrease in reff at a constant LWP6. The arrow from H to reff (arrow C) is positive, in line with the continuous condensational growth of cloud droplets as they are carried upwards in an adiabatic cloud42. Although, in reality, arrows (B) and (C) might be lagged, these effects are considered to be contemporaneous here, as the three variables are derived simultaneously from the same cloud-top satellite measurements.

The cloud-top entrainment enhancement feedback is well described by the direct causal effects, with a negative and significant effect of reff on we (arrow D). This describes the fact that larger cloud droplets tend to sediment, thereby moving cloud water away from the inversion level and preventing turbulence induced by cloud-top radiative and evaporative cooling from enhancing entrainment8,9,10,11. In turn, we has a negative effect on reff and Nd (arrow E), indicating the evaporation of entire cloud droplets due to mixing with warm (and dry) free tropospheric air at cloud top. This suggests a mixture of homogeneous and extreme inhomogeneous entrainment regimes43, as was also observed by ref. 44 in direct numerical simulations of stratocumulus clouds. The effect of we on H is also negative and significant, meaning that entrainment of dry and warm free tropospheric air reduces cloud depth. It can be noted that, although the causal effects of we on reff, Nd and H (Supplementary Table 1) are significant, they are quite weak, potentially due to the large-scale approximation used for the computation of we (see methods). Even though we did not explicitly include environmental variables (RHFT, LTS) in the causal graph, causal effects can be evaluated for data binned by environmental factors. This reveals a regime-dependence of entrainment: entrainment mixing becomes more homogeneous (reff is reduced but Nd remains constant)43 under moist free tropospheric conditions or polluted conditions (lines 3 and 8 in Supplementary Table 1), which agrees with ref. 44 and ref. 45. Under such conditions, the evaporative timescale becomes longer than the mixing timescale, thus evaporating all cloud droplets homogeneously. On the contrary, under dry free tropospheric conditions, entrainment mixing becomes more inhomogeneous (Nd is reduced but reff remains constant) as evaporation becomes faster. 
Our results also seem to indicate that entrainment becomes more inhomogeneous in unstable boundary layers, although one would expect the opposite, as the mixing timescale should become shorter. This might be due to the fact that, in an unstable boundary layer, where lateral entrainment becomes dominant, the large-scale estimate of we used here is not a good proxy for mixing. A lag 1 was chosen for arrows (D) and (E) to indicate that entrainment does not happen instantaneously at the entrainment interfacial layer.

The precipitation suppression feedback is also detected by the direct causal effects. The positive lag-1 arrow from reff to RR (arrow F) indicates that larger droplets are likely to initiate precipitation at the next timestep. The negative lag-0 effect of RR on Nd (arrow G) indicates that rain onset at cloud base immediately removes droplets from the cloud. Although, in theory, there could also be a lag-0 arrow from RR to reff, we make the approximation that collection efficiency is roughly independent of size. In fact, for stratocumulus drizzle drops in the range 50–100 μm (ref. 34), the collection efficiency of cloud droplets >10 μm only varies between 60 and 70% (ref. 46). The lag-1 arrow from RR to Nd (arrow H) describes processes that occur below the cloud and only impact Nd at the next timestep. Arrow H is found to be weakly positive and insignificant, although one would expect wet scavenging to make this arrow negative. It is possible that dynamical effects (e.g. updraft enhancement around cold pool edges47,48) counterbalance Nd losses from wet scavenging, although this remains speculative and would need to be investigated further.

All variables (except LWP) were chosen to have causal autodependencies (arrow from X(t − 1) to X(t)) to illustrate the inertia of the physical variables that they represent. LWP was chosen to have null autodependency as LWP is fully determined by Nd, reff and H at each timestep. Note that a null causal autodependency does not prevent LWP from having non-zero statistical autocorrelation.
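The distinction between causal autodependency and statistical autocorrelation can be checked on a toy system. In this sketch (hypothetical coefficients, and a simplified LWP that depends on Nd only), LWP has no arrow from LWP(t − 1) to LWP(t), yet it inherits a clear lag-1 autocorrelation from the persistence of Nd.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20_000
eps_nd = rng.normal(size=T)
eps_lwp = rng.normal(size=T)

# Hypothetical system: Nd is persistent (autodependent), LWP is not.
nd = np.zeros(T)
for t in range(1, T):
    nd[t] = 0.8 * nd[t - 1] + eps_nd[t]
lwp = 0.6 * nd + eps_lwp          # no LWP(t-1) term: null causal autodependency

# Statistical lag-1 autocorrelation of LWP is nevertheless non-zero.
lag1_autocorr = np.corrcoef(lwp[1:], lwp[:-1])[0, 1]

# Regressing LWP(t) on its true parent Nd(t) and on LWP(t-1) recovers
# a coefficient close to zero for LWP(t-1).
design = np.column_stack([nd[1:], lwp[:-1], np.ones(T - 1)])
coef_nd, coef_lwp_lag1, _ = np.linalg.lstsq(design, lwp[1:], rcond=None)[0]

print(f"lag-1 autocorrelation of LWP: {lag1_autocorr:.3f}")
print(f"coefficient of LWP(t-1):      {coef_lwp_lag1:+.3f}")
```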

Temporal developments of causal effects

Direct causal effects \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) confirm the validity of the proposed causal graph but do not provide a full picture of causal effects. Total causal effects βX,Y,l (i.e. between two variables X(t − l) and Y(t) that are not directly linked by a single arrow in the causal graph) can be derived from the direct causal effects using Wright’s path approach, and these total effects can inform us about temporal LWP changes following aerosol perturbations. Figure 3 shows the temporal evolution of the βX,Y,l (see methods) for a selection of (X, Y) pairs, where a positive (negative) βX,Y,l means that an increase in X causes an increase (decrease) in Y after a lag l.
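In a linear graph, the path approach amounts to summing, over all directed paths from X(t − l) to Y(t), the product of the link coefficients along each path. A compact way to do this bookkeeping is with matrices, as in the sketch below. The three-variable graph, its coefficients and the function `total_effects` are hypothetical simplifications and are not the graph or the values of the study.

```python
import numpy as np

names = ["Nd", "reff", "LWP"]
k = len(names)

# Hypothetical link coefficients, stored as A[child, parent].
A0 = np.zeros((k, k))   # lag-0 links (must form an acyclic graph)
A0[1, 0] = -0.5         # Nd   -> reff
A0[2, 0] = 0.6          # Nd   -> LWP
A0[2, 1] = 0.4          # reff -> LWP
A1 = np.zeros((k, k))   # lag-1 links
A1[0, 0] = 0.7          # Nd(t-1)   -> Nd(t)
A1[1, 1] = 0.3          # reff(t-1) -> reff(t)


def total_effects(A0, A1, max_lag):
    """Return psi, where psi[l][j, i] is the total effect of i(t-l) on j(t)."""
    lag0_paths = np.linalg.inv(np.eye(len(A0)) - A0)  # sums all lag-0 paths
    step = lag0_paths @ A1                            # one lag-1 hop, then lag-0 paths
    psi = [lag0_paths]
    for _ in range(max_lag):
        psi.append(step @ psi[-1])
    return psi


psi = total_effects(A0, A1, max_lag=4)
i, j = names.index("Nd"), names.index("LWP")
for lag, effect in enumerate(psi):
    print(f"lag {lag}: total effect of Nd on LWP = {effect[j, i]:+.4f}")
```

At lag 0 the result is 0.6 + (−0.5)(0.4) = 0.4, and at lag 1 the three contributing paths give 0.42 − 0.14 − 0.06 = 0.22, which can be verified by hand against the coefficients above.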

Fig. 3: Temporal developments of causal effects.

The subplots show the temporal evolution of the causal effect of Nd on RR (a), we (b), LWP (c), LWP - mediated by RR and we (d), after an initial perturbation in Nd at l = 0. Plots a, b also show the regime-dependence of the aerosol-induced precipitation and entrainment responses. The negative response in a is consistent with precipitation suppression, and the positive response in b is consistent with entrainment enhancement. Plots c, d show the resulting effect on the Nd-LWP sensitivity as calculated with the path approach. d Shows the portion of the total effect of Nd on LWP (c) that is mediated by RR and we. The shading corresponds to the 90% bootstrap confidence interval. The shading is not shown for sub-regimes in a, b to avoid cluttering the plots.
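A 90% bootstrap confidence interval of the kind mentioned in the caption can be sketched with a simple percentile bootstrap of a regression slope. The resampling scheme used in the study is not described in this section, so the independent resampling of pairs below is an assumption; for autocorrelated timeseries a block bootstrap would usually be more appropriate. Data and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
x = rng.normal(size=n)
y = -0.3 * x + rng.normal(size=n)   # hypothetical sensitivity of -0.3


def bootstrap_slope_ci(x, y, rng, n_boot=1000, level=0.90):
    """Percentile bootstrap confidence interval for the slope of y on x."""
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))   # resample pairs with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    tail = 100.0 * (1.0 - level) / 2.0
    low, high = np.percentile(slopes, [tail, 100.0 - tail])
    return low, high


ci_low, ci_high = bootstrap_slope_ci(x, y, rng)
print(f"90% bootstrap CI for the slope: [{ci_low:+.3f}, {ci_high:+.3f}]")
```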

The temporal evolution of precipitation suppression (Fig. 3a) shows a peaking negative \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{RR}}}},l}\) observed 4–6 h after the initial Nd perturbation. The Nd-RR sensitivity then decays back to 0 within 24 h, describing the return of the cloud system to an equilibrium state as other microphysical and dynamical processes take over. Cloud top entrainment enhancement is also well detected (Fig. 3b), with the strongest positive \({\beta }_{{N}_{{{{\rm{d}}}}},{w}_{{{{\rm{e}}}}},l}\) observed about 12 h after the initial Nd perturbation and continuing entrainment enhancement well beyond 24 h. The timescale of precipitation suppression is faster than that of cloud top entrainment enhancement, which agrees with theoretical calculations49 and observations of ship tracks31.

Figure 3a, b also shows the regime dependence of \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{RR}}}},l}\) and \({\beta }_{{N}_{{{{\rm{d}}}}},{w}_{{{{\rm{e}}}}},l}\). Boundary layer stability does not appear to have a strong impact on \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{RR}}}},l}\). Drier free tropospheric conditions seem to be associated with a more negative \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{RR}}}},l}\), even though there is no obvious physical explanation. Entrainment enhancement is stronger in unstable and drier free tropospheric conditions. Under such conditions, entrainment is favored, and the entrained air causes more evaporation, thereby enhancing evaporative cooling and downdrafts, and in turn further entrainment. In cases of a moist free troposphere, \({\beta }_{{N}_{{{{\rm{d}}}}},{w}_{{{{\rm{e}}}}},l}\) is slightly negative (entrainment suppression), which could be a result of additional moisture being transported into the cloud via entrainment, leading to cloud growth and sedimentation of the cloud top further away from the inversion level. We also evaluated the clean vs. polluted conditions (using a threshold Nd), but we do not include them in Fig. 3a, b because conditioning on Nd results in a sub-case of Simpson’s paradox (the biased lines are provided in Supplementary Fig. 6).

Figure 3c shows that the total causal effect of Nd on LWP is initially positive, but quickly becomes negative and remains negative for up to 24 h. This temporal development offers a more complete picture than the negative Nd-LWP correlation from Fig. 1c. Figure 3d shows the fractions of the total causal effect of Nd on LWP that are mediated by RR and we. Although the RR-mediated effect on the LWP is initially slightly positive, it becomes slightly negative after a few hours, which is inconsistent with LWP build-up from precipitation suppression. The slightly negative effect could be due to the lag-0 RR → Nd arrow being too weakly negative compared to the other arrows in Fig. 2. There could be data-related or causal model-related reasons why the RR variable does not fully behave as expected. For instance, it is possible that RR retrievals are too noisy due to the difficulty in measuring precipitation from lightly-drizzling stratocumulus clouds using satellites50,51, leading to incomplete confounding removal (Supplementary Fig. 7). In particular, RR is an estimate of surface precipitation, while, ideally, RR would measure the cloud-base precipitation to better evaluate its impact on the cloud water budget. Additionally, there could be biases associated with the use of the adiabatic assumption, which is not valid for strongly precipitating clouds52. Finally, there could be unknown sources of confounding that causal graph A does not capture. The we-mediated effect on LWP is negative, which is consistent with water loss by evaporation of cloud droplets due to enhanced entrainment of warm (and dry) free tropospheric air at cloud top. The we-mediated effect is long-lasting, with a peaking negative sensitivity after ~20 h. This agrees with the timescales of negative LWP adjustments that were calculated in ref. 30.

A comparison of the y-axis scales of Fig. 3c and d shows that the long-lasting negative Nd-LWP sensitivity is mainly driven by entrainment enhancement, confirming the conclusions of multiple studies8,9,12. However, although previous studies have demonstrated the importance of precipitation controls on the mesoscale structure and water budget of stratocumulus clouds7,53,54,55, this study only detects weakly positive (then negative) influences of precipitation suppression on LWP. As explained above, this could be caused by data-related or causal model-related issues.

The positive lag-0 \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{LWP}}}},l = 0}={\alpha }_{{N}_{{{{\rm{d}}}}},{{{\rm{LWP}}}},0}+{\alpha }_{{N}_{{{{\rm{d}}}}},{r}_{{{{\rm{eff}}}}},0}\times {\alpha }_{{r}_{{{{\rm{eff}}}}},{{{\rm{LWP}}}},0}\) (path rule) could be explained by increased condensation rates of cloud droplets due to an increased cloud droplet surface area (more numerous and smaller)56. Alternatively, it is possible that the use of a noisy RR variable led to an incomplete removal of confounding from RR, leading to a positively confounded contemporaneous effect of Nd on LWP \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{LWP}}}},0}\). It is also possible that the temporal resolution of these data, although high, is still not high enough, and that precipitation-mediated aerosol influences are already noticeable at lags l < 15 min, although the graph says they should not, because of the lag-1 arrow (F) from reff to RR. In ref. 35, Runge describes how temporal resolutions coarser than process timescales can lead to confounded causal estimates.
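The idea that a noisy covariate removes confounding only partially can be illustrated on synthetic data. In the sketch below, all coefficients are hypothetical and the variable names are generic (Z stands for any confounder that is observed with retrieval noise). Adjusting for the true Z recovers the causal slope, whereas adjusting for a noisy version of Z leaves a residual positive bias.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
true_effect = 0.2   # hypothetical causal effect of X on Y

z = rng.normal(size=n)                          # true confounder
z_noisy = z + rng.normal(size=n)                # confounder observed with noise
x = 0.8 * z + rng.normal(size=n)
y = true_effect * x + 1.0 * z + rng.normal(size=n)


def slope_of_x(y, x, *covariates):
    """Coefficient of x in an OLS regression of y on x and the covariates."""
    design = np.column_stack([x, *covariates, np.ones(len(x))])
    return np.linalg.lstsq(design, y, rcond=None)[0][0]


slope_unadjusted = slope_of_x(y, x)
slope_noisy_adjustment = slope_of_x(y, x, z_noisy)
slope_full_adjustment = slope_of_x(y, x, z)

print(f"no adjustment:          {slope_unadjusted:+.3f}")
print(f"adjusted for noisy Z:   {slope_noisy_adjustment:+.3f}")
print(f"adjusted for true Z:    {slope_full_adjustment:+.3f}")
```

The estimate obtained with the noisy covariate falls between the unadjusted and the fully adjusted slopes, which is the qualitative behavior described in the paragraph above.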

With this potentially incomplete removal of confounding from RR, it is essential to take a critical look at the temporal development of the causal effect of Nd on LWP from Fig. 3c, as the strong positive sensitivity of LWP to Nd at lag 0 might be an artifact due to an incorrect diagnosis of RR-mediated influences on LWP. Instead, we might imagine a \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{LWP}}}},l}\) that is closer to 0 at lag 0, quickly becomes positive due to fast precipitation suppression, then slowly reverses to negative due to long-lasting entrainment enhancement.

Causal graph sensitivity study

To test the sensitivity of the results to the initial graph assumption, three other plausible causal graphs (graphs B, C and D) are evaluated (Fig. 4). Graph B does not include cloud depth H, graph C includes neither H nor reff, while graph D includes neither H, reff nor RR and we.

Fig. 4: Causal graph sensitivity study.

Figures 2 and 3 obtained for graph A are reproduced here for three other graphs to test the robustness of the results and to compare the physical plausibility of the four graphs. Plots (a–c) show direct causal effects for graphs B, C and D (similar to Fig. 2 for graph A). The time-unfolded versions of graphs B, C and D are provided in Supplementary Fig. 1. Plots (d, e) reproduce the black line in Fig. 3b, plots (f, g) reproduce the black line in Fig. 3a, plots (h, i) reproduce Fig. 3d, and plots (j–l) reproduce Fig. 3c.

The first row (Fig. 4a–c) shows that direct causal effects for graphs B, C and D are similar to the ones found for graph A (Fig. 2), with one notable difference: the direct effect of Nd on LWP is negative in graphs C and D, which is not physical, as one would expect the negative effect of Nd on LWP to be exclusively mediated by entrainment we. This simple comparison suggests that graphs A and B (which contain reff) are more physical than graphs C and D.

The second and third rows (Fig. 4d–g) show that precipitation suppression and entrainment enhancement are correctly detected in graphs B and C, as \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{RR}}}},l}\) and \({\beta }_{{N}_{{{{\rm{d}}}}},{w}_{{{{\rm{e}}}}},l}\) look very similar to the ones derived from graph A (Fig. 3a, b). This highlights the robustness of the results concerning precipitation suppression and entrainment enhancement, i.e. the microphysics to dynamics branch of the feedback loops.

However, the physical description of the dynamics to microphysics branch is not accurate with graphs B and C, as shown in the fourth row (Fig. 4h–i). The we-mediated effect on LWP is positive in graph C, which disagrees with the evaporation of liquid water due to entrainment enhancement. In graph B, the we-mediated effect is initially correctly detected as negative, similarly to graph A, but the effect is weak and eventually becomes positive. The RR-mediated effect on LWP is even more negative in graphs B and C than in graph A, which disagrees with the build-up of LWP due to precipitation suppression. This suggests that the effects of precipitation and entrainment on LWP following aerosol perturbations can be best captured when considering changes in both reff and H (i.e. with causal graph A). In particular, the pivotal role of reff for an accurate representation of how we influences the LWP is probably due to the mixed homogeneous/extreme inhomogeneous entrainment regime43, as discussed previously (Fig. 5a). As H is a confounder of the causal link from reff to LWP, it is essential to add H in the causal graph alongside reff. In fact, including H avoids confounding due to aggregation over different precipitation regimes (Fig. 5b).

Fig. 5: Role of reff and H as mediating and confounding variables of LWP adjustments.

In a the two graphs are extracted from causal graphs C (left) and A (right, see Figs. 2 and 4). The comparison of the causal paths from we to LWP in both graphs shows how the influences of we change from positive in graph C (unphysical) to negative in graph A, when reff is taken into account (physical). b Shows the regression of LWP on Nd (similar to Fig. 1c). The dashed gray line shows the regression on all the data points (βaggregated = −0.6). The regression slopes after disaggregating the data by cloud depth bins (colored lines) are weaker than the aggregated slope (by a factor of >3). This is an example of Simpson’s paradox (Supplementary Fig. 3), as H is a confounder of the causal link from reff to LWP. The explanation is as follows: due to accretion of rain drops and cloud droplets, deeper clouds tend to have fewer but larger droplets and larger LWP, while thinner clouds tend to have more but smaller droplets with a lower LWP. This results in an overly negative LWP vs. Nd relationship when aggregating the data from deep and thin clouds.
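The aggregation effect described in this caption can be reproduced qualitatively with synthetic data. In the sketch below, cloud depth H lowers Nd and raises LWP, while the direct Nd-LWP slope is weak. All coefficients are hypothetical and are chosen only so that the aggregated slope is of the same order as the value quoted in the caption; they are not fitted to the study's data.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Hypothetical standardized variables: H confounds the Nd-LWP relationship.
h = rng.normal(size=n)                                      # cloud depth
ln_nd = -0.8 * h + rng.normal(size=n)                       # deeper clouds: fewer droplets
ln_lwp = 1.0 * h - 0.1 * ln_nd + 0.5 * rng.normal(size=n)   # deeper clouds: more water

# Regression on all data points (aggregated over cloud depth).
aggregated_slope = np.polyfit(ln_nd, ln_lwp, 1)[0]

# Regressions within cloud-depth bins (deciles of H).
n_bins = 10
inner_edges = np.quantile(h, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
bin_idx = np.digitize(h, inner_edges)                       # values 0 .. n_bins - 1
binned_slopes = np.array([
    np.polyfit(ln_nd[bin_idx == b], ln_lwp[bin_idx == b], 1)[0]
    for b in range(n_bins)
])

print(f"aggregated slope:  {aggregated_slope:+.3f}")
print(f"mean binned slope: {binned_slopes.mean():+.3f}")
```

The slopes within bins are several times weaker than the aggregated slope, because binning by H blocks most of the path through the confounder.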

The last row (Fig. 4j–l) shows the temporal development of the total Nd-LWP effect. This development differs greatly between the different graphs, which is a direct consequence of the differences observed in Fig. 4a–c and h–i. For graphs C and D, the Nd to LWP causal effect is always negative, with strong negative sensitivities reached at lag 0, which slowly decay back to zero over the course of 24 h. Consistently negative developments have also been observed in ship track studies31. Including reff in the graph (graphs A and B) makes it possible to remove the confounding influence of we on the initial sensitivity and changes the initial sign of the causal effect of Nd on LWP from negative to positive. For graph B, the Nd-LWP temporal development is similar to graph A, although with a much stronger magnitude. Compared to graphs B, C and D, graph A yields weaker LWP adjustments (note the different y-axis scales in Fig. 3c and Fig. 4j–l). In particular, when ignoring all sources of confounding (graph D), LWP adjustments are predicted to be strongly negative, implying a strong compensation of the cooling Twomey effect by LWP adjustments. When covariations in reff and H are taken into account with graph A, LWP adjustments are weaker. By integrating \({\beta }_{{N}_{{{{\rm{d}}}}},{{{\rm{LWP}}}},l}\) over time, we find that LWP decreases predicted from graph A only represent about 6% of LWP decreases predicted from graph D (after 24 h, in response to a 1-standard deviation increase in Nd). This means that the cooling effect of aerosol-liquid cloud interactions (including Twomey and cloud adjustments) could be much stronger than previously thought.

Discussion

This study proposes a physics-informed causal graph to quantify the causal effect of Nd on LWP in marine stratocumulus clouds. We evaluated the causal graph on daytime geostationary satellite data colocated with reanalysis data, at a temporal resolution that is expected to match the process timescale at which macrophysical changes are propagated through stratocumulus clouds. Contrary to other studies that looked directly at the temporal evolution of the Nd-LWP sensitivity30,31,32, we divided this sensitivity into its physical components, by separately investigating the entrainment- and precipitation-mediated responses. This physical process-oriented approach (as opposed to a state variable-oriented approach) allows us to remove environmental confounding that targets these physical processes, and hence to calculate causal effects (as opposed to correlations), while checking the physical plausibility of the results. We were able to disentangle LWP adjustments due to precipitation suppression and entrainment enhancement on different timescales (fast vs. slow), leading to LWP adjustments that are both regime- and time-dependent. We confirmed cloud-top entrainment enhancement as a key control for LWP adjustments, and noticed issues associated with precipitation that deserve to be addressed in future research. The methodology adopted in this study showcases how to conduct a thorough causal effect estimation analysis: from discussing physical assumptions behind the causal graph to a systematic investigation of lagged causal effects, mediation and regime-dependence, with a focus on the sensitivity of the results to the assumed graph.

Of course, all the results in this study are contingent upon a set of assumptions being met: (1) validity of the causal graph, (2) linearity of causal effects, (3) absence of hidden confounders, (4) stationarity of the causal effects, (5) trustworthiness of data.

We partially tested the implications of assumption (1) by comparing the results for four different plausible causal graphs.

Although the short time lags (0 or 15 min) might justify the use of a linear assumption (2) as a first-order approximation, it might be worthwhile to investigate non-linearities with the adjustment approach (see Methods) in future research.

Concerning assumption (3), if there were any unobserved confounders, other than the ones already included in the graph, the method presented here would have to be adapted to deal with hidden confounding57. For instance, this study ignored the confounding that can arise from the use of the adiabatic assumption to derive cloud properties. This retrieval assumption implies deterministic relationships that causally differ from the physical relationships between the variables (Supplementary Fig. 8), and a causal framing of this issue is still lacking in the literature. Moreover, it has been demonstrated that the use of this assumption can introduce correlated noise in satellite retrievals. This can introduce spurious correlations in the Nd-LWP relationship (Supplementary Fig. 9)35. For instance, ref. 58 showed how an initially positive Nd-LWP relationship can be falsely interpreted as negative because of such retrieval noise. Further confounders might need to be included in graph A. For example, including cloud-base updraft speeds, or another proxy for the influence of cold pools47 in the causal graph could help to solve some of the issues encountered with RR in this study.

Although the physical processes in the causal graph are expected to be stationary (4), the passive satellite instruments only provide daytime cloud property retrievals, and, for the sake of this analysis, we assumed that the magnitude of the causal effects remained unchanged through the night.

Finally, there might be additional data-related issues (5). The physical process-based causal approach used here allows us to diagnose where some data-related issues might arise. For instance, the analyses suggested potential issues with precipitation retrievals (a difficult task for lightly-precipitating stratocumulus clouds50,51), leading to incorrect detection of precipitation-mediated LWP adjustments. Moreover, the coarse spatial averaging of the data is a limitation of this study, and the impact of spatial aggregation on causal effects will be evaluated in future work. Other issues might include: the lack of vertical information in passive satellite retrievals; the failure of the adiabatic assumption in cases of strong entrainment or precipitation52; the sampling strategy chosen for the calculation of Nd59; or an imperfect colocation of the different satellite and reanalysis products over the stratocumulus region.

Future causal studies of ACI could focus on evaluating the effect of data-related issues (3–5) by applying the methodology presented here to other sources of data, e.g. model data or in-situ data. In that sense, a causal approach can be used as a physics-informed diagnosis tool to identify the sources of discrepancies in ACI estimates between different data sources, similarly to what is done for model evaluation in ref. 60. A complete study of how errors related to retrieval assumptions propagate from the data to the causal effects would also be of interest.

Despite these limitations, the causal inference method presented here provides a helpful framework to address confounding. In particular, the graph sensitivity analysis makes it possible to identify which variables need to be included to obtain physically plausible causal effects. It revealed that the aerosol-induced LWP response is overly negative if environmental confounding is not properly removed with the use of timeseries and with the appropriate consideration of covariations in cloud droplet sizes and cloud depths. This implies that the cooling effect of ACI could be underestimated when failing to account for the effects of meteorological covariations on LWP adjustments in marine stratocumulus regions. This agrees with other studies that used causal approaches (e.g. opportunistic experiments, such as refs. 14,15,61). These results put into perspective the vastly different Nd-LWP sensitivities found in the literature and highlight the importance of considering confounding as well as long-term developments to accurately calculate cloud sensitivities.

Methods

Data

The data used in this work are 2-year timeseries (2016–2017) of satellite cloud retrievals co-located with reanalysis data over the Namibian stratocumulus deck (10–20° S, 0–10° E, as defined by ref. 3). The level 2 cloud properties (reff, LWP) were obtained from the Cloud Physical Properties (CPP) product of the CLoud property dAtAset using SEVIRI (CLAAS 2.1)62, where SEVIRI is the Spinning Enhanced Visible and InfraRed Imager aboard the eleventh Meteosat Second Generation geostationary satellite. Low liquid clouds were filtered by using the cloud type product. Nd and H were calculated using the adiabatic assumption63,64:

$$\begin{array}{c}{N}_{{\rm{d}}}=\frac{1}{2\pi k}{\left(\frac{5{C}_{{\rm{w}}}\tau }{{Q}_{{\rm{ext}}}{\rho }_{{\rm{w}}}}\right)}^{\frac{1}{2}}\times {r}_{{\rm{eff}}}^{-\frac{5}{2}}\\ H={\left(\frac{5\tau }{3\pi {Q}_{{\rm{ext}}}}\right)}^{\frac{3}{5}}\times {\left(\frac{3{C}_{{\rm{w}}}}{4\pi {\rho }_{{\rm{w}}}}\right)}^{-\frac{2}{5}}\times {(k{N}_{{\rm{d}}})}^{-\frac{1}{5}}\end{array}$$

where Nd is assumed to be constant along the cloud depth, τ is the cloud optical depth, k = 0.72 (ref. 65) is a factor accounting for the width of the droplet size distribution, Qext ≈ 2 is the extinction efficiency, ρw is the density of water and Cw is the adiabatic condensation rate. Cw is a function of temperature and pressure66, which we calculate using the cloud top temperature and pressure from the CTX product of SEVIRI. For the calculation of Nd, pixels where reff < 4 μm or τ < 4 are filtered out because of high retrieval uncertainties associated with these lower values59. Although geostationary data are still under-used in ACI studies, it should be noted that the SEVIRI cloud products have been extensively validated with other more commonly used polar-orbiting satellite data products67. Because few studies have specifically validated cloud droplet number concentrations from SEVIRI (e.g. ref. 68), we compared Nd from SEVIRI and from the level 3 MODIS Terra satellite69 over our study region and period and found a very good agreement of the two derived products (Supplementary Fig. 10). The slight positive bias in Nd from SEVIRI should not be an issue since the data are standardized in this study.
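The adiabatic retrieval described above can be sketched in a few lines. This is a minimal illustration rather than the processing code of the study: it assumes the usual adiabatic scaling of Nd with τ^(1/2) and reff^(-5/2), SI units throughout, and a hypothetical constant value of Cw (in the study, Cw is computed from cloud-top temperature and pressure). The function names are illustrative.

```python
import math

RHO_W = 1000.0   # density of liquid water, kg m^-3
Q_EXT = 2.0      # extinction efficiency, approximately 2
K = 0.72         # droplet size distribution width factor quoted in the text

def adiabatic_nd(tau, r_eff, c_w):
    """Droplet number concentration (m^-3), scaling as tau^(1/2) * r_eff^(-5/2)."""
    prefactor = 1.0 / (2.0 * math.pi * K)
    return prefactor * math.sqrt(5.0 * c_w * tau / (Q_EXT * RHO_W)) * r_eff ** -2.5

def adiabatic_depth(tau, n_d, c_w):
    """Cloud geometrical depth H (m) from optical depth and N_d."""
    return ((5.0 * tau / (3.0 * math.pi * Q_EXT)) ** 0.6
            * (3.0 * c_w / (4.0 * math.pi * RHO_W)) ** -0.4
            * (K * n_d) ** -0.2)

def retrieve(tau, r_eff, c_w):
    """Return (N_d, H), or None when the pixel fails the r_eff / tau filter."""
    if r_eff < 4e-6 or tau < 4.0:
        return None
    n_d = adiabatic_nd(tau, r_eff, c_w)
    return n_d, adiabatic_depth(tau, n_d, c_w)

# Hypothetical pixel: tau = 10, r_eff = 10 micrometres, C_w = 2e-6 kg m^-4 (illustrative).
n_d, h = retrieve(10.0, 10e-6, 2.0e-6)
print(f"N_d = {n_d * 1e-6:.0f} cm^-3, H = {h:.0f} m")
```

Because both quantities are derived from the same τ and reff, retrieval errors in either input propagate into Nd and H together, which is the source of the correlated retrieval noise mentioned in the Discussion.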

Precipitation rates (RR) were obtained from the Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM, Version 6 (IMERG V06)70. Meteorological variables were downloaded from ERA571,72. LTS was calculated as the difference in potential temperatures between 700 and 1000 hPa (ref. 3). The entrainment velocity of free tropospheric air into the boundary layer (we) was calculated using a large-scale boundary layer continuity equation, following ref. 73:

$$\frac{{{{\rm{d}}}}\,{{{\rm{BLH}}}}}{{{{\rm{d}}}}\,t}={w}_{{{{\rm{e}}}}}+{w}_{{{{\rm{subs}}}}}$$
(1)

where wsubs is the rate of large-scale subsidence, taken at 700 hPa. This yields we values on the order of a few millimeters per second (Supplementary Table 2), which is comparable to other studies that employ the same equation8,74,75.
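As a rough illustration of these two derived variables, the sketch below computes LTS from temperatures at 700 and 1000 hPa and we from Eq. (1) using a finite-difference estimate of the boundary layer height tendency. The function names, the input values and the sign convention for wsubs (negative for downward motion) are assumptions made for the example; the conversion of reanalysis vertical velocity to m s^-1 is not shown.

```python
import numpy as np

KAPPA = 0.286  # R_d / c_p for dry air (standard value)

def potential_temperature(temp_k, pressure_hpa):
    """Potential temperature (K) referenced to 1000 hPa."""
    return temp_k * (1000.0 / pressure_hpa) ** KAPPA

def lower_tropospheric_stability(t700_k, t1000_k):
    """LTS (K): theta at 700 hPa minus theta at 1000 hPa."""
    return potential_temperature(t700_k, 700.0) - potential_temperature(t1000_k, 1000.0)

def entrainment_velocity(blh_m, w_subs_ms, dt_s=900.0):
    """w_e (m s^-1) from Eq. (1), rearranged as w_e = dBLH/dt - w_subs."""
    dblh_dt = np.gradient(np.asarray(blh_m, dtype=float), dt_s)
    return dblh_dt - np.asarray(w_subs_ms, dtype=float)

# Hypothetical series: the boundary layer deepens by 0.9 m every 15 min
# while large-scale subsidence is 4 mm s^-1 (downward, hence negative).
blh = 1000.0 + 0.9 * np.arange(5)
w_subs = np.full(5, -0.004)
w_e = entrainment_velocity(blh, w_subs)
lts = lower_tropospheric_stability(280.0, 290.0)
```

With these illustrative inputs, we comes out at a few millimeters per second, the order of magnitude reported in the text.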

The different data products were first co-located on the 0.25° × 0.25° ERA5 grid and linearly interpolated to the temporal resolution of the SEVIRI data, i.e. Δt = 15 min. Then, because the emphasis of this study is on the temporal resolution, and in order to simplify the causal analysis, the data are spatially averaged over the larger 10° × 10° stratocumulus region. It should be noted that the averaging of cloud properties is performed on in-cloud properties, not on all-sky properties, i.e. we exclude clear-sky zero values before averaging.

The diurnal and seasonal cycles might be a source of confounding, i.e. they can introduce correlations in meteorological properties and cloud properties that may not be causal in nature (Supplementary Fig. 11 for an illustration). For this reason, the data were adjusted for the diurnal and seasonal cycle by computing a seasonal instantaneous mean value and standard deviation for all variables (i.e. we take the mean and standard deviations of all data points for a given timestamp, e.g. 10:15 am), and using these means and standard deviations to standardize all data points with the same timestamp. Because of the standardization, the units of causal effects are not on a natural physical scale, but rather in units of deviation of the variable from its seasonal instantaneous mean per unit of the seasonal instantaneous standard deviation. As a consequence, the interpretations focus on the sign of the causal effects and their relative strength, but not on their absolute magnitude. The cloud properties were log-transformed prior to the standardization, for consistency with previous studies in which sensitivities are expressed as the derivatives of the logarithms of the variables. Zero precipitation values were removed from the dataset prior to the application of the log-transform and precipitation values smaller than the first percentile were removed from the dataset. The whole precipitation timeseries were shifted backwards by 1 timestep in order to approximate cloud-base precipitation rates instead of the surface-level precipitation rates.
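The deseasonalization step can be sketched as a group-wise standardization. The example below is an assumption-laden illustration: each sample carries a hypothetical integer "slot" label that identifies one time of day within one season, and how such labels are built from the timestamps is not specified in the text. For brevity, the synthetic series covers a single season, so the slot is simply the quarter-hour step of the day.

```python
import numpy as np

def standardize_by_slot(values, slot_ids):
    """Standardize each value with the mean and std of all samples in its slot."""
    values = np.asarray(values, dtype=float)
    slot_ids = np.asarray(slot_ids)
    out = np.full(values.shape, np.nan)
    for slot in np.unique(slot_ids):
        sel = slot_ids == slot
        mean = np.nanmean(values[sel])
        std = np.nanstd(values[sel])
        if std > 0:                      # leave NaN where the slot has no spread
            out[sel] = (values[sel] - mean) / std
    return out

# Synthetic LWP-like series: 60 days x 96 quarter-hour steps with a diurnal cycle.
rng = np.random.default_rng(0)
n_days, steps_per_day = 60, 96
step_of_day = np.tile(np.arange(steps_per_day), n_days)
lwp = (80.0 + 30.0 * np.cos(2 * np.pi * step_of_day / steps_per_day)
       + rng.gamma(2.0, 5.0, step_of_day.size))

# Cloud properties are log-transformed before standardization, as in the text.
lwp_anom = standardize_by_slot(np.log(lwp), step_of_day)
```

After this step, every slot has zero mean and unit standard deviation, so any remaining covariation between variables is no longer driven by the shared diurnal cycle.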

The original timeseries, as well as the adjusted timeseries, are shown in Supplementary Fig. 12 and average values for all variables are shown in Supplementary Table 2.

Causal effect estimation

Causal effect estimation20,21,22,57,76 requires three ingredients:

  1.

    A causal graph describing all direct causal links between the variables. Causal graphs can be obtained using causal discovery methods, or can be drawn from domain knowledge. In this study, we preferred the second option as causal discovery is subject to large uncertainties given the finite nature of datasets and the potential existence of hidden confounders22. The proposed causal graph (Fig. 1b) is a stationary directed acyclic graph, meaning that: (1) it is resolved in time, i.e. it contains lagged causal links and autodependency links; (2) it is considered to be stationary in time, i.e. the graph structure and the associated causal effects do not vary with time; (3) the direction of all causal links is known, with no hidden confounding variable; and (4) the graph does not present any feedback loop within the same timestep. In this work, all causal links are hypothesized to have lags l = 0 or 1 timestep (i.e. 0 or 15 min). A structural causal model (SCM) is implicitly associated with the causal graph. The SCM is a set of equations that describes the causal relationships between the n variables.

    $$\begin{array}{c}{X}_{j}(t):= {f}_{j}(pa({X}_{j}(t)),{U}_{j}(t))\\ {{{{\rm{if}}}}\,{{{\rm{linear}}}}}\,{=}\mathop{\sum}\limits_{\begin{array}{c}{X}_{i}(t-{l}_{ij})\atop \in pa({X}_{j}(t))\end{array}}{\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\times {X}_{i}(t-{l}_{ij})+{U}_{j}(t)\,{{{\rm{for}\, {\rm{j}}}}}\,\in [1:n]\end{array}$$

    where fj is a linear or non-linear function that determines the value of the effect-variable Xj(t) based on the values of its direct causal parents \(pa({X}_{j}(t))={({X}_{i}(t-{l}_{ij}))}_{i}\), i.e. those variables with arrows (lag lij) pointing directly towards Xj(t) in the causal graph. Uj’s are jointly independently distributed noise variables. In the linear case, \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) is the coefficient of the SCM for the direct causal effect of the parent-variable Xi on variable Xj at lag lij. In this study, the SCM is assumed to be linear as a first order approximation, and the \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) coefficients of the SCM, also called path coefficients or direct causal effects, correspond to the weights on each single arrow of the causal graph. They are the target metric of the causal effect estimation performed in this study.

  2.

    Observational data for all the variables in the causal graph. Timeseries are ideal as the precedence of cause on effect can be exploited. As explained in the introduction, the 15 min temporal resolution of geostationary data is reasonable for stratocumulus clouds as it is close to the expected process timescale at which microphysical changes occur and are propagated throughout the cloud.

  3.

    An estimation method for causal effect quantification given a causal graph and its associated data. There are two methods: the adjustment approach20,57 and Wright’s path approach40,41. The adjustment method is a non-parametric approach that can handle graphs with hidden variables. In the linear adjustment method, the total causal effect of X on Y is:

    $$\begin{array}{c}{\beta }_{X,Y}=\frac{\partial {\mathbb{E}}(Y| do(X=x))}{\partial x},\\ {{{\rm{where}}}}\,{\mathbb{E}}(Y| do(X=x))={{\mathbb{E}}}_{Z}\left[{{\mathbb{E}}}_{Y| X,Z}[(Y| X=x,Z=z)]\right]\end{array}$$

    The formulation of βX,Y can be extended to non-linear cases, by using a non-linear estimator (e.g. a neural network). The do-operator signals that we are calculating causal effects (not correlations) from the observational distribution. This is done by using a set of adjustment variables Z. Z is determined from the causal graph and contains the variables that block all non-causal paths from X to Y, thereby removing any confounding. Importantly, Z does not contain any descendants of Y, or any mediators, thereby avoiding introducing collider bias. In the linear case, βX,Y simply corresponds to the partial linear regression slope for X in the regression of Y with respect to both X and Z. The Wright method only applies to the linear case and generally cannot handle hidden confounding. It is primarily concerned with the estimation of the direct causal effects \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\), i.e. the arrow coefficients in the causal graph. They are calculated as the partial regression slopes in the multiple linear regression of Xj on its causal parents pa(Xj(t)), thereby removing any source of confounding that is implied by the causal graph. The Wright method differs from the adjustment method in its computation of total effects, as the total effect of X on Y (l-lagged) is derived from pre-computed direct effects using the path tracing formula:

    $$\beta_{X,Y,l} = \mathop{\sum}\limits_{\begin{array}{c}\scriptstyle{\mathrm{causal}}\,{{\mathrm{paths}}}\\ {{{\mathrm{from}}}\,X(t-l)}\atop{{\mathrm{to}}\,Y(t)}\end{array}}\left(\mathop{\prod}\limits_{{X_i} {\mathop\rightarrow\limits^{l_{ij}}} X_j{\rm{arrow}}\atop{\mathrm{in}\,{{\mathrm{path}}}}} \alpha_{X_i,X_j,l_{ij}}\right)$$
    (2)

    See Supplementary Fig. 4b for an illustration of the path tracing formula. The path tracing formula can be applied to derive contemporaneous total causal effects or lagged total causal effects, i.e. the temporal development of causal effects. It should be noted that the temporal developments of causal effects are not re-calculated from the data at each timestep. Instead, the data are used once in combination with the graph to calculate the direct causal effects (including autodependency coefficients), and then, these direct effects are propagated in time through the graph using the path tracing formula. At the timescales at which direct causal effects are computed (lags l = 0 or 1, i.e. 0 or 15 min), there is not much advection of the cloud fields, so we can consider the cloud system to be stationary. Assuming stationarity of the causal effects, we can extend temporal development calculations past 12 h even though the satellite only provides daytime data. 24-h developments of cloud processes should therefore be understood as hypothetical cloud developments should clouds persist for so long in the study region, and they should not be understood as a direct measurement of cloud lifetime. All results of this study were derived using the Wright method, as it allows for an easier decomposition of direct and mediated effects, and the variance of this method is asymptotically smaller than that of the adjustment method77.
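The three ingredients above can be illustrated on a toy problem. The sketch below is not the Tigramite implementation used in the study: it simulates a hypothetical three-variable linear SCM (X, M, Y, with made-up coefficients), estimates the direct effects by regressing each variable on its graph parents, and then applies the path tracing formula of Eq. (2). For a graph with lags 0 and 1, the sum over all causal paths can be written with matrices: if A0 and A1 hold the lag-0 and lag-1 path coefficients (entry [i, j] for the arrow from variable i to variable j), the total effects at lag l are the entries of C (A1 C)^l with C = (I − A0)^(-1). Data are assumed to be zero-mean, so no intercept is fitted.

```python
import numpy as np

def fit_direct_effects(data, parents):
    """Regress each X_j(t) on its parents [(i, lag), ...]; return (A0, A1)."""
    n_var = data.shape[1]
    coeffs = {0: np.zeros((n_var, n_var)), 1: np.zeros((n_var, n_var))}
    for j, pa in parents.items():
        design = np.column_stack([data[1 - lag:len(data) - lag, i] for i, lag in pa])
        slopes, *_ = np.linalg.lstsq(design, data[1:, j], rcond=None)
        for (i, lag), slope in zip(pa, slopes):
            coeffs[lag][i, j] = slope
    return coeffs[0], coeffs[1]

def total_effects(a0, a1, max_lag):
    """Path tracing: out[l, i, j] sums coefficient products over all causal paths
    from X_i(t - l) to X_j(t)."""
    c = np.linalg.inv(np.eye(len(a0)) - a0)      # all lag-0 paths
    out = [c]
    for _ in range(max_lag):
        out.append(out[-1] @ a1 @ c)             # extend every path by one timestep
    return np.array(out)

# Hypothetical linear SCM with variables 0 = X, 1 = M, 2 = Y.
n_var, n_time = 3, 20000
A0_true = np.zeros((n_var, n_var))
A1_true = 0.6 * np.eye(n_var)                    # lag-1 autodependencies
A0_true[0, 1], A0_true[0, 2] = 0.5, 0.3          # X -> M and X -> Y at lag 0
A1_true[1, 2] = -0.4                             # M(t-1) -> Y(t)

rng = np.random.default_rng(1)
noise = rng.standard_normal((n_time, n_var))
data = np.zeros((n_time, n_var))
for t in range(1, n_time):
    for j in range(n_var):                       # variables are in causal order
        data[t, j] = data[t] @ A0_true[:, j] + data[t - 1] @ A1_true[:, j] + noise[t, j]

parents = {0: [(0, 1)], 1: [(0, 0), (1, 1)], 2: [(0, 0), (1, 1), (2, 1)]}
a0_hat, a1_hat = fit_direct_effects(data, parents)
beta = total_effects(a0_hat, a1_hat, max_lag=96)  # 96 steps of 15 min = 24 h
```

In this toy graph, the true total effect of X on Y is 0.3 at lag 0 and 0.18 + 0.18 − 0.2 = 0.16 at lag 1 (three causal paths), which `beta[0, 0, 2]` and `beta[1, 0, 2]` should approximate. As in the text, the data enter only once, through the regression; every later lag follows from propagating the fitted coefficients.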

Causal effect quantification analyses were all carried out with the Tigramite package in Python (https://github.com/jakobrunge/tigramite).

Confidence intervals, masking

The first step of the analysis is to compute the direct causal effects between the variables using Wright’s estimator, i.e. we calculate the \({\alpha }_{{X}_{i},{X}_{j},{l}_{ij}}\) coefficients of the linear SCM. Confidence intervals for direct causal effect estimates were calculated using a bootstrapping method with 500 members. Direct causal effects are considered significant when the bootstrap confidence interval does not include 0. Once direct causal effects have been estimated, total causal effects βX,Y,l between two variables X(t − l) and Y(t) can be estimated using Wright’s path tracing formula (Eq. (2)). The confidence intervals for the temporal development plots were calculated by bootstrapping with n = 100 (the number of bootstrap ensemble members was scaled down due to the number of calculations).
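A percentile bootstrap for regression slopes might look as follows. This is a simplified sketch under stated assumptions: rows (target plus parent values at one timestep) are resampled independently with replacement, and a 95% level is used; neither the resampling scheme nor the confidence level is specified in the text, and the helper name is hypothetical.

```python
import numpy as np

def bootstrap_slope_ci(design, target, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap confidence intervals for multiple-regression slopes."""
    rng = np.random.default_rng(seed)
    n_obs = len(target)
    slopes = np.empty((n_boot, design.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n_obs, n_obs)      # resample rows with replacement
        slopes[b], *_ = np.linalg.lstsq(design[idx], target[idx], rcond=None)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(slopes, [alpha, 1.0 - alpha], axis=0)
    significant = (lower > 0) | (upper < 0)      # interval excludes zero
    return lower, upper, significant

# Synthetic check: the first regressor has a true slope of 0.5, the second has none.
rng = np.random.default_rng(42)
x = rng.standard_normal((2000, 2))
y = 0.5 * x[:, 0] + rng.standard_normal(2000)
lower, upper, significant = bootstrap_slope_ci(x, y)
```

The same resampled coefficient sets could then be passed through the path tracing formula to obtain intervals for the total effects, which is why the cost grows quickly with the number of lags and ensemble members.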

We evaluate the regime-dependence of causal effects using binary masks, i.e. by binning the data points by low/high value of the masking variable (by comparison to the median value). Splitting at the median (instead of using quartiles, for instance) leaves enough consecutive timesteps to carry out the causal effect calculations. Specifically, LTS, RHFT and Nd are used as masking variables for the boundary layer stability, free tropospheric humidity and aerosol background regimes.
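The median split and its interaction with lagged links can be sketched as below. The rule used here, keeping a timestep only when both it and its lagged predecessor fall in the same regime, is an assumption for illustration and may differ from how masking is handled in the actual analysis; the series is synthetic.

```python
import numpy as np

def regime_masks(mask_var):
    """Boolean masks for the low and high regimes, split at the median."""
    mask_var = np.asarray(mask_var, dtype=float)
    median = np.nanmedian(mask_var)
    return mask_var <= median, mask_var > median

def usable_lagged_samples(in_regime, lag=1):
    """Indices t for which both t and t - lag fall inside the regime."""
    in_regime = np.asarray(in_regime, dtype=bool)
    t = np.arange(lag, in_regime.size)
    return t[in_regime[t] & in_regime[t - lag]]

# Hypothetical standardized LTS series with some temporal persistence.
rng = np.random.default_rng(3)
lts = np.convolve(rng.standard_normal(3000), np.ones(50) / 50, mode="same")
low, high = regime_masks(lts)
t_low = usable_lagged_samples(low)
t_high = usable_lagged_samples(high)
```

A narrower regime definition (quartiles, for example) would shrink each mask and break it into shorter runs, leaving fewer consecutive pairs for the lag-1 regressions.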