Strong cloud–circulation coupling explains weak trade cumulus feedback

Shallow cumulus clouds in the trade-wind regions cool the planet by reflecting solar radiation. The response of trade cumulus clouds to climate change is a key uncertainty in climate projections1–4. Trade cumulus feedbacks in climate models are governed by changes in cloud fraction near cloud base5,6, with high-climate-sensitivity models suggesting a strong decrease in cloud-base cloudiness owing to increased lower-tropospheric mixing5–7. Here we show that new observations from the EUREC4A (Elucidating the role of cloud-circulation coupling in climate) field campaign8,9 refute this mixing-desiccation hypothesis. We find the dynamical increase of cloudiness through mixing to overwhelm the thermodynamic control through humidity. Because mesoscale motions and the entrainment rate contribute equally to variability in mixing but have opposing effects on humidity, mixing does not desiccate clouds. The magnitude, variability and coupling of mixing and cloudiness differ markedly among climate models and with the EUREC4A observations. Models with large trade cumulus feedbacks tend to exaggerate the dependence of cloudiness on relative humidity as opposed to mixing and also exaggerate variability in cloudiness. Our observational analyses render models with large positive feedbacks implausible and both support and explain at the process scale a weak trade cumulus feedback. Our findings thus refute an important line of evidence for a high climate sensitivity10,11.

Shallow cumulus clouds in the trade-wind regions cool the planet by reflecting solar radiation. The response of trade cumulus clouds to climate change is a key uncertainty in climate projections [1][2][3][4] . Trade cumulus feedbacks in climate models are governed by changes in cloud fraction near cloud base 5,6 , with high-climate-sensitivity models suggesting a strong decrease in cloud-base cloudiness owing to increased lowertropospheric mixing [5][6][7] . Here we show that new observations from the EUREC 4 A (Elucidating the role of cloud-circulation coupling in climate) field campaign 8,9 refute this mixing-desiccation hypothesis. We find the dynamical increase of cloudiness through mixing to overwhelm the thermodynamic control through humidity. Because mesoscale motions and the entrainment rate contribute equally to variability in mixing but have opposing effects on humidity, mixing does not desiccate clouds. The magnitude, variability and coupling of mixing and cloudiness differ markedly among climate models and with the EUREC 4 A observations. Models with large trade cumulus feedbacks tend to exaggerate the dependence of cloudiness on relative humidity as opposed to mixing and also exaggerate variability in cloudiness. Our observational analyses render models with large positive feedbacks implausible and both support and explain at the process scale a weak trade cumulus feedback. Our findings thus refute an important line of evidence for a high climate sensitivity 10,11 .
Earth's climate strongly depends on the abundance and behaviour of its smallest clouds. Shallow trade-wind cumulus clouds are rooted in the turbulent sub-cloud layer and form when thermals rise above the lifting condensation level 12 . They may grow only a few hundred metres high in dry environments or become positively buoyant and rise up to the trade-wind inversion, where they detrain condensate into stratiform cloud layers. Trade cumuli populate most of the subtropical oceans and cool the planet by reflecting the incoming solar radiation. Owing to their large geographical extent, small errors in predicting the way trade cumuli respond to warming can have a large effect on the global radiative budget. This explains why shallow cumuli in the trades are a main source of spread in the estimates of climate sensitivity of climate models [1][2][3][4] .
Cloudiness near the base of the cumulus layer makes up two-thirds of the total cloud cover in the trades 13 and its change with warming governs the strength of the trade cumulus cloud feedback in climate models 5,6 . Reductions in cloud-base cloudiness in climate models are tightly coupled with increases in lower-tropospheric mixing owing to convective and large-scale circulations [5][6][7] . On the basis of this strong negative coupling between mixing and cloudiness, the hypothesis emerged that enhanced convective mixing might desiccate the lower cloud layer and reduce cloudiness in the trades 7 . This mixing-desiccation hypothesis suggests that the moisture transported by convection from the sub-cloud layer to the trade inversion is compensated by downward mixing of drier air and evaporation of clouds near cloud base.
The mechanism-which is expected to become more pronounced with warming owing to the nonlinear Clausius-Clapeyron relationship-is consistent with idealized high-resolution simulations of nonprecipitating trade cumuli 14 and with the behaviour of climate models that have a strongly positive trade cumulus feedback 5,7,15 . However, the mixing-desiccation hypothesis has never been tested with observations. Using the convective mass flux at cloud base, M, as a proxy for lower-tropospheric convective mixing, the hypothesis can be tested by analysing the relationship between M and the mean relative humidity (R) and cloud fraction (C) at cloud base in observations, with C M ∝ ∝ β R and β < 0 suggesting the mixing-desiccation mechanism to be present in nature (Fig. 1a).
The mixing-desiccation mechanism is based on several assumptions that might not be operating in nature. M is commonly defined as the product of the cloud fraction and the in-cloud vertical velocity, and its variability is mostly governed by the area coverage of active clouds 16,17 , defined as saturated and buoyant updrafts that ventilate the sub-cloud layer. If variability in the in-cloud vertical velocity near cloud base is small, a positive relationship between C and M is expected (β > 0; Fig. 1b). This was demonstrated for nonprecipitating trade cumuli using Doppler radar data 17,18 and seems at odds with the mixingdesiccation hypothesis. Yet active clouds represent only half of the total C (refs. 19,20 ) and the lifetime and variability of passive clouds, such as the detritus of decaying clouds, might be more sensitive to R and mixing-induced drying of their environment than to M. The sub-cloud-layer mass budget provides a theoretical basis for interpreting the mixing-desiccation mechanism. It can be expressed as a budget of the sub-cloud-layer height h, h in which the entrainment rate, E, representing the mass source owing to the entrainment of dry and warm cloud layer air, and the mesoscale vertical velocity, W, are balanced by the mass export owing to the convective mass flux, M (ref. 20 ). Note that we define M as the (mass) specific mass flux, which has units of velocity (see Methods). E is the only term directly affecting the sub-cloud-layer moisture and heat budgets 21,22 .
If an increase in M is mostly balanced by an increase in E, a drying and warming of the sub-cloud layer and a reduction in R and C is expected (Fig. 1a). The trades, however, exhibit strong mesoscale convective organization, which is linked to the presence of mesoscale circulations and substantial variability in W (refs. 20,[23][24][25]. This variability in W could contribute to variability in M without directly affecting R (Fig. 1b). An increase in M could also produce increased inversion cloudiness and thus increased total cloud cover, compensating the radiative effects of a potential decrease in C. The diversity of cloud types and the large variability in W in the trades thus call into question the mixing-desiccation mechanism as the dominant control of C and trade cumulus feedbacks. The EUREC 4 A field campaign was conceived to test the mixingdesiccation hypothesis 8,9 . EUREC 4 A took place in January and February 2020 near Barbados, a region selected as a source of data because clouds in its vicinity are representative for the entire trade-wind belt 26 . During EUREC 4 A, we made measurements designed to quantify the magnitude and (co-)variability of M, C and R over one month, which was characterized by substantial variability in the mesoscale convective organization 27 and the large-scale circulation 9 (see Methods). With the help of these measurements, we are able to test the mixing-desiccation hypothesis with observations for the first time.

Observations of M, C and R co-variations
During EUREC 4 A, we dropped more than 800 dropsondes from the HALO aircraft flying at about 10 km altitude along 1-h circles of 220 km diameter 28,29 . We use the dropsonde data to estimate M at the sub-cloud-layer top as a residual of the mass budget (equation (1)) on the 3-h scale of three consecutive circles (see Methods). Figure 2a shows a large day-to-day variability of M, with higher values at the beginning and end of the campaign, and a campaign mean of 17.4 ± 7.5 mm s −1 (mean ± standard deviation σ). M shows a pronounced diurnal cycle (Extended Data Fig. 1), with larger values around sunrise and smaller values in the afternoon (consistent with refs. 20,30 ). The mass budget estimates are robust to changes in the estimation procedure and consistent with independent data (Methods and Extended Data Fig. 2).
The entrainment rate E is computed as the ratio of the scaled surface buoyancy flux and the buoyancy jump across h (equation (2) and Extended Data Fig. 3). E averages to 18.3 ± 6.4 mm s −1 across the campaign (Fig. 2b) and also shows a pronounced diurnal variability (Extended Data Fig. 1). E is mostly controlled by variability in the surface buoyancy flux (Extended Data Fig. 4b). It is the strengthening of winds and surface fluxes that contributes most to the increase in E and M in the second half of EUREC 4 A. W is, with −0.9 ± 6.7 mm s −1 , on average nearly zero. Variability in W, however, is substantial and contributes slightly more to variability in M compared with E (Extended Data Fig. 4a). So although M ≈ E holds on average, consistent with the mixing-desiccation hypothesis (Fig. 1a), variability in M is controlled by both E and W. Figure 2c shows the new measurements of the cloud-base cloud fraction C from combined horizontally staring lidar and radar on board the ATR aircraft flying near cloud base 31 . C is, with 5.4 ± 3.1%, both small and highly variable. The variability of C on the 3-h scale is substantially larger than variability on synoptic and longer timescales 13 . The robustness of C is demonstrated by the internal consistency among complementary and independent measurements in terms of measurement techniques and spatial sampling 31 . The R at cloud base is robustly around 86% (Fig. 2d), except for a few outliers. Three data points with much lower R for ATR compared with HALO (marked with 'X' in Fig. 2d) are excluded in the following analyses, as these situations were associated with air masses that were sampled differently by the two aircraft (see Methods and Fig. A2 in ref. 31 Fig. 1 | Illustration of two mechanisms for the coupling of mixing and cloudiness. a, The mixing-desiccation mechanism contends that E increases in response to an increase in M, which leads to a reduction in R and cloud-base cloudiness C, and a relationship R C M ∝ ∝ β with β < 0. b, The mesoscale motion control of cloudiness instead suggests that M is equally controlled by both E and W, such that M is uncorrelated to R and β > 0.

Article
Despite being fundamental quantities to understand climate sensitivity, the challenging nature of observing M and C so far prevented an observational analysis of the relationship between mixing and cloud-base cloudiness. With the EUREC 4 A observations presented here, we are now able to test the mixing-desiccation hypothesis with data.

Data refute mixing-desiccation hypothesis
The cloud-base cloud fraction is suggested to be controlled both dynamically through M and thermodynamically through R. We can therefore express C as a multiple linear regression C a a M â = + + The mixing-desiccation mechanism contends that, as M increases, E increases and leads to a reduction in R.  22 , as it transports mass with the same properties of the well-mixed sub-cloud layer. The positive correlation between W and R is thus probably connected to a self-aggregation feedback leading to a net convergence of moisture into areas that are already moist 25,32,33 . The opposing correlations of E and W with R lead to a negligible anticorrelation of M and R (r = −0.08 [−0.26, 0.10]; Fig. 3b). Although this makes M and R independent predictors of C, it contrasts with the expected desiccation effect of increased mixing. The basic premise of the mixing-desiccation hypothesis thus breaks down in the presence of strong variability in W. Figure 3c further shows a pronounced positive correlation between C and M (r = 0.72 [0.64, 0.81]), demonstrating that M explains more than 50% of variability in C. The EUREC 4 A data are therefore in line with a more direct relation C ∝ M β and a β > 0 (Fig. 1b). The tight connection between C and M is also consistent with physical understanding represented in the scaling * C C M w ≈ 2 ∝ 2 / core , in which C core is the area fraction of active cloud cores and w* is the Deardorff vertical velocity scale (see Methods and ref. 24 ). The correlation of C with R is weaker (r = 0.36 [0.16, 0.56]; Fig. 3d). These conclusions are robust to changes in the estimation procedure of M and to independent estimates of C (Extended Data Fig. 5). represents C. The shading in c represents the scaling for C ∝ 2M/w* using the mean ± 2σ of the velocity scale w*. The grey 'X' markers represent data that are excluded in the correlations owing to inconsistent sampling between the two aircraft (see Fig. 2d and Methods). The relationships exposed by the EUREC 4 A data are thus in opposition to the mixing-desiccation hypothesis, which contends that increasing mixing (larger M) leads to a desiccation of the lower cloud layer (smaller R) and a reduction in cloud-base cloudiness (smaller C). We also find a positive relationship between C and another indicator of lower-tropospheric mixing (Extended Data Fig. 4f) and a weak positive correlation between M and the total projected cloud cover (Extended Data Fig. 6). Hence, the EUREC 4 A data emphasizes dynamic factors-the convective mass flux M and the mesoscale vertical velocity W-as dominant controls of C, rather than thermodynamic factors related to the mixing-desiccation mechanism.

Models underestimate strong cloud-circulation coupling
How consistent is the present generation of climate models with our observations? To assess how climate models represent the relationship between mixing and cloudiness, we use ten models from the Cloud Feedback Model Intercomparison Project (CFMIP) 34 that provide the necessary pointwise M, C and R output at high temporal resolution near the EUREC 4 A domain (see Methods). In contrast to the consistency among many independent EUREC 4 A observations, Fig. 4a shows that the models strongly differ in their magnitude and variability of M and C. Although some models predict unrealistically low M (CanAM4, MIROC6 and MPI-ESM), the IPSL-CM6A has a five times larger mean M compared with the EUREC 4 A observations. Except for IPSL-CM6A, all models strongly overestimate variability in C (see also ref. 35 ) and 8 of 10 models also overestimate the magnitude of C. This is partly owing to the tendency of models to produce stratocumulus clouds in this shallow cumulus regime 36,37 (evident in the strong increases in C (up to 50-100%) above a critical R of about 94%; see Extended Data Fig. 7). By contrast, the observations indicate no occurrence of C > 13% or R > 94%. The models that produce such more stratocumulus-like conditions with R > 94% more than 15% of the time (Extended Data Fig. 8a) are labelled with open symbols in Fig. 4.
Only the BCC-CSM2 model represents the pronounced positive correlation between C and M observed during EUREC 4 A at the 3-h scale (Fig. 4b). Six of the other models have a correlation coefficient r < 0.05, of which three models even show a negative correlation. Most models thus strongly underestimate the tight coupling between clouds and convection observed in EUREC 4 A. Instead, these six models are more in line with the mixing-desiccation mechanism and a β < 0 (Fig. 1a), even though this is not mediated by a pronounced negative correlation between M and R (Extended Data Fig. 8c). All the models also strongly underestimate variability in W (Extended Data Fig. 8b), as they do not represent the sub-grid processes leading to the observed variability in the mesoscale vertical velocity (for example, shallow circulations driven by differential radiative cooling 38 or local sea-surface temperature (SST) gradients 39 ). The relationships between C and R are more consistent among most models (Fig. 4b) and are also more consistent with the observations compared with the relationships between C and M.
In contrast with the observations, clouds as parameterized by climate models are more thermodynamically than dynamically controlled. The misrepresentation of the relative sensitivity of C to changes in M or R by all models is encapsulated in the ratio of the standardized regression coefficients All models, with one exception, substantially underestimate the value of a a / M R compared with the observations, highlighting that, in the climate models, variability in C is primarily controlled by variations in R rather than variations in M. Although BCC-CSM2 seems credible in terms of the magnitude and relationship of C and M, its credibility is eroded by its unrealistic relationship between C and R (Extended Data Fig. 7), and thus an implausible a a / M R of −5.2. At odds with the observations, in most models, M and R are only weak predictors of C, as evident in the low coefficient of determination (r 2 ) of the multiple linear regression of Ĉ (Extended Data Fig. 8c). The cloud parameterizations of the models thus fail in capturing the key relationships between C and the dynamic and thermodynamic environment observed in nature.

Implications for trade cumulus feedbacks
The EUREC 4 A observations provide robust estimates of the mean, variability and coupling of M, C and R in contrasted trade cumulus environments. Although the observed variability is substantial, the variability simulated by climate models is unrealistic, as are the drivers of this variability. The EUREC 4 A data thus provide a physical test of the capacity of models to represent the interplay of the processes active in regulating trade-wind cloud amount and may guide future model development. Moreover, the fact that the relationships at the 3-h process scale are consistent with the relationships at the monthly timescale (r ≥ 0.84; Extended Data Fig. 8e,f) suggests that the underlying fast physical processes that couple M, R and C in the models are largely Article invariant with the timescale. The relationships derived from the EUREC 4 A observations can therefore also be used to evaluate the credible range of trade cumulus feedbacks in the climate models. Figure 4b demonstrates that all models with a strong trade cumulus feedback represented by a change in the cloud radiative effect (ΔCRE) with warming exceeding 0.37 W m −2 K −1 (reddish colours in Fig. 4c) represent the refuted mixing-desiccation mechanism with a negative (or very weak) correlation between M and C. Also, these four models exaggerate both the coupling of C to R (small a a / M R ; Fig. 4c) and the variability in C (σ C ; Extended Data Fig. 8d). By contrast, the models that are closer to the observations tend to have a weaker positive ΔCRE with warming. The EUREC 4 A observations of the physical processes that drive the short-term variability of C thus rule out the mechanism that leads to the largest positive trade cumulus feedbacks in current climate models.
By showing that mesoscale motions inhibit the mixing-desiccation mechanism, we refute an important physical hypothesis for a large trade cumulus feedback. In the spirit of the storyline approach for constraining equilibrium climate sensitivity 10 , our findings thus refute an important line of evidence for a strong positive cloud feedback and thus a large climate sensitivity. The EUREC 4 A observations therefore support recent satellite-derived constraints from observed natural variability 37,40 and climate-change experiments using idealized highresolution simulations 41,42 , which suggest that a weak trade cumulus feedback is more plausible than a strong one. Moreover, for the first time, we take into account all types of cloud present in the trades, including the optically thinnest ones that are usually missed in satellite observations 43 , and consider the full range of mesoscale variability that was not represented in idealized simulations of cloud feedbacks. We also provide an explanation for the inconsistency of models with large positive feedbacks: in these models, the observed tight coupling between convective mixing and cloudiness is absent; instead, C is primarily controlled thermodynamically by R, which exaggerates variability in C and feedbacks to warming. By not representing the variability in mesoscale circulations, the models miss an important process regulating trade cumulus clouds. Future research should focus on better understanding the processes controlling these mesoscale circulations and how they might change in a warmer climate.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-05364-y.

EUREC 4 A field campaign
We use data from the EUREC 4 A field campaign, which took place in January and February 2020 and was anchored in Barbados 8,9 . We focus on measurements made by the HALO 29 000 km), and is associated with the emergence of the prominent trade cumulus cloud organization patterns 47 . As the air masses are advected by about 30 km per hour (at the campaign mean wind speed of about 9 m s −1 at 1 km height), the spatial sampling of the 220-km diameter circle does not differ substantially between the 1-h and 3-h timescales, which motivates our nomenclature focus on the time rather than space scale. Using the measurements, model and reanalysis data, we would not expect our results to change substantially if the analysis domain were increased or reduced by a factor of two or more (see Methods section 'Mass flux estimation' for a discussion of the scale sensitivity of the results).
The Barbados region was chosen as the location of EUREC 4 A because shallow trade cumulus clouds are the dominant cloud type in the area during winter 13 . Furthermore, clouds in the Barbados region are similar to clouds across the trade-wind regions in both observations and models 26 . The mean meteorological conditions during the EUREC 4 A campaign, as sampled by the dropsondes, also correspond well to the average January-February conditions from 12 years of data from the ERA-Interim reanalysis 48 (their Fig. 5), albeit with a 10% larger 850 hPa relative humidity during EUREC 4 A (the EUREC 4 A dropsondes also have an approximately 8% larger relative humidity compared with the 2013-2022 average in ERA5, not shown). Also, all four prominent patterns of mesoscale cloud organization 47 were present during the campaign 27 . The conclusions drawn from the EUREC 4 A data are thus relevant across the tropics and for climate timescales.

Observations
For estimating the cloud-base mass flux M, R and many other variables, we use dropsonde data from the JOANNE dataset 28 , namely Level 3 (gridded quality-checked sondes) and Level 4 (circle products) vertical profiles of thermodynamic quantities, wind and mesoscale vertical velocity, W. The HALO dropsondes are corrected for a dry bias by multiplying the relative humidity by 1.06 (ref. 28 ).
For the cloud-base cloud fraction C, we use the BASTALIAS lidar-radar synergy product 31 , which includes both cloud and drizzle (but not rain) and constitutes an upper bound on C. We also test the relationships for three further estimates of C: • The non-drizzling cloud product from the radar-lidar synergy (C only ), which excludes drizzle and constitutes a lower bound on C. • In situ estimates from a microphysical probe defined on the basis of thresholds of liquid water content plus particle size (C pma ). • In situ high-frequency (25-Hz) humidity sensor, with cloud defined as relative humidity ≥98% (C turb ). The in situ sensors measure the along-track C, whereas the lidar-radar synergy samples clouds inside the rectangle at a distance up to 8 km from the aircraft 31 . Despite pronounced differences in the measurement principles and sampling, Fig. 18 of ref. 31 demonstrates the internal consistency and robustness among the independent C estimates. The ATR turbulence measurements also include measurements of vertical updraft and downdraft velocities 49 , from which an in-cloud mass flux M turb is computed by multiplying C turb by the in-cloud vertical velocity.
Further HALO aircraft measurements used are total projected cloud cover (CC) estimates from the differential absorption lidar WALES, the hyperspectral imager specMACS and the cloud radar HAMP 29 . From these cloud masks, we derive the CC along the 1-h circle. For specMACS and HAMP, the cloud detection is ambiguous and we consider both the 'probably cloudy' and the 'most likely cloudy' flags in our CC estimates.
We also use ceilometer and cloud radar data from the BCO and the RV Meteor to test the robustness of the sub-cloud-layer height definition (not shown). Radar cloud fraction profiles are obtained by correcting the hydrometeor fraction profiles with ceilometer data during periods of rain (see ref. 30 for a description of the correction applied). The BCO cloud radar data also demonstrate that missing the level of maximum cloud-base cloud fraction in 3-h averages by, say, 60 m does not affect the variability of C (correlations of r = 0.99 and r = 0.93 with the maximum C when 60 m above and below the peak level, respectively) and only marginally affects its magnitude (18% and 33% smaller relative to the maximum C for being 60 m above or below the peak level, respectively). So only if the ATR flight level deviated from the height of maximum cloudiness in ways that co-varied with M would we expect such a height difference to influence our analysis. As the ATR aircraft usually flew slightly above h (Extended Data Fig. 3a) and because it sampled many more clouds in 3 h compared with the stationary BCO, a potential influence of missing the peak level is deemed not to bias our findings.

Surface buoyancy flux
To estimate the surface buoyancy flux (w θ ′ ′ v s | , needed to compute M), we use dropsonde humidity, temperature and wind data at 20 m height and apply the Coupled Ocean-Atmosphere Response Experiment (COARE) bulk flux algorithm version 3.6 (refs. 50,51 ). For the SST, we extrapolate the 2-m-depth SST of the RV Meteor (thermosalinograph primary backboard temperature), or alternatively from the AutoNaut Caravela 52 , to the dropsonde location based on a fixed zonal and meridional SST gradient of −0.14 K per degree. A gradient of −0.14 K per degree corresponds to the median zonal and meridional gradient (−0.145 K per degree and −0.135 K per degree, respectively) across the EUREC 4 A circle over the period from 19 January to 15 February in the ERA5 reanalysis 53 and in two satellite SST products (from the Advanced Baseline Imager on board the Geostationary Operational Environmental Satellite (GOES-16 ABI) and the Collecte Localisation Satellites (CLS).

Article
The sonde-derived surface buoyancy flux on the 3-h scale compares favourably with bulk fluxes from the RV Meteor mast, with a correlation coefficient r = 0.83 and a mean offset of 0.1% relative to RV Meteor. The sonde-derived flux has a comparable magnitude with the flux measured at the RV Ronald Brown 54 further upstream and is also well correlated (r = 0.81) with ERA5. The ERA5 fluxes, however, overestimate the surface buoyancy flux compared with the sonde-derived flux by 25%, which is mostly because of the overestimation of the sensible heat flux by 64% relative to the observations (9.8 W m −2 and 6.0 W m −2 for ERA5 and dropsondes, respectively). A strong overestimation of the sensible heat flux compared with buoy measurements in the region is also present in the predecessor ERA-Interim reanalysis 55 . Overall, the good correspondence of our sonde-derived surface buoyancy flux with the independent data lends credibility to our estimation procedure. The sonde-derived surface buoyancy flux is also used to compute the Deardorff sub-cloud-layer vertical velocity scale | ( ) Fig. 3c, in which g is the gravitational acceleration.

Mass flux estimation
Vogel et al. 20 developed a method to estimate the shallow-convective mass flux at the sub-cloud-layer top as a residual of the sub-cloud-layer mass budget and tested it in real-case large-eddy simulations over the tropical Atlantic. Here the method is applied to EUREC 4 A observations, in parallel with Albright et al. 22 , who close the sub-cloud-layer moisture and heat budgets and provide an independent constraint on the entrainment rate E. Except for the surface buoyancy flux estimate (see the previous section), all data for the budgets come entirely from the dropsondes. Equation (1) expresses the budget of the sub-cloud-layer height h per unit area and constant density. h t ∂ ∂ represents the temporal fluctuation of h and V h · ∇h its horizontal advection, E is the entrainment rate, W the mesoscale vertical velocity (positive upwards) and M the convective mass flux at h.
The sub-cloud-layer height h is defined as the height at which the virtual potential temperature (θ v ) first exceeds its density-weighted mean from 100 m up to h by a fixed threshold ϵ = 0.2 K (refs. 22,56 ). Extended Data Fig. 3a confirms that our h is usually close to the ATR flight altitude and h is also well within the range of independent BCO and RV Meteor observations of the maximum radar cloud-base cloud fraction and the peak frequency of the first ceilometer cloud-base height (not shown). This confirms that our h agrees well with the level of maximum near-base cloud fraction, which was set as the target height for the ATR flight level and thus for evaluating the mass budget 31 .
The entrainment rate E represents the deepening of h owing to small-scale mixing at the sub-cloud-layer top. We use a modified version of the classical flux-jump model 57,58 that accounts for the finite thickness of the transition layer, the approximately 150-m-thick stable layer separating the mixed layer from the cloud layer (see ref. 22 for details). The buoyancy flux at h is modelled as a fixed fraction A e of the surface buoyancy flux, | w θ ′ ′ v s , in which A e is the effective entrainment efficiency. The buoyancy jump at the sub-cloud-layer top is computed as θ θ θ q q θ Δ =Δ +0. 61 q is the specific humidity, C q and C θ are scaling coefficients accounting for uncertainty in the depth over which the jumps are computed, the subscript h+ refers to the value of q or θ above h (computed as the average from h to h + 100 m) and q and θ are averages from 50 m to the mixed-layer top (defined as the height of maximum relative humidity below 900 m). Finally, E is computed as The uncertain parameters A e , C q and C θ are estimated through a joint Bayesian inversion to close the moisture and heat budgets by ref. 22 , yielding maximum-likelihood estimates of A e = 0.43 ± 0.06 (mean ± 1σ), C q = 1.26 ± 0.34 and C θ = 1.15 ± 0.31.
The mesoscale vertical velocity W at h is computed by vertically integrating the divergence of the horizontal wind field measured by the dropsondes 23 from the surface up to h. W is at the lower end of the meso-α scale of ref. 46 , what climate modellers often associate with the 'large scale'. The terms h, E and W are computed at the 1-h scale of a single circle and then aggregated to the 3-h scale (three circles).
The temporal fluctuation of h is estimated as the linear regression slope of h computed from the 30-36 soundings available per 3-h circle set. Similarly, the horizontal advection of h is estimated as the sum of the linear regressions of the eastward (∂h/∂x) and northward (∂h/∂y) gradients of the individual h, multiplied by the wind speed at the 3-h mean h. Both ∂h/∂t and V h · ∇h are only available on the 3-h scale.
The default M shown in the paper is the equilibrium mass flux M = E + W, which reproduces well the mass flux diagnosed directly from cloud-core area fraction and vertical velocity in large-eddy simulations 20 . This equilibrium M is also available on the 1-h scale of an individual circle. Taking into account ∂h/∂t and V h · ∇h in the mass flux estimate leads to , which shows very similar characteristics compared with M (Extended Data Fig. 3). This is mainly because both the advection (−1.3 ± 2.7 mm s −1 ) and temporal fluctuation (0.5 ± 6.8 mm s −1 ) terms are on average about zero, and the advection term is also nearly invariant. The inclusion of advection and h t ∂ ∂ in M′ slightly enhances variability on the diurnal timescale (Extended Data Fig. 1a).
Cold pools formed by evaporating precipitation destroy the structure of the sub-cloud layer and make the estimation of h less robust. We thus exclude soundings that fall into cold pools in the analysis using the criterion of h < 400 m developed by ref. 56 based on the EUREC 4 A soundings. The influence of these and other assumptions on the magnitude and variability of M are discussed in the Methods section 'Robustness of observational estimates'. Also note that our M is defined as the (mass) specific mass flux and has units of velocity. It differs from the more familiar mass flux (in units of kg m −2 s −1 ) by the air density, which is usually assumed to be constant 18,59 , and which is justified here given the small density variations across the measurements (mean ± σ of 1.104 ± 0.0077 kg m −3 , that is, less than 0.7% of the mean).
Although the 1-h scale variability of M can be substantial (for example, second 3-h circle sets on 26 January and 13 February; Fig. 2), the median estimation uncertainty is only 20% at the 3-h scale (see section below). Also, M has a similar magnitude and reassuring correlation (r = 0.77) to the independent M turb estimate from in situ turbulence measurements on the ATR aircraft (Extended Data Fig. 2d).
The mass budget terms show different degrees of scale sensitivity (see also discussion in ref. 20 ). Extended Data Figs. 2c and 4a show that the correlation between W and M is slightly larger at the 1-h scale compared with the 3-h scale (r W,M 3h = 0.60 and r W,M 1h = 0.67), whereas they are essentially the same for E and M (r E,M 3h = 0.54 and r E,M 1h = 0.55). The scale sensitivity of the W variance is in line with radiosonde data from the EUREC 4 A ship array, which show that the divergence amplitudes at equivalent radii of 100-300 km scale inversely with radius 60 (as in ERA5 and ICON, consistent with ref. 23 ). In ERA5, the scale sensitivity of the surface buoyancy flux, which contributes most to variability in E (Extended Data Fig. 4b), is much smaller compared with the scale sensitivity of W (not shown). This is probably because variability in the surface buoyancy flux is mostly controlled by the surface wind speed (Extended Data Fig. 4h) and radiative cooling 61 , both of which are large scale. The surface wind speed has autocorrelation coefficients of 0.74 for a 2-day and 0.48 for an 8-day lag (Fig. 3d of ref. 22 ). Although weaker compared with the synoptic variability, the surface wind also has a distinct diurnal cycle 62,63 , which causes a diurnal cycle of the surface buoyancy flux (Extended Data Fig. 1c and ref. 20 ). Some of the diurnal variability in E is thus lost for longer temporal averaging. Also, the variability in the temporal fluctuation and horizontal advection of h (equation (1)) decreases on larger scales 20 . In summary, M variability decreases on larger averaging scales. The scale sensitivity of W is larger compared with E, such that the contribution of W to M variability tends to become smaller compared with the contribution of E on much larger scales.
As noted above, E describes the net effect of local processes and must be inferred from the statistics of other quantities (that is, the mean sub-cloud-layer growth rate or the dilution of sub-cloud-layer properties). This raises the question whether the E estimate itself might depend on the mesoscale environment and therefore introduce spurious co-variabilities between M, W and C. The Bayesian estimation of the uncertain parameter estimates A e , C q and C θ is a priori independent of M and W. Also, the synoptic variability during EUREC 4 A can be well explained by keeping them constant 22 . Reference 22 also explored to what extent other factors correlated with residuals in their Bayesian fits and found no evidence of a systematic effect of other factors, including wind speed and shear 64 . As discussed above, the variability in E tends to be less scale-sensitive than W and mostly controlled by larger-scale factors, such as the surface wind speed (through the surface buoyancy flux; Extended Data Fig. 4b,h). Furthermore, E and W are anticorrelated (r E,W = −0.35; Extended Data Fig. 4g). So both statistically from the anticorrelation and physically through the scale argument, we believe that our parameterization of E does not induce spurious co-variability.

Uncertainty estimation
For the M, R and C estimates, we distinguish two sources of uncertainty: sampling uncertainty and estimation (or retrieval) uncertainty. For all terms, the sampling uncertainty is computed at the 3-h scale as the standard error, σ n SE = / , of the three individual 1-h circle values (each representing about 50 min of flight or up to 12 sondes), in which σ is the standard deviation and n the number of circles.
The estimation uncertainty is computed differently for every term according to the underlying assumptions and choices. For R, the manufacturer-stated uncertainty (that is, repeatability) is 2% and some extra uncertainty stems from the correction of the dry bias of the HALO dropsondes (see ref. 28 ). Because this uncertainty is the same for all data points, the estimation uncertainty of R is not shown in the figures. For C, the estimation uncertainty is computed for every 3-h circle set as the SE of the four different estimates of C, namely C itself, C only , C turb and C pma . The uncertainty estimate therefore represents uncertainty in measurement principles and spatial sampling 31 . Further uncertainties of the individual C estimates (for example, owing to the choice of thresholds) are neglected, as sensitivity tests suggest that they are smaller than the uncertainty among the different C estimates 31 .
For W, the advection term V h · ∇h and the temporal fluctuation ∂h/∂t, the estimation uncertainty is taken as the SE of the respective regression used to compute the term. Because ∂h/∂t is computed from individual sondes, it contains both temporal and spatial variability of h on the 3-h scale and its SE is inflated.
The estimation uncertainty of the surface buoyancy flux is a combination of uncertainty in the underlying SSTs and in the COARE bulk flux algorithm. We estimate the uncertainty in the underlying SSTs by computing the SE of five different versions of the flux (three with different fixed SST gradients (the default median value and the median ± interquartile range, that is, −0.14 K per degree, −0.21 K per degree and −0.07 K per degree), one with a temporally varying gradient (not shown) and one with a different baseline SST (from the AutoNaut Caravela 52 instead of the RV Meteor)) and adding a 5% uncertainty of the COARE algorithm given in ref. 50 as the 1-h uncertainty in the 0-10 m s −1 wind speed range. For A e and Δθ v , we use the relative uncertainties of the Bayesian inversion as the estimation uncertainty (that is, σ(A e )/A e for A e and the average of σ(C q )/C q and σ(C θ )/C θ for Δθ v ).
Uncertainties in the three individual terms of E are propagated by adding their fractional uncertainties in quadrature to yield the estimation uncertainty of E. In the same spirit, the estimation uncertainty is propagated from the 1-h scale to the 3-h scale and from the individual terms of equation (1) to M.
The uncertainties of the correlations and the multiple linear regression are estimated with bootstrapping (10,000 repetitions). We communicate these uncertainties by mentioning the 25th and 75th quartiles in the text and by showing both the quartiles and the 2.5% and 97.5% quantiles (representing the 95% confidence interval) in Fig. 4 and Extended Data Fig. 8. Apart from the uncertainty quantification described here, we assess the robustness of the M and C observations to several other choices and assumptions in the Methods section 'Robustness of observational estimates'.

Other mixing indicators
Other proxies for lower-tropospheric mixing were used in previous studies 5,7,65 that can be estimated from the dropsonde data and compared with the variability in C. Here we compute the boundarylayer vertical advection (BVA) diagnostic from ref. 65 defined as in which MSE is the moist static energy, Z min the level of minimum MSE that marks the top of the trade-wind layer (on average at 2,900 m) and ρ the density. Note that a lower (more negative) BVA value indicates stronger mixing.
Reference 65 found a pronounced positive relationship between changes in BVA and changes in C from a series of single-column model experiments with the IPSL-CM5A model, which is characterized by a strong positive low-cloud feedback and the presence of the mixing-desiccation mechanism (Fig. 4). Extended Data Fig. 4f shows a pronounced negative correlation between BVA and M in the EUREC 4 A data, indicating good agreement in their complementary definitions of mixing. Smaller BVA (stronger mixing) is also associated with larger C (not shown), which is at odds with the IPSL-CM5A model. The absolute correlation between BVA and C (r = 0.34), however, is considerably smaller than the correlation between M and C (r = 0.72).

General circulation models
The cloud fraction, net mass flux (upward and downward) and relative humidity at cloud base are calculated for ten Coupled Model Intercomparison Project (CMIP) models: • Four from the fifth phase, CMIP5 (ref. 66  (ref. 73 ), IPSL-CM6A-LR 74 , MIROC6 (ref. 75 ), MRI-ESM2-0 (ref. 76 ), HadGEM3-GC31-LL 77 using the sub-hourly vertical profiles at selected sites (named cfSites in CMIP5 and CF subhr in CMIP6) provided by CFMIP 34 . Note that the M from the models is not computed using equation (1) but is defined according to the respective convective parameterization scheme of the models (see references above). We use the atmosphere-only amip configuration from 1979 to 2008, selecting data from December, January, February and March to be broadly consistent with the winter conditions sampled during EUREC 4 A. For each model, between two and six sites are available in the North Atlantic trades between 60-50° W and 12-16° N, namely the BOMEX, NTAS, EUREC 4 A, BCO and RICO sites. All profiles with clouds above 600 hPa (about 4.2 km) are dropped to ensure a focus on shallow convection. We verified that, in terms of the large-scale environment, the cfSites fall into the climatological trade cumulus regime as defined by ref. 40 .
The cloud-base level is defined as the level of maximum cloud fraction between 870 and 970 hPa (between about 400 and 1,300 m). If the maximum cloud fraction is smaller than 0.25% for a given profile, the cloud-base level is taken at the climatological level of maximum cloud fraction. The hourly cloud-base data are aggregated to a 3-h timescale, which corresponds to the 3-h scale of the EUREC 4 A data, as well as a monthly timescale. The values computed are insensitive to (1) averaging across the sites before aggregating to the 3-h timescale, (2) removing the site near the Northwest Tropical Atlantic Station buoy upstream of the EUREC 4 A circle (near 51° W and 15° N), (3) focusing only on January and February, and (4) excluding nighttime values outside the hours sampled during EUREC 4 A (not shown).
We use the thermodynamic component of the change in the cloud radiative effect at the top of the atmosphere (ΔCRE) with warming under given dynamical conditions to quantify the strength of the trade cumulus radiative feedback. Reference 2 showed that the ΔCRE with warming is a good approximation of the cloud feedback computed with radiative kernels 78 . The CRE is defined as the difference between all-sky (all, including clouds) and clear-sky (clr, clouds assumed to be transparent to radiation) net downward radiative fluxes, CRE = R all − R clr = (LW clr − LW all ) + (SW all − SW clr ) = CRE LW + CRE SW , with R being the total radiative flux and LW and SW its longwave and shortwave components, respectively. The radiative fluxes are defined positive downward. The ΔCRE with warming is then simply the difference in CRE between the warmer amip4K (4-K uniform increase in SST) and the amip (control) simulations, normalized by the 4-K temperature difference (that is, ΔCRE/ ΔT s = (CRE amip4K − CRE amip )/4K). To restrict the feedback estimation to the trade cumulus regime, we focus on ocean-only grid points between 35° S and 35° N and use the regime partitioning of ref. 40 with trade cumulus regimes in each simulation (amip or amip4K) defined as having a climatological annual mean estimated inversion strength smaller than 1 K and a vertical velocity at 700 hPa between 0 and 15 hPa day −1 .

Robustness of observational estimates
Applying the mass budget formulation to the EUREC 4 A dropsonde data involves several choices for definitions and thresholds. These choices are guided by constraints from independent data and from closure of the moisture and heat budgets in ref. 22 , which provides justification for the default configuration described in the Methods section 'Mass flux estimation'. Nevertheless, it is important to assess and understand the sensitivity of the mass budget estimates and the key relationships to different estimation choices.
We focus first on the influence of different definitions of the sub-cloud-layer height h and the entrainment rate E on the mean and standard deviation (σ) of M and E, the respective correlations of M with E and W, and the correlation and mean difference to the independent M turb estimate from turbulence measurements onboard the ATR aircraft (see Extended Data Fig. 2). For the h definition, we compare our default h to an alternative definition, 'h.parcel', which defines h as the level of neutral buoyancy of a surface-lifted parcel (with density-weighted θ v averaged from 30 to 80 m) plus 0.2 K θ v excess. Using 'h.parcel' leads to a 16 m shallower mean h compared with the default. The slightly shallower h decreases Δθ v (the denominator of the E formulation in equation (2)) from 0.36 K to 0.34 K, which slightly increases E and M by around 1.5 mm s −1 . Although W is unaffected by this small change in h, the resulting M has a slightly reduced correlation to the independent M turb compared with the default M (r = 0.69 versus r = 0.77). The same chain of arguments holds for increasing and decreasing the threshold ϵ in the h definition by ±0.05 K. With ϵ = 0.25 K instead of 0.2 K (case 'h.eps = 0.25'), h increases by 31 m and, through the larger Δθ v , decreases E and M by about 3.3 mm s −1 . Owing to the presence of a thin transition layer 22 , the response to ϵ ± 0.05 K is nonlinear and a reduction of ϵ to 0.15 K ('h.eps = 0.15') leads to a disproportionately smaller Δθ v and roughly 6 mm s −1 larger E and M. The 35 m shallower mean h with ϵ = 0.15 K also strongly increases σ E , which increases the correlation between E and M at the expense of a decreased correlation between the unaffected W and M (Extended Data Fig. 2c).
The next set of choices involves the entrainment rate estimate. We test the influence of the different surface buoyancy flux estimates from ERA5 and RV Meteor. As the ERA5 flux is 25% larger than the other fluxes, we scale it to have same mean as the dropsonde-derived flux (case 'sbf = ERA5.sc'). For 'sbf = ERA5.sc', the variability in E and M are substantially larger compared with the default dropsonde flux, increasing their correlation. For the case 'sbf = Meteor', the differences to the default estimates is smaller (Extended Data Fig. 2a,b) and the correlation with M turb is slightly larger than in the other configurations. The estimates are also unaffected by changing the three coefficients A e , C q and C θ estimated by Bayesian inversion in ref. 22 to close the moisture and energy budgets during EUREC 4 A when cold pool soundings (defined as having h < 400 m following ref. 56 ) are excluded ('diffEpars'). We further compare four different ways of computing Δθ v . Computing the value at h+ as averages from h to h + 50 or h + 150 m (instead of to h + 100 m) has a similar (but more linear) influence as increasing ϵ ± 0.05 K (see discussion above). Using two different heights for averaging θ v across the mixed layer (up to h in 'tvbar = h' and up to the level at which q first falls below its mean by a threshold of 0.3 g kg −1 in 'tvbar = qgrad') hardly influences the estimates.
Last, we show the influence of computing the mass budget including the cold pool soundings for two sets of surface buoyancy flux estimates, case 'withCP' for the default dropsonde-derived flux and 'withCP_sbf = ERA5.sc' for the scaled ERA5 flux. In both cases, the mean and σ of both M and E are increased when cold pools are included (matching the mean E of ref. 22 , who included cold pools). However, especially for the default surface fluxes (case 'withCP'), the correlation with M turb is strongly reduced.
Extended Data Fig. 2a,d also shows the influence of selected choices on the total mass flux M′, which includes the contribution of the temporal fluctuation and horizontal advection of h. Because these extra terms are on average nearly zero (Extended Data Fig. 3c), their inclusion does not affect M . σ M instead increases by about 1.5 mm s −1 owing to the pronounced variability in the temporal fluctuation term. As this term is not very robust, we use the more reliable equilibrium M as our best estimate. The equilibrium M is also robust at the 1-h scale of an individual circle (case '1h-scale').
Overall, Extended Data Fig. 2 makes us very confident in the robustness of our mass budget estimates because they only show a modest sensitivity to the various choices and because we can explain these sensitivities physically. Also, the independent ATR M turb estimates (Extended Data Fig. 2d) and the extra constraints on E from our complementary analyses of the moisture and heat budgets in ref. 22 (dashed lines in Extended Data Fig. 2b) lend further credibility to our default estimation choices.
Next, we focus on the sensitivity of the key relationships between M, C and R to a selected set of plausible estimation choices of M and the different C estimates from the ATR aircraft. Extended Data Fig. 5a shows that the positive correlation between M and C is notable for all parameter choices, and both the equilibrium M and total M′. Furthermore, the negligible correlation between M and R is also very robust.
Extended Data Fig. 5b further confirms that the default M also has strong correlations with the three independent estimates of C from the ATR aircraft. The same is true for the other estimation choices of M, with a small overall range of correlations of 0.52 < r M,C < 0.73. Correlations between C and R are more variable between the different C estimates and are in the range r 0.12 < < 0.63 C , R . It is not surprising that the C only estimate that neglects contributions from drizzle has the strongest correlation with R, as it mostly features passive clouds that are more affected by ambient humidity than the more active clouds that also include drizzle. Note that there is also a slight dependency of r(R,C) on the M estimates, as the cases 'h.parcel' and 'h.eps = 0.25' result in different h and thus different heights at which R is evaluated.
The bottom panels of Extended Data  Fig. 5c) and the different C estimates (Extended Data Fig. 5d). There is no configuration with a a / <1 M R , indicating that C is always more strongly coupled to M than to R in the observations. Slightly larger values of Also, the standard deviation of C (σ C ) is very similar for the different C estimates that include drizzle (between 2.1% and 3.7%, with 3.1% being the σ C of the default BASTALIAS lidar-radar synergy product) and only slightly lower for the C only estimate (1.6%) when using the full sample. Variability is slightly reduced in the smaller sample that overlaps with the HALO flights, because it excludes two night flights with larger cloudiness and two flights in dry environments with very small cloudiness (σ C of 1.7-2.4% for the C estimates that include drizzle).
Overall, Extended Data Fig. 5 demonstrates the insensitivity of the observed relationships to a wide range of configurations. We therefore conclude that the relationships between mixing and cloudiness observed during EUREC 4 A are very robust.