Aerosol effects on clouds are concealed by natural cloud heterogeneity and satellite retrieval errors

One major source of uncertainty in the cloud-mediated aerosol forcing arises from the magnitude of the cloud liquid water path (LWP) adjustment to aerosol-cloud interactions, which is poorly constrained by observations. Many recent satellite-based studies have found a decrease in LWP with increasing cloud droplet number concentration (CDNC) to be the dominant behavior. Estimating the LWP response to CDNC changes is a complex task, since various confounding factors need to be isolated. However, one important aspect has not been sufficiently considered: the propagation of natural spatial variability and of errors in satellite retrievals of cloud optical depth and cloud effective radius into estimates of CDNC and LWP. Here we use satellite and simulated measurements to demonstrate that, because of this propagation, even a positive LWP adjustment is likely to be misinterpreted as negative. This biasing effect therefore leads to an underestimate of the aerosol-cloud-climate cooling and must be properly considered in future studies.

We acknowledge the use of imagery from the NASA Worldview application (https://worldview.earthdata.nasa.gov), part of the NASA Earth Observing System Data and Information System (EOSDIS).
Even if you decreased the domain size further, I am not confident that you could fully exclude meteorological variability in this manner, because the considered pixel will have evolved under different meteorological conditions. I would argue that the absence of changes between the different panels of Fig. 1 is due to the confounding meteorological variability still being present.
2. It is quite striking how retrieval errors in COT and REF can propagate through to the diagnosed LWP adjustment and even change its sign. But what about the step in between? Even if you account for the uncertainties of the actual retrievals, there is still the diagnosis of Nd and LWP. Here you want to assess the robustness of satellite-based estimates of the climatological Nd-LWP relationship and how it can be improved. Equation 1 for Nd contains three parameters that we know to be based on imperfect assumptions (especially α). How robust is the Nd-LWP relationship to variations in these parameters, and isn't this potentially the larger lever? I am not suggesting that you necessarily have to do this analysis. However, if you prefer not to, I would ask you to discuss the different sources of uncertainty in these relationships and which ones are most likely to propagate through to LWP susceptibility.
3. My final remark concerns spatial heterogeneity and the impact of filtering, and relates to your statements on data selection in L287-L291. My understanding is that the selection criteria imposed on the CDNC retrieval can substantially shrink your dataset but, of course, enhance retrieval certainty in the remaining data. If you look at the points remaining in, e.g., your 5°x5° region after applying the selection criteria, what is the average percentage of points you are left with compared with all cloudy pixels where COD and REF are retrieved, and thus how well do you capture spatial heterogeneity (what you refer to here as natural variability) in the remaining quality-controlled dataset? What I am trying to get at is whether the sampling itself can skew the result by not capturing the natural variability of the system, since only a small sub-space remains where trust in the retrievals is high (though your results suggest that even that may not be the case).

Edits:
The manuscript is very well written, logically structured and clear. I only have minor suggested edits regarding figure labels. Figure 3: There is little discussion of Figure 3 in the text. Also, could you explain why the figure uses a color bar for CCN rather than for CER, as in Figure 2? There is no explanation of Extended Data Figures 1-3, 5, and 6, but they seem to be important for the authors' analyses. Could you add descriptions in the main text or the extended data file?
We would like first to express our thanks to the reviewers for their especially useful and constructive comments. Our point-by-point responses, in regular font, follow each reviewer's point, which is given in italic font. Moreover, the main remarks / major points of each reviewer are numbered to make them easier to find, because in a few places we refer to the response we gave to another reviewer's very similar comment.

This study tackles an issue that remains unsolved and nicely demonstrates how important it is to stress-test and improve our methodology in providing a robust assessment of the LWP adjustment in sub-tropical stratocumuli. I have a couple of conceptual points, which I would like the authors to clarify prior to publication.
General remarks: Response 1: We think that the meteorological variability was indeed significantly reduced by narrowing down the spatial scale, although we agree that some, perhaps quite significant, meteorological variability likely remains within 5°x5° regions. However, this does not reduce the strength and significance of our main arguments and results, for the reasons we explain next. In Figure 1, we show cloud fields quite similar to those the reviewer chose, but in this case from the South Pacific, since this image was readily available from our previous analyses and serves the purpose here as well. The lower panels show COT and CER, and the upper panels show LWP and CDNC calculated from the 1 km L2 MODIS cloud optical thickness and cloud effective radius. Figure 2, on the other hand, shows the results when both CDNC and LWP are aggregated to 1°x1° from the L2 1 km COD and CER data, as is typically done in the literature. There is significant spatial variability in COT and CER, and thus in CDNC and LWP, even when trying to exclude any CCN variability (through the Aerosol Index, which was the proxy for CCN). As illustrated by the arrows in Figure 2, when looking broadly at the cloud fields, LWP seems mostly to decrease when CDNC increases. This spatial behavior results naturally from the spatial variation of COT and CER, which in turn stems from processes on different spatial scales: the large-scale meteorology, the small-scale cloud variability (for instance, due to true variations in the adiabatic fraction) and, of course, the spatial aerosol variability. Our main argument is that the potential impacts of both small-scale cloud variability (natural cloud heterogeneity) and retrieval errors have not been sufficiently considered and have been biasing the interpretation of LWP vs. CDNC in many previous studies. 
As can be seen, this biasing effect is present not only at the coarse resolution but also at the finest resolution. This leads to a more general consideration: large-scale meteorological variability and small-scale cloud variability manifest themselves quite similarly in this kind of LWP adjustment analysis. Therefore, when the data are gathered as is usually done for LWP and CDNC analyses, this kind of cloud heterogeneity, if not accounted for, biases the results through an effect that is unrelated to the cloud response to aerosol changes.
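The pixel-to-grid step described in Response 1 (per-pixel CDNC and LWP from 1 km COT and CER, then averaged into 1°x1° cells) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing code: the function names are hypothetical, α is left as a placeholder constant (normalised to 1, so CDNC is in arbitrary units) because its value is not given here, and LWP uses the adiabatic relation LWP = (5/9) ρ_w τ r_e, which with ρ_w = 10^6 g m^-3 and r_e in µm gives g m^-2. Averaging the per-pixel CDNC and LWP within each cell is one common choice among several.

```python
import numpy as np


def pixel_cdnc_lwp(cot, cer_um, alpha=1.0):
    """Per-pixel CDNC (arbitrary units unless alpha is supplied) and LWP (g m^-2)."""
    cot = np.asarray(cot, dtype=float)
    cer_um = np.asarray(cer_um, dtype=float)
    # CDNC = alpha * COT**0.5 * CER**-2.5; alpha is a placeholder constant here
    cdnc = alpha * np.sqrt(cot) * cer_um**-2.5
    # Adiabatic LWP = (5/9) * rho_w * COT * CER; rho_w = 1e6 g m^-3 and CER in
    # micrometres (1e-6 m) cancel, leaving g m^-2
    lwp = 5.0 / 9.0 * cot * cer_um
    return cdnc, lwp


def aggregate_to_grid(lat, lon, values, res=1.0):
    """Mean of `values` in res x res degree cells: (cell corners, means, counts)."""
    ilat = np.floor(np.asarray(lat, dtype=float) / res).astype(int)
    ilon = np.floor(np.asarray(lon, dtype=float) / res).astype(int)
    cells, inverse = np.unique(np.stack([ilat, ilon], axis=1), axis=0,
                               return_inverse=True)
    inverse = np.ravel(inverse)  # keep it 1-D across NumPy versions
    counts = np.bincount(inverse)
    sums = np.bincount(inverse, weights=np.asarray(values, dtype=float))
    return cells * res, sums / counts, counts


# Example: three 1 km pixels falling into two 1x1 degree cells
cdnc, lwp = pixel_cdnc_lwp([16.0, 9.0, 4.0], [4.0, 10.0, 12.0])
corners, mean_lwp, npix = aggregate_to_grid([10.2, 10.7, 11.5],
                                            [-80.3, -80.9, -80.1], lwp)
```

Because the logarithm is applied to the cell means, any sub-cell heterogeneity in COT and CER is carried into the aggregated CDNC and LWP rather than removed by the averaging.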

Below in
In light of the above comments, we have reduced the strength of the claim to have removed meteorological variability by modifying the following sentence from:

Response 2: The equation for the CDNC calculation is $\mathrm{CDNC} = \alpha \, \mathrm{COT}^{0.5} \, \mathrm{CER}^{-2.5}$. We have introduced various levels of variability/error into COT and CER, while the equation shows that variability in α (for instance, because the adiabatic fraction deviates from 0.8, the value assumed in the bulk coefficient α) has an effect similar to that of COT but stronger (power of 1 compared with the power of 0.5 for COT). Below is a figure of that additional sensitivity study, in which the COT and CER errors were set to 15% and, additionally, α had various levels of error/variability. These new results suggest that the descending branch of LWP vs. CDNC likely starts slightly later as the α variability increases, but if the α variability is a random variability/error, our main finding of a biasing impact from COT and CER variability/error is still strongly present even with a 25% α error. We discuss this in the revised version (see the tracked-changes version of the revised manuscript).
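The argument of Response 2 can be made explicit with a linear errors-in-variables sketch. Assuming (our assumption, for illustration) small, independent random errors in ln COT, ln CER and ln α, independent of the true cloud fields, the equation above shifts ln CDNC by ε_α + 0.5 ε_COT − 2.5 ε_CER, while the adiabatic LWP ∝ COT·CER (assumed here; it contains no α) shifts ln LWP by ε_COT + ε_CER. The large-sample OLS slope of ln LWP on ln CDNC is then (b·var + 0.5 σ_COT² − 2.5 σ_CER²) / (var + 0.25 σ_COT² + 6.25 σ_CER² + σ_α²), where b is the true slope and var the true variance of ln CDNC. The numerator term is negative whenever σ_COT < √5 σ_CER, whereas α errors only enlarge the denominator. The function name and inputs below are hypothetical, and this linear fit is a simplification of the binned LWP vs. CDNC analysis.

```python
def expected_slope(true_slope, var_ln_cdnc, sd_cot=0.0, sd_cer=0.0, sd_alpha=0.0):
    """Large-sample OLS slope of ln LWP on ln CDNC with independent log-space errors.

    Assumes ln CDNC = ln alpha + 0.5 ln COT - 2.5 ln CER and
    ln LWP = const + ln COT + ln CER (adiabatic, no alpha in LWP).
    A log-space sd of 0.15 corresponds roughly to a 15% relative error.
    """
    # Covariance between the errors in ln CDNC and ln LWP
    err_cov = 0.5 * sd_cot**2 - 2.5 * sd_cer**2
    # Extra variance that the errors add to ln CDNC (the regressor)
    err_var = 0.25 * sd_cot**2 + 6.25 * sd_cer**2 + sd_alpha**2
    return (true_slope * var_ln_cdnc + err_cov) / (var_ln_cdnc + err_var)


# True slope +0.2, variance of ln CDNC 0.09, 15% COT and CER errors
no_alpha = expected_slope(0.2, 0.09, 0.15, 0.15)          # about -0.11
with_alpha = expected_slope(0.2, 0.09, 0.15, 0.15, 0.25)  # about -0.09
```

In this sketch a 25% α error makes the apparent slope less negative but does not restore its positive sign, which is consistent with the qualitative statement in the response above.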

The manuscript provides an interpretation of the commonly observed negative correlation between liquid water path (LWP) and cloud droplet number concentration (CDNC). The authors argue that the anticorrelation is likely driven by the combined effect of the spatial variability of clouds (which possibly violates the physical assumptions required for computing CDNC) and satellite artifacts. The authors conclude that these effects are masking the rapid adjustment of clouds associated with changes in aerosol concentration. The topic is important, as the research community generally neglects biases in both satellite observations and physical assumptions when performing aerosol-cloud-interaction studies. The authors' conclusion also supports this reviewer's view on the subject, that is, that the so-called "second indirect effects" are difficult, if not impossible, to observe with satellite data. This manuscript makes an important contribution by alerting the community that satellite-based assessments of aerosol-cloud interactions might be biased low, or even unphysical, especially when the sign of the LWP-CDNC slope is negative. The exercise of creating synthetic observations with random errors is simple yet quite compelling. The conclusion that errors in cloud droplet effective radius (CER) are the single most important source of uncertainty in aerosol-cloud interaction (ACI) estimates (by propagating uncertainties to CDNC) is well known. In fact, it can be easily concluded from the CDNC equation. In spite of this, the study is relevant. A general disagreement I have with the authors is that, while they show plausible causes for artificial LWP-CDNC correlations, 1) the analysis cannot rule out the occurrence of physical anticorrelations between LWP and CDNC,
Response 1: Our aim was to show that spatial cloud heterogeneity (or retrieval errors in COT and CER) causes a negative bias in the dlnLWP/dlnCDNC sensitivity; however, we did not argue that there could be no true physical anticorrelations between LWP and CDNC that would then result in a negative LWP adjustment. Indeed, we discussed some possible physical reasons in our manuscript. But our main message was, as the reviewer formulated it above, that "these effects are masking the rapid adjustment of clouds associated with changes in aerosol concentration". For instance, if the true dlnLWP/dlnCDNC sensitivity were -0.1, the estimated sensitivity would in this case also become more negative if the spatial variability (or retrieval errors) is not considered in the analysis. To make this even clearer, we also modified the following sentence in the revised manuscript from "and they are indeed entirely plausible explanations that have likely played some role in those analyzed cases" to "and there are indeed entirely plausible physical mechanisms that could have produced the inferred relationships".
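The synthetic-observation idea behind this response can be sketched as a small Monte Carlo experiment: build "true" fields with a chosen dlnLWP/dlnCDNC, derive the consistent COT and CER, perturb them with random errors, recompute CDNC and LWP, and refit the slope. This is an illustrative sketch, not the authors' simulation: the function name, the spreads of the true fields (0.3 in ln CDNC, 0.2 scatter in ln LWP) and the log-normal error model are assumptions, and the constants α and (5/9) ρ_w are set to 1 because they only shift the logarithms and do not change the fitted slope.

```python
import numpy as np


def simulated_slope(true_slope, cot_err, cer_err, n=200_000, seed=0):
    """OLS slope of ln LWP on ln CDNC after adding random COT and CER errors."""
    rng = np.random.default_rng(seed)
    # "True" fields: ln CDNC with some spread, ln LWP following true_slope plus scatter
    ln_cdnc = rng.normal(0.0, 0.3, n)
    ln_lwp = true_slope * ln_cdnc + rng.normal(0.0, 0.2, n)
    # Invert ln CDNC = 0.5 ln COT - 2.5 ln CER and ln LWP = ln COT + ln CER
    ln_cer = (0.5 * ln_lwp - ln_cdnc) / 3.0
    ln_cot = ln_lwp - ln_cer
    # Independent errors; a log-space sd of 0.15 is roughly a 15% relative error
    ln_cot_obs = ln_cot + rng.normal(0.0, cot_err, n)
    ln_cer_obs = ln_cer + rng.normal(0.0, cer_err, n)
    # "Retrieved" CDNC and LWP recomputed from the perturbed COT and CER
    ln_cdnc_obs = 0.5 * ln_cot_obs - 2.5 * ln_cer_obs
    ln_lwp_obs = ln_cot_obs + ln_cer_obs
    return np.polyfit(ln_cdnc_obs, ln_lwp_obs, 1)[0]


print(simulated_slope(0.2, 0.0, 0.0))    # close to the true slope of 0.2
print(simulated_slope(0.2, 0.15, 0.15))  # negative; analytically about -0.11
print(simulated_slope(-0.1, 0.15, 0.15)) # more negative; analytically about -0.23
```

Under these assumptions a positive true slope is retrieved as negative, and the reviewer's example of a true slope of -0.1 becomes roughly twice as negative, because the CER errors push ln CDNC and ln LWP in opposite directions.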

Response 2: In the revised version, we discuss the influence of COT and CER uncertainties in more detail. However, we would like to stress that the spatial variability of (even error-free) COT and CER likely plays as important a role in this biasing effect as possible retrieval errors, as is also illustrated by the cloud fields in Figures R1 and R2 above. Moreover, the errors we applied in our simulations are also very likely plausible. For instance, Grosvenor et al. 2018 gave the following statement to assess the level of error in CER: "Due to resolved and unresolved heterogeneity, an uncertainty in re of 17% was assessed in section 2.4.3 and that due to instrument uncertainty was estimated as 10% (section 2.4.7) giving an overall error of 27%."

Regarding 1), one could hypothesize that droplet collision can impact the cloud microphysics by increasing the droplet size, decreasing CDNC, and enhancing LWP. I would not be surprised if satellite retrievals are able to capture these relationships, as in situ based studies have shown that MODIS retrievals show some skill in detecting precipitation. In other words, precipitation is a key mechanism missing in the manuscript, and a possible explanation for the MODIS relationship depicted in Figure 1.
Response 3: Reviewer #3 also brought up the importance of distinguishing raining and non-raining clouds in this kind of analysis. We do agree that it is an important aspect to consider, however we argue it cannot play a significant role in our main finding, for the following reasons.
Our main finding and argument were that when CDNC increases sufficiently, LWP starts decreasing due to the biasing effect arising from the spatial variability of COT and CER. It follows that the behavior is strongest when CER values are small. On the other hand, one approach to separating raining and non-raining clouds has been to apply a CER threshold (typically 15 µm), so that cases with CER below the threshold are treated as non-raining clouds. This means that, when this kind of separation is used, the biasing effect is strongest in non-raining clouds.
We performed some additional analysis by restricting our data set to cases where CER was smaller than 15 µm. Several cases are included in the following figure. In the left-hand-side (LHS) plot, all data are included, and the lines CER = 30 µm and CER = 15 µm are shown as black and dashed red lines, respectively. In the middle panel, only the subset of data with CER less than 15 µm is shown. In the right-hand-side (RHS) plot, an even more stringent CER = 15 µm threshold was applied: if there was any case of CER > 15 µm in the entire Pacific North area, the full day was excluded. Regardless of how we focus on non-raining clouds, the biasing effect is in fact most obvious in non-raining clouds, as explained above. In every case, there is an initial increase in LWP as CDNC increases, but, as can be concluded from the figures, this increase follows the threshold and is artificially influenced by the strict CER limit (as illustrated by the diagonal black and red-dashed lines).

Figure R5. In the left-hand-side plot, all data are included, with the effective radius lines CER = 30 µm and CER = 15 µm shown as black and dashed red lines, respectively. In the middle panel, only the subset with CER less than 15 µm is included. In the right-hand-side plot, a more stringent CER = 15 µm threshold was applied: if there was any case of CER larger than 15 µm in the entire Pacific North area on a given day, the full day was excluded.
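The two CER-based subsets and the diagonal constant-CER lines can be sketched as below. The function names are hypothetical, the masks are written for generic "cases" (the response does not specify whether they are 1 km pixels or aggregated cells), and the isoline assumes the same CDNC equation with a placeholder α and the adiabatic LWP = (5/9)·COT·CER (g m^-2, CER in µm). Eliminating COT gives LWP = (5/9) CDNC² CER⁶ / α², a straight line of slope 2 in log-log axes, which is why a hard CER cap imposes a rising upper bound on LWP as CDNC increases.

```python
import numpy as np


def cer_masks(day, cer_um, threshold_um=15.0):
    """Return (case_mask, day_mask) for the two non-raining subsets."""
    day = np.asarray(day)
    cer_um = np.asarray(cer_um, dtype=float)
    # Middle-panel style subset: keep only cases with CER below the threshold
    case_mask = cer_um < threshold_um
    # Right-panel style subset: drop every case from any day on which
    # at least one case in the region exceeds the threshold
    rainy_days = np.unique(day[cer_um > threshold_um])
    day_mask = ~np.isin(day, rainy_days)
    return case_mask, day_mask


def lwp_on_cer_isoline(cdnc, cer_um, alpha=1.0):
    """LWP along a line of constant CER in the LWP-CDNC plane.

    From CDNC = alpha*COT**0.5*CER**-2.5 and LWP = (5/9)*COT*CER:
    COT = CDNC**2 * CER**5 / alpha**2, so LWP = (5/9)*CDNC**2*CER**6/alpha**2.
    """
    cdnc = np.asarray(cdnc, dtype=float)
    return 5.0 / 9.0 * cdnc**2 * cer_um**6 / alpha**2


case_mask, day_mask = cer_masks([1, 1, 2, 2, 3], [10.0, 16.0, 12.0, 14.0, 15.0])
```

Doubling CDNC along a fixed-CER line quadruples LWP, so any subset defined by a CER cap will show an apparent LWP increase with CDNC that follows this line rather than a cloud response.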
2) Do we know the magnitude of satellite errors? At least, MODIS CDNC compares surprisingly well with in-situ observations in subtropical clouds. It would be interesting to make Figure 1 for two regions where uncertainties in satellite data are expected to be dissimilar (e.g. homogeneous stratocumulus clouds vs. shallow convective clouds and/or broken clouds).

Response 4: Pixel-level uncertainties of MODIS COT and CER are available for the 1 km L2 product, and we have now further utilized them (see below). However, we first want to emphasize that the meaning and interpretation of these uncertainties become unclear for data aggregated to 1x1° resolution. Secondly, we want to emphasize that we do not assume that the COT and CER retrieval uncertainties alone must be large enough to explain the biasing pattern observed in LWP vs. CDNC, since the natural heterogeneity of cloud fields, and thus the variability in COT and CER unrelated to aerosol amounts, most probably plays an equally important role. Thirdly, related to the comment above, even if the spatially or temporally averaged CDNC compares quite well with in-situ measurements, this does not mean that the spatial variation of these fields cannot bias the satellite-based dlnLWP/dlnCDNC. Indeed, the Nd (e.g. Gryspeerdt et al., 2022) and LWP (e.g. Seethala and Horvath, 2010) retrievals have been found to be reasonable on average, which suggests that errors in the formulation of the equations are likely not particularly noticeable; however, when LWP is analyzed against CDNC, their correlated errors become important, as we have demonstrated in our paper.
In the following figure, we show the number of 1x1° aggregated pixels in two cases: all data (upper panel), and only those 1 km MODIS COD and CER retrievals with uncertainty below 10% aggregated (middle panel). The lower panel shows the difference in the amount of data (N_original - N_filtered). Overall, and quite understandably, the COT and CER uncertainties decrease as their absolute values increase, so the number of "less uncertain" pixels decreases towards land, where CER values are smaller. Figure R6 covers a wide enough area to give an idea of the impact of uncertainty on data sampling. More importantly, Figure R7 shows very similar LWP vs. CDNC patterns for all sub-regions, even though the overall uncertainties in CER and COT (and thus the amount of data with better accuracy) can differ somewhat between them. This suggests that the biasing effect remains even when an attempt is made to include only COT and CER retrievals of improved accuracy, supporting the assumption that the biasing effect is due more to natural cloud heterogeneity than to the retrieval uncertainties themselves (although both would produce a similar effect).
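The data-loss map described above (N_original - N_filtered per 1x1° cell) can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: it assumes per-pixel relative uncertainties of COT and CER in percent, counts a cell on a given day as one aggregated value when it contains at least `min_pixels` kept 1 km pixels (`min_pixels` is an illustrative parameter), and keeps a pixel only if both uncertainties are below `max_unc` (10% here, or 7.5% for the stricter South Pacific subset).

```python
import numpy as np


def count_cell_days(day, lat, lon, keep, res=1.0, min_pixels=1):
    """Per res x res degree cell, count days with >= min_pixels kept pixels."""
    ilat = np.floor(np.asarray(lat, dtype=float) / res).astype(int)
    ilon = np.floor(np.asarray(lon, dtype=float) / res).astype(int)
    keys = np.stack([np.asarray(day, dtype=int), ilat, ilon], axis=1)
    keys = keys[np.asarray(keep, dtype=bool)]
    if keys.shape[0] == 0:
        return {}
    # Number of kept 1 km pixels for each (day, cell) combination
    day_cells, npix = np.unique(keys, axis=0, return_counts=True)
    valid = day_cells[npix >= min_pixels]
    if valid.shape[0] == 0:
        return {}
    # Number of valid days for each cell
    cells, ndays = np.unique(valid[:, 1:], axis=0, return_counts=True)
    return {(int(a), int(b)): int(n) for (a, b), n in zip(cells, ndays)}


def data_loss_per_cell(day, lat, lon, cot_unc, cer_unc, max_unc=10.0, **kwargs):
    """N_original - N_filtered per cell, requiring both uncertainties < max_unc (%)."""
    keep_all = np.ones(len(np.asarray(lat)), dtype=bool)
    keep_ok = (np.asarray(cot_unc) < max_unc) & (np.asarray(cer_unc) < max_unc)
    n_original = count_cell_days(day, lat, lon, keep_all, **kwargs)
    n_filtered = count_cell_days(day, lat, lon, keep_ok, **kwargs)
    return {cell: n_original[cell] - n_filtered.get(cell, 0) for cell in n_original}
```

Counting at the cell-day level rather than the pixel level mirrors the idea that a 1x1° aggregate disappears only when none (or too few) of its 1 km pixels pass the uncertainty filter.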

We think that a comprehensive analysis that defines different cloud regimes, for instance based on COT and cloud top height (as in Oreopoulos et al. 2014), and then compares typical uncertainties of COT and CER would require a separate study. Our region in
We also selected a two-month period of MODIS data from the South Pacific Painemal region, which is far enough from the coast (and thus from strong gradients in CCN) and where extensive and uniform cloud fields often form. This selection was based on first spending many hours inspecting cloud fields in NASA Worldview. A two-month period was then selected, the corresponding COT and CER data were gathered, the Gryspeerdt et al. 2019 filtering was applied, and the data were additionally restricted using pixel-level uncertainties to obtain a subset in which the COT and CER uncertainties were always below 7.5%. One very typical case is included in the Supplementary material, with a short discussion in the revised manuscript. Indeed, there was surprisingly little day-to-day variability in how the LWP vs. CDNC patterns differed between the "full data" and the "least uncertain data". What one can see, and what was clear and obvious when going through the entire data set, is that the negative pattern of LWP vs. CDNC is only slightly reduced when only the least uncertain data are included. It is difficult to fully quantify how much of these patterns is due to the COT and CER uncertainties and how much to natural cloud heterogeneity, but this was an attempt to demonstrate that both certainly play significant roles.

Figure R6. Number of 1x1° pixels in Pacific North without any uncertainty limit (upper panel). Number of data when only pixels with both CER and COT uncertainties below 10% are included (middle panel). Difference in the amount of data with and without the uncertainty limit (lower panel).

Figure R7. LWP vs. CDNC over different regions (and thus over regions with different overall uncertainties in COT and CER, as illustrated in Figure R6). The solid black line shows the median of all the data and is therefore the same in all three panels.
Other comments
Lines 31-33: Instead of "suggested", it would be more accurate to say that modeling evidence shows that entrainment can play a role. Changed as suggested by the reviewer.
Line 67: "MODIS retrievals" instead of "MODIS data". In our opinion, the word "data" fits better in this context.

Changed as suggested by the reviewer.
Line 75: remove "long time", as the concept is relative. Changed as suggested by the reviewer.
Line 116: Add commas before and after "strictly speaking". Changed as suggested by the reviewer.
Line 168: I would argue that progress should be made in characterizing uncertainties in CDNC.
Changed as follows: "This suggests that future progress should focus on characterizing uncertainties and improving the accuracy of satellite-based CDNC estimations; ..."