Soil moisture signature in global weather balloon soundings

The land surface influences the atmospheric boundary layer (ABL) through its impacts on the partitioning of available energy into evaporation and warming. Previous research on understanding this complex link focused mainly on site-scale flux observations, gridded satellite observations, climate modeling, and machine-learning experiments. Observational evidence of land surface conditions, among which soil moisture, impacting ABL properties at intermediate landscape scales is lacking. Here, we use a combination of global weather balloon soundings, satellite-observed soil moisture, and a coupled land-atmosphere model to infer the soil moisture impact on the ABL. The inferred relationship between soil moisture and surface flux partitioning reflects distinctive energy- and water-limited regimes, even at the landscape scale. We find significantly different behavior between those two regimes, associating dry conditions with on average warmer (≈3 K), higher (≈400 m) and drier (≈1 kPa) afternoon ABLs than wet conditions. This evidence of land–atmosphere coupling from globally distributed atmospheric measurements highlights the need for an accurate representation of land–atmosphere coupling into climate models and their climate change projections.


INTRODUCTION
The diurnal evolution of the ABL, the well-mixed layer between the land surface and free troposphere, plays a key role in weather conditions and air quality at the Earth's surface. In particular, it can influence the magnitude of temperature and precipitation extremes [1][2][3][4][5] and various processes, such as cloud formation [6][7][8] , air pollution 9 , diurnal CO 2 dynamics 10 , ecosystem carbon exchange 11 , soil respiration 12 , the persistence of urban heat islands 13 , and even dune formation 14 . ABL dynamics are sensitive to heat and moisture inputs from the land surface, which are directly regulated by soil moisture availability and its impact on the partitioning of surface energy fluxes 15 , and by incoming solar radiation, which dictates the amount of energy available for partitioning at the land surface. Next to that, the ABL is (in)directly influenced by vegetation, surface albedo, and surface roughness, which are in turn linked to soil moisture [16][17][18] . By redistributing heat and moisture vertically, ABL dynamics determine how fluxes of water and energy from the land surface combine with freetropospheric conditions to translate into near-surface temperature and humidity.
Per example, Fig. 1 depicts the typical diurnal ABL evolution from weather balloon soundings under convective conditions and weak synoptic flows over different land surface conditions at a site in Lincoln, IL, USA. The ABL is influenced by the bottom (land surface) and top (free troposphere) boundaries. At the land surface, available net radiation partitions into sensible (H) and latent heat flux (LE), transferring heat and moisture into the ABL. Sensible heat warms the air above the land surface, thus creating warm and buoyant, rising air parcels. At the top of the ABL, the temperature inversion serves as a lid, preventing the air to rise higher. However, sensible heat-driven air parcels can overshoot the inversion and entrain warm and dry air from the free troposphere, thereby deepening, warming and drying the ABL during daytime 19 . The ABL is generally less well-mixed in terms of specific humidity than potential temperature, owing to the entrainment of dry air 20 . Due to vertical convective mixing of air from the land surface to the top of the ABL, the vertical temperature and humidity profiles integrate surface heterogeneity and free tropospheric conditions over distances tens of times the ABL height. We refer to this as the landscape scale throughout this study. In this context, soil moisture leaves its signature in the ABL by regulating the partitioning of energy fluxes at the land surface: Over dry soils, evaporation is water-limited, partitioning more energy into sensible heat than over wet soils, and consequently causing the ABL to grow warmer, deeper and drier (Fig. 1). Conversely, the distinct signature left by soil moisture in the ABL can be exploited to infer land water availability using weather balloon soundings.
In previous research, the complex link between the land surface and the ABL was studied across scales ranging from site scale using tower measurements or models [21][22][23][24] , landscape-scale using satellite observations and/or models, among which convectionresolving Large Eddy Simulations 5,8,[25][26][27][28] , to regional and global scales using climate models, satellite observations, and/or machine-learning techniques 4,[29][30][31][32] . Spatial heterogeneity of soils and vegetation requires parameterizations in these larger-scale studies, inducing uncertainties. This was addressed in recent studies by estimating surface fluxes from near-surface observations of temperature and humidity using mixed-layer theory 33,34 . However, these observations do not allow analyses at the landscape scale, at which land-atmosphere coupling is expected to have more relevant impacts 26,28 .
In this study, we infer land-atmosphere coupling from an atmospheric perspective: we translate weather balloon soundings with an ABL model, the Chemistry Land-surface Atmosphere Soil Slab model for GLobal Studies (CLASS4GL), to make a global estimate of surface flux partitioning at the landscape scale 35 . The weather balloon sounding data consist of~15 million soundings, available from 1905 to near real-time, and from >2700 stations distributed across the globe (see "Data" in Methods). These comprehensive observations are routinely used to constrain weather forecasts, but so far, their use to study land-atmosphere coupling has been limited. Here, balloon soundings are filtered to select days with convective warm conditions (see "Preprocessing of weather balloon soundings" in Methods), excluding days on which sublimation occurs, avoiding complexities related to frozen surface water and substantial variation in the seasonal cycle of surface flux partitioning during the cold season and focusing on days which are driven by sensible heat and therefore governed by surface flux partitioning. Therefore, the data screening increases, but simultaneously potentially exaggerates, the chance of land states affecting ABL dynamics 36 . We identify 4236 suitable sounding days distributed globally over 97 stations, which are used to initialize the ABL model in the morning and to validate it in the afternoon, while the model computes concurrent surface flux estimates. We implement a routine that adjusts the initial morning soil moisture, the main remaining control on energy flux partitioning, in order to minimize the difference between observed and model-estimated vertical temperature and humidity profiles (see "Experimental setup" in Methods). The surface flux partitioning is thus inferred from atmospheric measurements which are interpreted and translated into surface fluxes through a model based on mixed-layer theory, and hence largely independent of potentially uncertain ancillary data of land surface characteristics, in contrast to climate models.

RESULTS AND DISCUSSION
Global patterns of energy flux partitioning In a first step, we compare the flux partitioning inferred from balloon soundings using CLASS4GL with several state-of-the-art gridded data products. Figure 2 shows the global distribution of energy flux partitioning, expressed as an evaporative fraction (EF; the ratio between latent heat flux and available energy), for three gridded products (a-c) and for CLASS4GL (d), focusing on warm days (temperature > 278 K). Considerable differences exist between the gridded products, as root-mean-square differences in the EF are as follows: 0.12 (GLEAM-FLUXCOM), 0.1 (GLEAM-ERA5), and 0.15 (ERA5-FLUXCOM), highlighting the ongoing challenges in surface flux estimation. The apparent differences across state-of-the-art gridded products can be due to different model formulations or underlying land surface parameterizations accounting for sub-grid heterogeneity. This problem can be overcome with the balloon sounding-based flux estimations, as these are largely independent of land-surface model assumptions.
There is reasonable agreement with the spatial patterns of the reference products: spatial correlations of CLASS4GL estimated EF, weighted by the number of sounding days per location and only including locations with more than 50 sounding days to avoid sampling biases, with GLEAM, FLUXCOM, and ERA5 are 0.67, 0.75, and 0.68, respectively. The fact that these values approach the correlations between gridded products calculated using similar methodology 0.89 (GLEAM-FLUXCOM), 0.88 (GLEAM-ERA5), and 0.93 (ERA5-FLUXCOM) is notable, given different temporal (continuous vs. discontinuous time series and daily vs. sub-daily averages) and spatial (1°× 1°grid cells vs. landscape-scale footprint from balloon soundings) data characteristics between CLASS4GL (see "Screening of weather balloon soundings" in Methods) and the gridded products. Further, the mean EF from CLASS4GL is slightly below the estimates of the gridded products. This can be explained by the difference in temporal sampling; Whereas the EF for gridded products is averaged over the entire day, the EF from CLASS4GL is averaged between morning and afternoon soundings, with most data available between 08:00 and 14:00 local solar time, shortly after which heating tends to increase EF 37 . Finally, we go one step further down the Local Land-Atmosphere Coupling (LoCo) process chain 36 that presumably governs ABL dynamics by validating the main control of EF (soil moisture), instead of directly validating EF. We find that the adjusted initial soil moisture from CLASS4GL correlates well with ESA CCI soil moisture (0.73; Supplementary Fig. 1), which validates both the land surface schemes applied in CLASS4GL and confirms that soil moisture leaves a signature in the vertical profiles, as measured by balloon soundings, by affecting the surface flux partitioning.

Distinguishing evaporative regimes
While it is known that EF and related land-atmosphere coupling changes between energy-and water-limited conditions 15,[38][39][40] , the potential implications of these modes on the ABL remain unclear. To distinguish these regimes, we use satellite-derived soil moisture observations 41 . Next, we investigate the control of soil moisture on the day-to-day variability of EF by linking the inferred latent and sensible heat fluxes to remotely sensed soil moisture conditions, which are independent of the soil moisture used in the flux estimation. Using remotely sensed soil moisture induces noise in this relationship, as (i) surface soil moisture only represents part of the depth that is relevant for evaporation, and (ii) surface and root-zone are known to decouple in dry conditions 42,43 . Nevertheless, this product is the only global observational soil moisture with an adequate time period available and has been used successfully in similar applications before 38,44 . Figure 3a illustrates that the energy flux partitioning is strongly regulated by soil moisture, even at the landscape scale, as the ratio between latent heat flux and surface available energy (the sum of sensible and latent heat flux) changes from dry to wet soils. This apparent sensitivity of energy flux partitioning to soil moisture is quantified by computing a least-squares regression fit for each soil moisture class (Fig. 3b). The linear slopes in Fig. 3b are comparable over wet soils, indicating that flux partitioning is insensitive to soil moisture (energy-limited conditions). This changes toward drier soils, where the linear slopes decrease with soil moisture, reflecting an increased sensitivity of flux partitioning to soil moisture (water-limited conditions). The point data underlying the contour lines in Fig. 3a and the linear models in Fig. 3b are shown in Supplementary Fig. 2. The moving medians in Supplementary Fig. 2 indicate that the assumption of linearity for quantifying the relationships in Fig. 3b is reasonable.
The linear slopes (EF * ) reflect the sensitivity of the change in latent heat flux to changes in available energy (assuming lines without a zero intercept). Therefore, the physical meaning of EF * and EF is similar, because EF reflects the ratio of the change in latent heat flux to changes in available energy (with a zero intercept). Both the sensitivity of EF * and EF to soil moisture changes when transitioning between wet and dry soils. We use EF* here as it fits the data more closely and consequently allows a clearer illustration of the contrast between water-and energycontrolled regimes. We impose a piecewise-linear model to quantify this transition, similarly to existing literature 15,39 , which is marked by the critical soil moisture (CSM, see "Distinguishing evaporative regimes" in Methods). Separating for environmental conditions is complex because the footprints of balloon soundings are time-varying depending on height, wind speed, and direction; this footprint is also large enough to integrate surface heterogeneity in terms of soil, vegetation, and climate characteristics. Different environmental conditions across sites and decoupling between the surface and root-zone soil moisture could, however, cause slightly different EF * for the same surface soil moisture  values. The dark-grey ribbon in Fig. 3c reflects the sensitivity of the CSM to this uncertainty. As this ribbon is relatively narrow, it justifies the use of the breakpoint as a sharp threshold to quantify the transition. We emphasize that it is not our intention to determine the sensitivity of this transition to environmental conditions, but merely to find a threshold to distinguish energyand water-limited conditions that is valid for our data selection, and can be used as a first-order distinction between water and energy-controlled conditions in our subsequent analyses. Next to that, this first-order estimate is robust with respect to (i) different soil moisture bin setups ( Supplementary Fig. 3), and (ii) different surface soil moisture products ( Supplementary Fig. 4).
Land surface effect on the atmospheric boundary layer When classified based on CSM, ABL properties show distinct differences between water-and energy-limited conditions (Fig. 4). Figure 4b shows that EF, as produced by CLASS4GL, is generally higher in energy-limited conditions than in water-limited conditions. Figure 4d-i shows that, according to the CLASS4GL simulations, in water-limited conditions, the ABL warms (2.72 K), dries (0.86 kPa) and, most notable, deepens (409 m) more during daytime than in energy-limited conditions. Strong differences in height are found due to higher heat capacities of deeper waterlimited ABLs 37,44 . Despite higher moisture input through evaporation in energy-limited conditions, we find that diurnal VPD increases driven by the temperature increase during the course of the day. In addition, the daily ABL evolution is influenced by upper air processes driven by synoptic systems, which are different between evaporative regimes, as can be seen from the tropospheric temperature lapse rate (Fig. 4c). Warm and dry days leading up to water-limited conditions are associated with lower tropospheric lapse rates, favoring a rapid deepening of the ABL 21,45,46 . Further, the ABL's relative humidity decreases more strongly during daytime in water-limited conditions than in energy-limited conditions, favored by the stronger entrainment of dry tropospheric air, which is mainly driven by surface sensible heat in convective ABLs 47 . Next to that, entrainment of dry air triggers contrasting upper air feedbacks in both evaporative regimes. In energy-limited conditions, entrainment of dry air increases the moisture gradient between the land surface and the ABL, thereby enhancing evaporation and leading to less sensible heat and shallower ABLs; in water-limited conditions, this entrained dry air may further constrain evaporation due to the influence of VPD on stomatal conductance under dry soils, and thus sensible heat is increased and the ABL grows deeper. Fig. 4 Global soil moisture influence on diurnal ABL evolution. a Surface soil moisture for available sounding days. When soil moisture is below critical soil moisture (dot-dashed vertical line in Fig. 3c and here), we distinguish water-limited conditions (red), and energy-limited conditions otherwise (blue). Evaporative fraction (b) and potential temperature lapse rate (c) from CLASS4GL averaged between 08:00 and 14:00 local solar time. Potential temperature (d, g), ABL height (e, h), and the vapor pressure deficit (f, i) from CLASS4GL at 08:00 and 14:00. Vertical blue and red lines denote the medians from the respective distributions. Annotations show the difference between the medians and the p values of the Kolmogorov-Smirnov test. Dark grey arrows indicate assumed causal relations between variables.
In Fig. 4, we effectively separate wet and dry regions using soil moisture data. We assess the more direct effect of surface flux partitioning on the ABL by computing regimes based on EF ( Supplementary Fig. 5). As expected, the diurnal deepening of the ABL is even stronger given the more direct influence of energy flux partitioning compared with that of soil moisture. At the same time, warming and drying are less pronounced which is possibly due to the enhanced heat capacity of the ABL. Interestingly, where the difference in morning ABL growth is greater when the data is separated based on EF than on soil moisture (Fig. 4), differences in θ and VPD are less pronounced. This could be explained by the nature of the variables that are used to separate the data: EF could theoretically vary from 0 to 1 during daytime hours, whereas daytime variability in soil moisture is much lower. Therefore, the stronger the water-limited conditions, the higher the probability that the days leading to these dry conditions have been waterlimited too, accompanied by higher morning θ and VPD differences in Fig. 4d, f. As EF responds on shorter timescales than soil moisture, days leading to low EF days need not be water-limited and therefore morning θ and VPD differences in Supplementary Fig. 5d, f is less pronounced. However, by separating on EF we ensure high sensible heat on low EF days, growing the ABL from sunrise to 08:00. Some, but not all, stations transition between water-and energy-limited conditions, due to the seasonal cycle in soil moisture. To isolate the role of seasonality in the results shown in Fig. 4, we recompute the figure for regimes of available energy, replacing the soil-moisturederived regimes and instead of depicting seasonal variation in the meteorological forcing ( Supplementary Fig. 6). We consistently find smaller diurnal changes in ABL characteristics in terms of warming, growth, and drying, confirming that our results in Fig. 4 are not an artifact of seasonal and latitudinal forcing variations, thereby justifying the separation between evaporative regimes based on CSM.
In conclusion, we illustrate that soil moisture conditions are reflected in the ABL, enabling the inference of energy flux partitioning from ABL-based measurements such as balloon soundings. Benefitting from a comprehensive archive of balloon soundings covering locations across the globe and spanning over decades, our approach can provide robust and largely independent EF estimates at the impact-relevant landscape scale. More generally, the relevance of the ABL for land-atmosphere coupling is increasingly recognized and has for example triggered efforts to add continuous ABL measurements to existing flux towers 48 . Such an extension would enable the application of our approach at these sites, and the reconciliation with local energy fluxes. In summary, we present an observation-based assessment of the ABL response to soil moisture variations worldwide and quantify the changes in ABL characteristics induced by soil moisture.

METHODS CLASS4GL
We use the Chemistry Land-surface Atmosphere Soil Slab model for GLobal Studies (CLASS4GL) framework 35 (https://class4gl.eu/). CLASS4GL initializes and runs a mixed-layer model (CLASS 19 ; http://classmodel.github.io/) with ancillary reanalysis, satellite, survey, and weather balloon sounding data worldwide. CLASS uses the mixed-layer equations originally proposed by Tennekes 49 and later adapted by Tennekes and Driedonks 50 to compute the daytime evolution of the mixed-layer. The advantage of using mixedlayer equations is that the mixed-layer, which is the well-mixed part of the ABL where potential temperature (θ), specific humidity (q), and wind components (u,v) are assumed to be constant with height due to turbulent mixing, can be summarized with one value for θ and one for q. Therefore, CLASS is computationally cheap, but at the same time, the mixed-layer theory on which it relies is well-established through, amongst other basic physical laws, the conservation of mass and energy. This way, CLASS represents the daily atmospheric boundary layer evolution and meteorological processes at the landscape scale, and can efficiently be used to analyze 1000 s of sounding days measured across the globe. In CLASS, available energy at the land surface is partitioned into sensible and latent heat flux, which is influencing mixed-layer θ, q, and height (h) from the bottom of the mixed-layer. The top of the mixed-layer is characterized by a θ and q inversion, which separates the well-mixed layer from the warmer and drier free troposphere. In the free troposphere, the θ and q lapse rates describe how θ and q change with height. At the top of the mixed-layer, the heat and moisture that is entrained from free-tropospheric air is controlled by the inversions of θ, q, and wind components. Recent developments in CLASS4GL include dynamic free-tropospheric θ and q lapse rates according to the different (observed) vertical air layers during the mixed-layer growth, which also change due to large-scale dynamic forcing of advection and subsidence and entrainment by shear 19,35 . A mixed-layer representation of the ABL as assumed in CLASS approximates the idealized θ and q profiles in Fig. 1. In selecting the appropriate observed vertical profiles for our analysis, CLASS4GL is equipped with necessary filters and criteria to ensure that the vertical profiles selected closely follow the convective (mixing) assumptions (see Figs. 3 and 4 in Wouters et al. 35 and Fig. 4 in van Heerwaarden et al. 51 ), represented by the idealized profiles in Fig. 1. Within these assumptions, the effect of entrainment on vertical profiles is of second-order importance, as the entrainment flux is a generally agreed-upon fixed fraction (0.2) of the surface kinematic heat flux. This constant holds for shear-free conditions, supported by atmospheric observations (see Conzemius and Fedorovich 52 for a comprehensive discussion). However, even while the mixed-layer theory is deemed well-established and with appropriate data screening, deviations from well-mixed assumptions exist, slightly more so in the case of q than θ. The main reason for this is that the q difference between the mixed layer and the free troposphere can be of the same order of magnitude as the well-mixed q value (Fig. 1b), whereas the θ difference is much smaller (Fig. 1a). Therefore, the observed q-profile could slightly deviate from the well-mixed profile at the top of the ABL due to entrainment of dry and warm free-tropospheric air, which in turn can be partially compensated by evaporation from the land surface. Moisture skewness could also exist at the land surface under high EF conditions. Where in reality the moisture gradient is reduced due to this skewness, thereby reducing evaporation, this does not occur in CLASS4GL, because this skewness is mixed throughout the ABL. Therefore, the moisture gradient is maintained accompanied by slightly higher evaporation. However, we do not expect this to be the case often in our selection of convective days, as in such conditions moisture is effectively mixed throughout the ABL. More information on the original formulation of CLASS within the CLASS4GL framework and its data sources, which are the main input parameters, their default values and the latest updates can be found in the previous studies 19,35,51,53 . We highlight one of several updates within the CLASS4GL framework: bulk transfer coefficients of momentum and heat have been calculated non-iteratively according to Wouters et al. 35 , which holds for stable and unstable surface conditions, further reducing the computational cost of CLASS4GL.

Screening of weather balloon sounding measurements
Approximately, 15 million weather balloon sounding measurements from >2700 stations across the globe are available from the Integrated Global Radiosonde Archive (IGRA) data set 54 , from as early as 1905 to near realtime. The radiosondes provide vertical measurements of, amongst others, temperature, pressure, relative humidity, wind direction, and speed.
Before data screening can take place, all the necessary mixed-layer properties to describe an idealized vertical profile have to be calculated from the measurements: h is determined as the height at which the Richardson number exceeds a certain threshold: 0.24 for strongly stable conditions, 0.31 for weakly stable conditions and 0.39 for unstable conditions 55 . The uncertainty range, used at a later step, can be derived from computing h across this range of Richardson numbers. The mixedlayer average θ and q are the mean of the observations between the land surface and h. The upper air conditions are linearly extrapolated between the two measurement heights above h: First, the tropospheric lapse rate is calculated as the change of θ and q with height. Second, the inversion is the difference between the mixed-layer θ and q and the extrapolated θ and q from the tropospheric lapse rate.
Once the mixed-layer properties are calculated, they go through a series of data-screening steps, which are largely the same as described by Wouters et al. 35 . The screening of weather balloon sounding measurements is necessary to ensure the selection of convective warm days, on which the described mixed-layer theory assumptions are met. By doing so, J.M.C. Denissen et al.
we consider only convective days that are driven by sensible heat flux, which is sensitive to land states (water-versus energy-limited conditions), governing the surface flux partitioning and the entrainment flux of heat. Therefore, the data screening yields days where the chances of land states being drivers for surface flux partitioning and ABL dynamics are higher. However, vertical profiles from excluded days also carry soil moisture signatures, albeit less pronounced due to larger influences of other meteorological processes. The criteria are the following: (i) A morning profile needs to be available before 12:00 local solar time. (ii) There should be at least 7 measurements below 3000 m. (iii) The uncertainty range of the determined h, as described above, should be smaller than 150 m. (iv) The mixed-layer should be well-mixed in terms of θ, ensured by omitting sounding days where the root mean square deviation of observed θ within the mixed-layer exceeds 1.5 K. (v) θ > 278 K, to exclude the possibility of having freezing temperatures on sounding days. (vi) An afternoon sounding, confirming to the same requirements as the morning sounding, should be available between local noon and 1 h before local sunset. (vii) The growth of h should be between 20 and 400 m h −1 and the mixed-layer should warm during daytime, to ensure a reasonable daytime mixed-layer evolution. (viii) ESA CCI soil moisture should be available in the 1°× 1°d egree grid cell resolution at the same time and location as the sounding 41 . After the data screening procedure, the weather balloon sounding measurements are eligible to use for estimating energy flux partitioning with CLASS.

Experimental setup
We extend the existing framework of CLASS4GL with an additional routine that forces CLASS to reproduce the afternoon mixed-layer average θ and q from observations within an acceptable range of uncertainty. This routine iteratively scales modeled soil moisture, assuming identical values for the surface and root-zone, and consequently, the energy flux partitioning to match the estimated θ and q to afternoon mixed-layer averages from observations. This iterative converging method has been used in earlier work, where Miralles et al. 3 matched Bowen ratios from the Global Land Evaporation Amsterdam Model (GLEAM 56 ) and Wouters et al. 35 matched EF from GLEAM to CLASS4GL produced values by iteratively adjusting initial soil moisture. By iteratively scaling the initial soil moisture, we assume that a mismatch in afternoon temperature or humidity results exclusively from the modeled soil moisture, and that any mismatch due to entrainment of the warm and dry free troposphere, is indirectly driven by soil moisture at multi-day timescales 19 . Two separate procedures and at least two model runs are necessary to ensure the matching of both afternoon mixed-layer average θ and q. We do this by first computing where X is either θ or q, σ XOBS is the standard deviation of the respective variable with height across the observations within the mixed-layer, X CLASS the mixed-layer average estimated from running CLASS4GL from the morning to the afternoon sounding based on a converged modeled soil moisture, X OBS the mixed-layer average based on balloon sounding observations and f X the difference in standard deviations. CLASS4GL estimated θ or q matches the observations adequately if f X < 0.5, otherwise the soil moisture is iteratively converged toward a value that satisfies f X < 0.5, by combining two zero-finding algorithms: the bisection method 57 and the secant method. If f X < 0.5 cannot be met within 10 iterations, the sounding day will be discarded. The two resulting soil moistures from matching on θ and q should be within a range of 0.05 from each other, to ensure that the resulting flux estimates from observations of θ and q are consistent. If so, the average from the two soil moistures will be used to initialize CLASS4GL in the morning and to compute the results used in this study. Otherwise, the sounding day will be discarded. Energy fluxes are successfully estimated on 4236 sounding days distributed globally over 97 stations after the filtering procedure and matching on afternoon mixedlayer average θ and q. Because we require ESA CCI soil moisture to be available on a sounding day, the sounding days are available from 1981 to 2015, with the highest data availability in the later years. CLASS4GL has been run with default settings and thereby does not account for largescale air circulation from subsidence and advection, and entrainment by shear. Ample uncertainties are expected with this large-scale circulation, as these estimates are based on 6-hourly values and validation is impossible. However, for our data selection, there are hardly any differences in the evolution of the ABL during daytime for experiments with and without large-scale circulation from subsidence and advection, and entrainment by shear (not shown). Moreover, our experimental setup secures that only sounding days are retained where a satisfactory match between modeled and observed afternoon temperature and humidity is found. When meteorological processes not accounted for in the model experiments could influence observed vertical temperature and humidity profiles, this is compensated for by adjusting initial soil moisture and consequently surface flux partitioning, thereby matching modeled vs. observed afternoon temperature and humidity. As Supplementary Fig. 1 shows, the adjusted soil moisture closely resembles the satellite-observed soil moisture with a correlation coefficient of 0.73, underlining the small importance of meteorological processes not accounted for in our model experiments for soil moisture and consequently surface flux partitioning. Table 1 displays the data sets that are used for analysis outside of application in CLASS4GL. All these gridded data sets are regridded to a 1°× 1 o spatial grid cell resolution to more closely resemble footprints from weather balloon soundings.

Distinguishing evaporative regimes
In Fig. 3b, we use CLASS4GL estimated surface energy fluxes in combination with satellite observations of surface soil moisture from ESA CCI to distinguish the evaporative regimes and the related transition between them. To this end, we compute linear least-squares regressions per soil moisture class. A linear model is only computed if there is sufficient data available (more than 30 sounding days) within the respective soil moisture class. All soil moisture data sets are used to infer the relation between soil moisture and energy flux partitioning in Fig. 3 and Supplementary Figs. 1-4. Energy fluxes from gridded data sets are used to validate the CLASS4GL estimated energy fluxes in Fig. 2.
In Fig. 3c, we apply a piecewise linear regression to expose the two distinctive evaporative regimes and the associated transitions between them, marked by the CSM. This piecewise linear regression is weighted by the number of sounding days per soil moisture class, to account for the uneven distribution of soil moisture values, which ranges from 43 (soil moisture < 0.1 m 3 m −3 ) to 1039 (0.14 < soil moisture < 0.18 m 3 m −3 ). Finally, the slopes of the linear model's EF * and therefore the breakpoint in the piecewise linear regression are subject to uncertainty, potentially related to differences between stations in terms of soil and vegetation conditions, decoupling between surface-and root-zone soil moisture, and balloon sounding footprints varying with height, wind speed and -direction. To account for these uncertainties that are reflected in the standard deviation around EF * , we resample the EF * by drawing from a normal distribution with the mean of the actual EF * and its standard deviation per soil moisture class. Subsequently, these 1000 resampled EF * were used to compute 1000 piecewise linear regressions and breakpoints (CSM). Note that the CSM depends on soil and vegetation characteristics and is hence reflecting the selection of sites used in this study [38][39][40] . We use it to distinguish water and energy-limited regimes across the sites, and subsequently to study ABL characteristics, while different CSM values might be derived for different (selections of) locations. The estimation of the CSM should be regarded as a first-order estimate as (i) the varying soil and vegetation characteristics of grid cells contributing to each soil moisture class considered in Fig. 3 induce uncertainty to our CSM estimation, and (ii) while root-zone soil moisture is regulating evaporation and surface flux partitioning, only surface soil moisture is readily observed across the globe, which is known to decouple from root-zone soil moisture, especially in extremely dry conditions 42,43 .

DATA AVAILABILITY
All data related to this paper may be requested from the corresponding author.

CODE AVAILABILITY
Model source code and more information about CLASS4GL are available from https:// www.class4gl.eu. Another code is available upon request from the corresponding author.