Introduction

The terrestrial land surface and atmosphere are coupled through a complex set of interactions and feedbacks that determine the fluxes of mass and energy between the two systems. Weather and climate are well known to determine the productivity of terrestrial ecosystems, but the functioning of the land surface can likewise modify weather and climate patterns.1,2,3,4 In some cases, the influence of one of these systems can propagate through the other to influence itself, establishing a feedback. An understanding of land–atmosphere feedbacks is essential for determining the regional impacts of climate variability and change on the ecosystem services humanity has come to depend on, but remains a major challenge because analytical tools to quantify feedbacks have only recently been developed.4,5,6,7,8,9,10 Feedback processes in nature are difficult to observe directly and to infer, as cause and effect relationships may become obscured or break down when a process influences itself through an intermediary,11 as is often the case in the Earth System. Feedback processes amplify or buffer inputs, resulting in exaggerated or muted responses to perturbations, the latter of which can be difficult to identify. The most severe uncertainties in our climate models are believed to involve feedbacks.12,13,14 These challenges necessitate the development and application of novel methods to quantify feedback processes in the Earth System. This study presents direct observations of global land-to-atmosphere information flow through the use of a global network of surface energy flux and meteorological observations, introducing a statistical approach to characterize temporal and spatial variability in land–atmosphere coupling strength.

Energy exchange between the land surface and atmosphere provides a primary method of interaction — and thereby feedback — between the two systems. Downwelling solar energy absorbed by the land surface warms the soil and vegetation and drives fluxes of sensible and latent heat between the land surface and atmosphere. These energy fluxes modify the composition of the atmospheric boundary-layer (ABL) and drive convective processes that deepen the ABL and result in entrainment of air from the free troposphere.15,16,17,18,19 These changes then impact near-surface temperature and humidity as well as precipitation processes, resulting in potential feedbacks through ecosystem physiological response to favor subsequent latent or sensible heat fluxes that in turn impact ABL processes.20 Such feedback processes may intensify with future climate changes,21 with the potential to impact critical functions such as water availability22 and ecosystem resilience,23 and to intensify phenomena such as heat waves,24,25,26 drought,27,28,29 and local convective precipitation.30,31 Likewise, the impact of canopy photosynthesis and evapotranspiration on cloud development — and the potential for future climate change to further these effects32,33,34 — make understanding land–atmosphere feedback processes central to predictions of the future availability of ecosystem services.

These examples illustrate the complexity of the coupled land–atmosphere system. Disentangling the presence and strength of positive and negative feedbacks is an ongoing challenge in understanding how ecosystems and their management impact Earth System processes. Feedback is inherently nonlinear, and its study therefore calls for methods free from assumptions of linear proportionality, simple correlation, or isolated causes and effects.4 Likewise, and despite the sophistication of global climate models, our current model-based assessments almost certainly miss key feedback processes or scales of interaction, because they represent hypotheses about poorly understood processes, and not real-world observations of those processes. To robustly measure feedback and critique these hypotheses, we therefore require a method for direct, in situ, and relatively assumption-free (nonlinear and empirical) observation of directional functional couplings.

Process Networks (PNs) characterize the state of a system as a pattern of flows of mass, energy, and/or information that correspond to key system functions.11 Information flow statistics are a robust and mature method for delineating PNs, and have been previously applied to the direct and explicit measurement of feedback between the land surface and atmosphere using flux tower observations.35,36,37,38,39 PNs have been shown to accurately diagnose interactions between turbulent fluxes and the atmosphere in ecohydrological systems,35,36,37,40 and have accurately described functional differences between starkly diverse land surface ecosystems at continental scales.35,41,42 This paper’s choice of Transfer Entropy to delineate PNs43 is ideal to measure directional, scale-specific, and nonlinear couplings that characterize land-to-atmosphere feedbacks.

To investigate feedback between land and atmosphere, we focus on relationships among land surface turbulent energy fluxes of sensible (H) and latent heat (LE) and three atmospheric variables: downward global shortwave radiation (Rg) as an indicator of cloud cover, air temperature (Ta), and precipitation (P). Analysis of these terms provides a core set of surface flux and atmospheric variables that link land and atmosphere through turbulent flux exchange. The strength of the process coupling is quantified through the information flow as given by the normalized transfer entropy (T′). T′(X → Y, τ) is a measure of the predictability of the time series Y from time series X at a time lag of τ. While the characteristic τ for significant T′(X → Y, τ) depends on the variable pair, sub-daily timescales are believed to be the primary timescales for flux-based land–atmosphere feedbacks, and the average T′ for τ from 0.5 to 18 h (TAvg) is used here to capture the functional relationships for land–atmosphere interactions.

Statistically significant values of information flow and feedback (p < 0.05) are established using the method of shuffled surrogates where surrogate T′ values are calculated using randomly shuffled time series of Xt and Yt to remove any correlation between variables. These surrogates are then compared to the observed T′.11,44 The fraction of instances during which significant process coupling is observed at any τ (FSig) is calculated as

$$F_{\mathrm{Sig}}\left(X \to Y\right) = \frac{N_{\mathrm{Sig}}\left(X \to Y\right)}{N_{\mathrm{Tot}}\left(X \to Y\right)},$$
(1)

where NSig represents the total number of observations during which T′(X → Y, τ) is significant at any τ while NTot is the total number of observations taken into account. When FSig approaches 1, a coupling process is robustly significant, but when it approaches zero, the process is weak or absent.
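Eq. 1 and the surrogate test can be illustrated with a minimal Python sketch. The function names, the data layout (one list of T′ values and one list of thresholds per observation, indexed by lag), and the use of the empirical 95th percentile of the surrogates as the significance threshold are assumptions made for illustration; they are not taken from the ProcessNetwork software.

```python
import random


def surrogate_threshold(stat, x, y, n_surrogates=100, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of `stat` over shuffled copies of x and y.

    `stat` is any callable stat(x, y) -> float, e.g. a transfer entropy
    estimator at a fixed lag (hypothetical interface).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_surrogates):
        xs, ys = list(x), list(y)
        rng.shuffle(xs)  # shuffling destroys temporal structure and coupling
        rng.shuffle(ys)
        values.append(stat(xs, ys))
    values.sort()
    k = min(len(values) - 1, int((1.0 - alpha) * len(values)))
    return values[k]


def f_sig(t_by_obs, thr_by_obs):
    """Eq. 1: fraction of observations with T' above its threshold at any lag."""
    n_tot = len(t_by_obs)
    if n_tot == 0:
        raise ValueError("no observations")
    n_sig = sum(
        any(t > thr for t, thr in zip(ts, thrs))
        for ts, thrs in zip(t_by_obs, thr_by_obs)
    )
    return n_sig / n_tot
```

In this sketch an observation counts toward NSig as soon as a single lag exceeds its threshold, which mirrors the "significant at any τ" wording above.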

The recent development of regional networks of co-located meteorological and carbon dioxide, water, and energy flux measurements provides a new opportunity to assess land–atmosphere coupling across terrestrial biomes and climate space globally. Here we leverage the LaThuile FLUXNET database,45 which provides globally distributed and, to a degree, standardized observations of land–atmosphere fluxes of energy and water using the eddy covariance technique, spanning 251 sites (Supplementary Fig. 1), representing 11 major IGBP (International Geosphere-Biosphere Programme) vegetation classes (Supplementary Table 1) across a large range of aridities, and covering over 10,000 site-months. We use these data to calculate information flows from land surface fluxes to atmosphere (i.e. coupling strength) and then train an artificial neural network (ANN) of land–atmosphere coupling strength across the terrestrial surface. The use of observational “big data” represents a unique approach to characterizing temporal and spatial variability in land–atmosphere coupling and feedback without a priori assumptions about underlying processes, and it is capable of directly observing and resolving critical processes at the interface between surface, vegetation, and convective ABL. Our approach therefore complements large-scale climate models and reanalysis products, for which these processes and feedbacks currently remain parameterized due to their comparatively coarse resolution. Given that our information flow PN methodology quantifies the presence, strength, direction, and significance of land-to-atmosphere coupling using a large sample of in situ flux tower observations, it can be used as an independent control for existing models and theory, in addition to providing unique insights into these difficult-to-observe processes.
In this work, we focus on the land-to-atmosphere portion of land–atmosphere coupling, by investigating the directional information flow from H and LE to future states of atmospheric variables.

Several studies have identified global land–atmosphere coupling strength and associated “coupling hotspots” using climate models5,21,46 and reanalysis data,47,48 primarily focused on soil moisture–precipitation feedbacks. Here we examine these hypothesized feedback hotspots in the context of an empirical data-driven analysis, broadening coupling mechanisms to the specific surface fluxes (latent and sensible heat) that are directly measured through FLUXNET, and which are directly responsible for convection and changes in the near-surface atmosphere that impact ecosystem function. Given that PNs are a methodologically distinct tool for the analysis of environmental data, they can serve as a validation tool for process-based models and more conventional observational analysis.

Results

There is pronounced seasonality in the magnitude of land-to-atmosphere coupling (Figs 1, 2), and seasonal patterns differ between the six coupling pairs considered in this work. Pairs (LE → Ta), (LE → P), and (H → Ta) exhibit low FSig during winter months in temperate regions compared to high coupling strength during summer, whereas the seasonality is reversed for (LE → Rg) and (H → Rg). The coupling process (H → P) shows significant coupling during both winter and summer. Tropical regions exhibit little seasonality, except for (H → Rg) and (LE → Rg), which are much stronger during June through August, a period that broadly coincides with the dry season in Amazonia. As expected, the increase in feedback coupling between winter and summer within temperate regions also coincides with a strong increase in vegetation density and greenness, represented here as the Normalized Difference Vegetation Index (NDVI).

Fig. 1

The fraction of significant surface to atmosphere interaction as determined by a process network (PN) with respect to latent heat flux (LE), FSig (blue colorscale), for 251 sites of the FLUXNET LaThuile dataset and calculated as average across DJF (northern hemisphere winter) (a, c, e) and JJA (northern hemisphere summer) (b, d, f). Larger circles indicate more years of tower data considered for calculation of FSig. The background image is DJF and JJA NDVI from MODIS for 2016 (dark colorscale). When FSig approaches 1, a significant coupling is present, and when it approaches 0 the coupling is weak or absent

Fig. 2

Same as Fig. 1, but for sensible heat flux (H)

Spatial patterns in feedback coupling strength are related to IGBP biome type (Fig. 3). The lack of a clear annual cycle for (H → P), visible in Fig. 2, stems mainly from forest ecosystems (mixed forest, MF; deciduous broad leaf forest, DBF; evergreen needle leaf forest, ENF; evergreen broad leaf forest, EBF), which are abundant in FLUXNET, and closed shrubland (Fig. 3, see Supplementary Table 1). EBF, which encompasses tropical rainforests, exhibits strong coupling with little seasonality except for (LE/H → Rg), where it follows the general trend of low coupling during boreal summer. Savanna and shrubland type systems (savanna, SAV; woody savanna, WSA, open shrubland, OSH; closed shrubland, CSH) also show pronounced land–atmosphere feedback dynamics that deviate from seasonal cycles found in other biomes. Strongly increased (LE → Rg) and reduced (LE → P) during summer, and strong (LE → Ta) throughout the year for these generally semi-arid to arid ecosystems, highlight the importance of interactions between biome and prevailing climate in governing land to atmosphere coupling behavior. The amplitude in coupling strength for annual cycles approaches 0.8–1.0 for all couplings except (LE → Rg) for which the amplitude tends to be less than 0.5.

Fig. 3

The annual cycle of significant feedback between land and atmosphere (FSig) for LE (a–c) and H (d–f), separated by land cover type. The time axis is given in months since winter solstice to align northern and southern hemisphere. When FSig approaches 1, a significant coupling is present, and when it approaches 0 the coupling is weak or absent

There is a great deal of variability in land-to-atmosphere feedback coupling strength between biomes, climate zones, and seasons. To aid interpretation, we plot the feedback coupling strengths described above against monthly values of Ta and monthly aridity (P/ETp) (Figs 4, 5). Feedback strength (TAvg) tends to increase with increasing monthly Ta across all biomes. Land–atmosphere coupling is largely absent at Ta < 0 °C, which can be expected given that energy inputs and ecosystem activity are generally minimized under these conditions. Savanna and shrubland type ecosystems exhibit the lowest coupling originating from LE at high monthly temperatures (Ta > 20 °C), whereas the situation is reversed, with high coupling originating from H, when Ta > 20 °C.

Fig. 4

Average transfer entropy (TAvg) of latent heat (LE) to air temperature (Ta), precipitation (P), and global radiation (Rg) as a function of environmental conditions for temperature (a–c) and aridity index P/ETp (d–f) (note the logarithmic x-axis) and separated by land cover type. Values are binned into 10 categories with equal number of members for each bin, and then averaged. The black lines and gray shaded areas give the mean and standard deviations across all data. The normalized standard deviation for each bin is presented in Supplementary Fig. 2. Higher transfer entropies indicate stronger land-to-atmosphere information flow

Fig. 5

Same as Fig. 4, but for TAvg of H to Ta, P, and Rg. The normalized standard deviation for each bin is presented in Supplementary Fig. 3

The behavior of TAvg is more complex with respect to P/ETp. There is little relationship between TAvg(LE → Ta) and aridity, but a clear relationship emerges for TAvg(H → Ta), in which feedback originating from H increases with aridity for all vegetation types. The feedback coupling from surface fluxes to P peaks at P/ETp values near unity, and WSA, OSH, and CSH exhibit the strongest feedbacks of all vegetation types in that range of P/ETp. For the coupling between surface fluxes and cloud cover as indicated by Rg, we find that there is little feedback for P/ETp > 1. Savanna (SAV and WSA) and shrub (CSH and OSH) vegetation classes generally exhibit the highest feedback for Ta and Rg at low P/ETp. While we chose to present TAvg as the coupling metric in this work, there is considerable variation in coupling timescales between variable pairs (Supplementary Figs 4–6), which in itself shows dependence on Ta and P/ETp and may be related to the timescales needed to effectively connect land-surface and atmospheric processes. For example, the dominant coupling timescales on the order of 6–12 h between surface fluxes and P or Rg show substantial time lags in the atmosphere’s response to surface fluxes, which are consistent with timescales typically found in convective boundary layers.

The extrapolation of observed feedback strength (TAvg) from FLUXNET sites to the global map reveals several hotspots of land–atmosphere coupling that stand out from global average feedback strength (Fig. 6; see Supplementary Figs 7–12 for monthly data). The ANN models had R² values ranging from 0.69 for (H → P) to 0.92 for (H → Rg) (Supplementary Table 2), with no evidence of overfitting, so the model’s extrapolation is robust when tested against the 251 FLUXNET sites and over 10,000 observed site-months. We find strong land-to-atmosphere feedback in sub-Saharan Africa (LE → Ta and LE → P), the central and southwestern US during summer (H → Rg and H → Ta), the southern Andes, South Africa and Australia during DJF (H → Rg), Amazonia (LE → P and LE → Rg), agricultural areas in eastern Brazil (H → Ta and H → Rg), the African Rift Valley (LE → Ta and H → Rg), as well as the Congo, where strong coupling persists throughout the year but switches from (LE → Rg) in DJF to (H → Rg and LE → Ta) in JJA. This plot is thematically similar to the soil moisture based results from Koster et al.5

Fig. 6

Average estimated transfer entropy (TAvg, red colorscale) of latent (LE) and sensible heat (H) to air temperature (Ta), precipitation (P), and global radiation (Rg) for northern hemisphere winter (DJF) (a, c, e, g, i, k) and summer (JJA) (b, d, f, h, j, l) extrapolated from FLUXNET sites to the global map using an artificial neural network (ANN). Gray and black areas indicate lack of input data and negative (non-physical) TAvg values, respectively. Dark and red areas indicate strong land-to-atmosphere feedback associated with the indicated land surface flux process

Discussion

PNs and other empirical methods based on information theory applied to environmental “big data” provide a wealth of information about land–atmosphere coupling. Specifically, PNs provide information about functional relationships between ecosystem variables that can be used to investigate processes such as land–atmosphere coupling and feedbacks as well as their response to environmental change. Using an ANN to extrapolate these couplings to the global scale, we identified several hotspots of land–atmosphere coupling (Fig. 6). Monthly data are presented in Supplementary Figs 7–12. Unlike previous studies,5,46 which used process-based models, the ANN is based on empirical extrapolation of observations and does not include a priori assumptions about functional relationships to demonstrate the existence of feedbacks. It can therefore be used to complement global models, which (i) require process relationships to be known and (ii) may require parameterizations to include processes that are under-resolved due to their global nature.

We investigated six couplings between turbulent fluxes and atmospheric/near-surface properties by taking advantage of databases that incorporate observations of a wide range of surface meteorology and fluxes. Couplings of H and LE to P and Rg are directly related to the hydrologic cycle, in contrast to the coupling with temperature, which is more related to near-surface conditions and cover type. The ANN trained on PN results identifies feedback hotspots in the southwestern and central US similar to those of Koster et al.,5 but does not reproduce the hotspot on the Indian subcontinent. However, for the southern African hotspot we find that the coupling signal is strongest for H, LE, and Ta rather than precipitation, and is more pronounced in DJF. For the US hotspot, we find a stronger signal for H, LE, and Rg rather than P. The ANN also detects the hotspots in the Congo Basin, South Africa, Australia and to some extent Brazil (for H to Rg and Ta), in agreement with Notaro and Zeng et al.46,48 Similarly, several regional studies highlighted the strong coupling between surface and air temperatures for semi-arid regions in the US and Europe,6,49 which is reflected in the PN results for the southwest US and to some extent for the Iberian peninsula. Compared to previous studies, we find a stronger coupling of LE to Rg and P in Amazonia, further highlighting the importance of tropical rainforest function for cloud development and regional precipitation.50 We find Rg to exhibit much clearer land-to-atmosphere coupling than P, which can be expected given that not all clouds produce precipitation. The reduced coupling could also indicate that models are overly sensitive with respect to their precipitation response or that the PN has problems detecting feedback in P due to the sparseness of precipitation events.
While the latter cannot be excluded, global and even regional models rely on cumulus parameterizations for precipitation generation, which have well-known difficulties in producing realistic precipitation.51,52

Extrapolation of empirical PN results to the global scale shows two distinct advantages compared to global scale modeling approaches. As a statistical method, global results at a high resolution (e.g. 0.25°) are computationally cheaper than running an Earth System Model, while also providing detailed information on land–atmosphere coupling on spatial and seasonal scales. Also, through considering multiple land–atmosphere feedback pathways, PNs are capable of providing information that can be used to improve process-level understanding of feedbacks not accessible in more complex models. At the same time, data-driven approaches such as PNs and ANNs are not constrained by physically realistic limitations and cannot prove cause–effect relationships. This should not be considered a limitation but as a feature. Combined with domain expertise, data-driven methods can be very useful in guiding research toward regions and processes that merit further scientific attention.

The PN and ANN reveal that dryland ecosystems exhibited the strongest ecosystem–atmosphere feedback due to variability in available water (Figs 3–6). We find the highest couplings between surface fluxes and precipitation at P/ETp ~ 1, highlighting the importance of sufficient water supply and soil moisture in controlling land–atmosphere interactions.53,54 Interestingly, for savannas, high monthly mean temperatures (Ta > 20 °C) are associated with low TAvg(LE → P), indicating the water-limited state of these systems during the dry season and the associated absence of coupling. Similarly, transition periods between wet and dry seasons and monsoon circulations are important for soil moisture–precipitation coupling.47,49 Vegetation response to water limitations occurs on a continuum from isohydric (plants closely regulate transpiration through stomatal conductance in response to atmospheric vapor pressure deficit) to anisohydric (plants have little regulation of stomatal conductance). From these species-level traits, ecosystem-level drought responses emerge.28,55 Grasses, which were thought to be mostly anisohydric, often exhibit isohydric behavior in semi-arid environments,56,57,58 supporting the notion that semi-arid grasslands can exhibit substantial feedbacks with the atmosphere. The resulting interplay between vegetation, surface-energy flux partitioning and atmospheric control also influences the development of local convection, which can be an important ecosystem moisture source.31,59,60,61,62 Substantial feedbacks between biosphere and precipitation were recently reported for semi-arid and monsoonal regions,63 highlighting the need for an accurate representation of the biosphere’s response to temperature, radiation, and water availability for predicting hydrometeorological and climatological feedbacks.

The strong coupling between turbulent fluxes and P for semi-arid systems (i.e. savannas, Figs 5, 6) is particularly interesting in the light of their pronounced seasonality. Given the fact that the analysis covers monthly system state, and precipitation inputs are highly pulsed, intermediate P/ETp might correspond to rapidly changing moisture supplies at the surface that elicit responses in the land–atmosphere system. The increase in coupling between LE and H to Ta and Rg for small P/ETp further highlights the importance of convective processes that impact ABL growth and present multiple avenues for feedbacks mediated by the surface–ABL system.15,16,17,18,19

PNs applied across aridity gradients can be used to better understand potential changes to land–atmosphere interactions and ecosystem functioning across temporal and spatial scales. Given that semi-arid ecosystems are critical to the carbon cycle and climate,64,65,66 and are likely to expand67 and deteriorate68 under climate change, the ability of PNs to quantify their coupling to the atmosphere is of particular importance. Additionally, projected changes in aridity are expected to be complex across the globe,69,70 increasing the uncertainty for land–atmosphere interactions and feedbacks.

This study is not without limitations related to data availability and uncertainty. It relies on near-surface observations as a proxy for land–atmosphere coupling rather than on direct observations of the boundary-layer processes that mediate these couplings and feedbacks. This is due to a lack of continuous and spatially distributed ABL observations, which needs to be addressed by the community.4 Also, PNs allow for the detection of coupling relationships irrespective of assumptions of linearity or sign of the relationship. At the same time, this means that PNs do not provide information about the exact nature of coupling relationships, which can then be explored with more conventional methods. Similarly, turbulent flux measurements as collected by FLUXNET do not close the surface energy balance,40,71,72,73,74 and it is unclear whether H and LE are similarly affected and to what extent this impacts the results generated by statistical methods applied to this database. Also, FLUXNET does not systematically cover all global biomes, and tends to under-sample remote and harsh environments. The southern hemisphere, northern Africa, and central Asia are particularly under-represented, limiting our ability to assess the systems’ responses to global environmental change and implications for surface–atmosphere feedbacks. This has the potential to limit generalizability of our results to the globe, as indicated by negative (i.e. non-physical) TAvg values in some remote areas. Similarly, the extrapolation of PN results using an ANN relies on the use of climatological averages. FLUXNET LaThuile contains 251 sites with approximately 10,000 site-months, which translates to on average 3.5 site years per site and may thus lead to mismatches between climatological states and flux observations, which may result in biases for the extrapolated ANN.

H and LE flux dynamics are closely coupled through the surface energy balance, but observed couplings of H and LE diverge (e.g. there is significant coupling between H and Ta in a given region but not between LE and Ta, or vice versa). This behavior of the PN is likely due to the fact that, despite their correlation, H and LE are rarely of the same magnitude. The PN is implicitly sensitive to the absolute magnitude and the time rates of change in the time series and thus acts as a low-pass filter on information flow from flux variations (see Supplemental Note 1 for additional details).

Despite its limitations, the PN agrees well with previous studies in diagnosing feedback hotspots, and its coupling response with respect to Ta and P/ETp (Figs 4, 5) is in line with ecohydrological expectations.53 We therefore have confidence that observed information flow from the growing body of environmental big data, through networks such as FLUXNET, can be used to provide unique insights on land–atmosphere feedbacks from an empirical perspective and can serve as independent empirical verification for process-based climate models, potentially driving progress in the improvement of climate models toward the representation of critical processes for projecting land–atmosphere interactions and feedbacks.

In conclusion, we demonstrate that PN results can be used as independent validation for process-based models based on observed information flows between the land surface and atmosphere. As hypothesized by prior models and research, savanna, shrubland, and other semi-arid ecosystems exhibit a strong response in their atmospheric feedback behavior based on seasonal water availability and aridity. Information flow from surface to atmosphere for other variables exhibited seasonal variability, with the exception of tropical rainforests, and was a strong function of air temperature. In the light of dryland expansion and the vulnerability of drylands to climate change, this might strongly impact land–atmosphere coupling, including important precipitation processes.

Methods

Observations

Observed variables were obtained from the FLUXNET LaThuile synthesis dataset, which encompasses data from 251 sites representing nearly 1000 site years at a temporal resolution of 0.5 h. To ensure data quality, we (i) used only original data and gap-filled data of high quality, as indicated by the data-quality flag provided by FLUXNET; (ii) excluded site years with less than 50% available data; (iii) excluded outliers that were identified by exceeding six standard deviations compared to detrended data which had the diurnal cycle removed using a periodic anomaly (except for P); and (iv) excluded site-months with <500 observations (out of ~1400 possible per month). This resulted in 10,398 site-months being used for this analysis. Monthly T′ was then calculated across sites that represent 11 major IGBP vegetation classes (Supplementary Table 1). Monthly potential evapotranspiration (ETp) values, which are calculated from the Penman–Monteith equation using FLUXNET measurements and provided as part of the LaThuile dataset, are used to quantify the aridity of the sites through the ratio P/ETp.
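Screening steps (iii) and (iv) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the series is half-hourly and starts at the first slot of a day, missing values are encoded as None, the diurnal cycle is removed by subtracting the mean of each half-hour slot, and no additional detrending is applied. The function names are hypothetical and the authors' actual processing may differ.

```python
import statistics

SLOTS_PER_DAY = 48  # half-hourly data (assumption: series starts at slot 0)


def screen_outliers(values, n_sigma=6.0):
    """Set values whose diurnal anomaly exceeds n_sigma std devs to None."""
    # Mean diurnal cycle: average of all valid values in each half-hour slot.
    slots = [[] for _ in range(SLOTS_PER_DAY)]
    for i, v in enumerate(values):
        if v is not None:
            slots[i % SLOTS_PER_DAY].append(v)
    slot_mean = [statistics.fmean(s) if s else 0.0 for s in slots]

    # Periodic anomaly: departure of each value from its slot mean.
    anomalies = [
        None if v is None else v - slot_mean[i % SLOTS_PER_DAY]
        for i, v in enumerate(values)
    ]
    valid = [a for a in anomalies if a is not None]
    if len(valid) < 2:
        return list(values)
    sigma = statistics.pstdev(valid)
    return [
        None if a is None or abs(a) > n_sigma * sigma else v
        for v, a in zip(values, anomalies)
    ]


def enough_data(values, min_obs=500):
    """Step (iv): keep a site-month only if it has at least min_obs valid values."""
    return sum(v is not None for v in values) >= min_obs
```

A site-month would then enter the analysis only if `enough_data(screen_outliers(series))` is true for the variables involved.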

Process network (PN)

Functional relationships between ecosystem and atmospheric variables are calculated using a PN employing the open source package ProcessNetwork version 1.4.75

Transfer entropy (T) was calculated as11,43

$$T\left(X_t \to Y_t, \tau\right) = \sum_{y_t,\, y_{t-1},\, x_{t-\tau}} p\left(y_t, y_{t-1}, x_{t-\tau}\right) \log \frac{p\left(y_t \mid y_{t-1}, x_{t-\tau}\right)}{p\left(y_t \mid y_{t-1}\right)},$$
(2)

where the predictability of time series Yt based on knowledge of time series Xt at time lag τ is calculated using yt-1 as the immediate history of Yt and xt-τ as the history of Xt at τ. The p denotes the corresponding probability density functions. T is bounded between 0 and log(m), where m is the number of discrete microstates y taken by variable Yt. We normalize T to a unit-less fraction by division with its upper limit [log(m)], yielding the normalized transfer entropy (T′). T′(X → Y, τ) was calculated for 0.5 h increments of τ from 0.5 to 18 h and then averaged across all 36 increments to yield TAvg (Supplementary Figs 4, 5 present additional information on the underlying significant timescales of TAvg).

In order to achieve a balance between entropy estimation accuracy and limited observations in the numerical estimation of p(y), m = 20 was used dividing H and LE into 20 bins (referred to as microstates in information theory nomenclature) of equal width. Note that these microstates do not have a physical significance, but serve as the basis for determining the underlying relationship between X and Y. T(X → Y, τ) measures additional information that is provided by knowledge of X at time-lag τ in addition to information provided by the history of Y itself. It is a statistical index for physically causal and directional coupling (not correlation), albeit with limitations. The reader is also referred to previous works11,44 for details on PN calculation methodology.
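To make Eq. 2 and the normalization concrete, the following Python sketch estimates T′ with a plug-in (histogram) estimator. It is an illustration under assumptions, not the ProcessNetwork implementation: both series are discretized into m equal-width bins over their observed range, lags are given in time steps (1 step = 0.5 h, so 36 steps = 18 h), and the function names are hypothetical.

```python
import math
from collections import Counter


def discretize(series, m=20):
    """Map values to m equal-width bins spanning the observed range."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / m
    return [min(int((v - lo) / width), m - 1) for v in series]


def transfer_entropy(x, y, lag, m=20):
    """Normalized T'(X -> Y, lag) following Eq. 2; lag in time steps (>= 1)."""
    if lag < 1:
        raise ValueError("lag must be >= 1 time step")
    xb, yb = discretize(x, m), discretize(y, m)
    triples = [(yb[t], yb[t - 1], xb[t - lag]) for t in range(lag, len(yb))]
    n = len(triples)
    n_yyx = Counter(triples)                            # (y_t, y_{t-1}, x_{t-lag})
    n_yy = Counter((yt, y1) for yt, y1, _ in triples)   # (y_t, y_{t-1})
    n_yx = Counter((y1, xl) for _, y1, xl in triples)   # (y_{t-1}, x_{t-lag})
    n_y1 = Counter(y1 for _, y1, _ in triples)          # y_{t-1}
    te = 0.0
    for (yt, y1, xl), c in n_yyx.items():
        # p(y_t | y_{t-1}, x_{t-lag}) / p(y_t | y_{t-1})
        ratio = (c / n_yx[(y1, xl)]) / (n_yy[(yt, y1)] / n_y1[y1])
        te += (c / n) * math.log(ratio)
    return te / math.log(m)  # divide by the upper bound log(m)


def t_avg(x, y, max_lag=36, m=20):
    """Average T' over lags 1..max_lag (0.5 to 18 h for half-hourly data)."""
    return sum(transfer_entropy(x, y, k, m) for k in range(1, max_lag + 1)) / max_lag
```

Plug-in estimates of this kind are biased upward when the number of samples is small relative to m³ histogram cells, which is one reason a surrogate-based significance threshold is needed before interpreting small T′ values.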

Artificial neural network (ANN)

To extrapolate results from site level to the entire land surface, artificial neural networks (three-layer feed forward) were trained for each flux coupling. The ANN was trained to extrapolate TAvg values from FLUXNET using gridded data at 0.25° resolution. ANN training inputs were monthly Ta, Rg, P, ETp, the enhanced vegetation index (EVI), IGBP class, elevation, and absolute latitude. ANN outputs were monthly TAvg values for each coupling at 0.25° resolution using the auxiliary datasets described at the end of the methods section.

ANNs have been widely used for climate change and ecosystem research and, given their skill in dealing with noisy and unbalanced datasets,76 they are well suited for PN research as they do not require geographically well distributed training sites and are robust to uneven distribution in IGBP classes or climates in the training dataset. To minimize overfitting, to which ANNs are sensitive, we employed Bayesian regularization backpropagation as the training function in a three-layer feed-forward ANN. This improves generalization for small and noisy datasets.77 We divided the dataset randomly into training (70%) and test (30%) subsets, and the performance of the ANN was evaluated on the test set using Pearson’s R coefficient (Supplementary Table 2). The ANN analysis was performed with the Neural Network Toolbox in MATLAB R2014b.
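The evaluation protocol (random 70/30 split, Pearson correlation on the held-out set) can be sketched independently of the network itself. The Python below does not reproduce the Bayesian-regularized MATLAB network; the function names and the fixed random seed are assumptions for illustration, and any regression model could supply the predictions.

```python
import math
import random


def split_train_test(samples, test_fraction=0.3, seed=0):
    """Randomly partition samples into (train, test) lists."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = round(test_fraction * len(samples))
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

With held-out TAvg values `obs` and model predictions `pred`, `pearson_r(obs, pred) ** 2` gives a squared correlation that can be compared with an R² score.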

Auxiliary data for ANN extrapolation

In addition to FLUXNET site-level data (Ta, Rg, P, and ETp), the ANN was trained using IGBP class (provided by FLUXNET), elevation above sea level, and the EVI. For elevation, we used data provided by FLUXNET, if present. In the absence of site-provided data, or if only an approximate height was provided, United States Geological Survey (USGS) Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) was substituted using a nearest neighbor approach. EVI from the Moderate Resolution Imaging Spectroradiometer (MODIS) Monthly L3 Global V006 (Terra: MOD13C2; Aqua: MYD13C2) was assigned to the sites using the following procedure:

  1. If both MODIS Terra and Aqua 0.05-degree data was available for the given month and year, their mean was used applying a nearest neighbor approach.

  2. If either MODIS Terra or Aqua were available, the available data was used.

  3. If neither were present (e.g. for site months before the Terra and Aqua launch dates) the long-term mean for Terra and Aqua for that location and month were used.

  4. The value was set to missing, if no acceptable EVI values could be calculated (e.g. snow and ice cover during winter for all years).
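The four-step EVI assignment above can be written as a small fallback function. This Python sketch assumes that the nearest-neighbor lookup has already produced one candidate value per sensor, that unavailable values are encoded as None, and that step 3 averages the Terra and Aqua long-term means when both exist. The function and argument names are hypothetical.

```python
def assign_evi(terra, aqua, terra_clim, aqua_clim):
    """Return the EVI for one site-month, or None if it must be set to missing.

    terra, aqua:           monthly EVI for the given month and year (or None)
    terra_clim, aqua_clim: long-term mean EVI for that location and month (or None)
    """
    # Step 1: both sensors available, use their mean.
    if terra is not None and aqua is not None:
        return (terra + aqua) / 2.0
    # Step 2: only one sensor available, use it.
    if terra is not None:
        return terra
    if aqua is not None:
        return aqua
    # Step 3: neither available, fall back to the long-term mean(s).
    clim = [v for v in (terra_clim, aqua_clim) if v is not None]
    if clim:
        return sum(clim) / len(clim)
    # Step 4: no acceptable EVI value, mark as missing.
    return None
```

Returning None for step 4 lets downstream code drop the site-month from ANN training rather than silently filling it.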

Information about the gridded datasets used for the ANN can be found in Supplementary Table 3. The IGBP land cover was assigned using the dominant land cover class (as percent cover within the 0.25° × 0.25° grid cell) from the MODIS MCD12C1 product. Data availability for EVI is shown in Supplementary Fig. 13. Global monthly mean meteorological data (Ta, Rg, P, and ETp) for ANN extrapolation are obtained from GLDAS (Global Land Data Assimilation System) V2.1 at 0.25° resolution. We use the 30-year climatological average (1981–2010) for ANN extrapolation. An overview of total data availability for training of the ANN is given in Supplementary Fig. 14.