Introduction

Tropical forests host up to half of Earth’s biodiversity1,2,3. However, the climatic conditions experienced by organisms in tropical forests are not yet well understood. Temperature is a fundamental factor governing the survival, growth, and reproduction rates of organisms4,5, and it shapes the occupancy, distribution, and diversity of species6,7,8,9. Temperature is therefore the most important determinant for understanding the ecological niches of species10,11,12.

Despite the importance of temperature for ecosystem functioning and services, currently available climate datasets cannot properly capture the range and variability of temperature in tropical forests. The vast majority of products available at regional or global scales provide estimates of open-air temperature, which accurately represent conditions 2 m above the ground over open, well-ventilated, homogeneous areas. These estimates describe the temperature outside the canopy of tropical forests and can differ by several degrees from the conditions experienced in the understory below the canopy (i.e., the microclimate)13,14.

Forest structure is a prominent factor driving fine-scale horizontal and vertical variation in understory temperature15,16. On a clear-sky day, most of the incoming shortwave solar radiation is absorbed or reflected by the forest canopy, which, together with evapotranspirative cooling and reduced air mixing, lowers understory temperature15,17. At night, by contrast, the canopy retains outgoing longwave radiation, keeping the understory warmer than open-field conditions15. At larger scales, topographic factors such as elevation, slope, and aspect also influence microclimate patterns18,19. For example, cold-air pooling in low-lying terrain, aspect-related exposure to solar radiation, and the temperature lapse rate with elevation are all well documented20,21,22,23.

Although microclimate has long been of interest in ecology, earlier studies were limited in scope because they relied on field measurements at single point locations24,25,26. Recent advances in remote sensing and big-data processing, together with the growing availability of ready-to-use, fine-resolution remote sensing datasets, have renewed interest in microclimate ecology9,27,28. Using these techniques, ecologists are gaining new insights into the processes underlying microclimate variability at continental scales and into the consequences for forest habitats under a changing climate14,18,29.

In recent years, efforts have been made to reveal the global patterns of understory and near-surface temperatures. The SoilTemp initiative, for instance, pools microclimate data from thousands of temperature sensors spread across the world30. Using this database, multiple understory bioclimatic variables have been developed for European forests at 25 m spatial resolution14,31. Similarly, global estimates of soil temperature at 1 km resolution were developed from the same database32. Nonetheless, the latter product is known to largely extrapolate in tropical forests, given that the training data for its models were mostly located in temperate and boreal regions32. Furthermore, a spatial resolution of 1 km is too coarse for many micro-scale ecological studies33. Consequently, the spatial and temporal patterns of temperature inside tropical forests remain unquantified to date.

Bridging this knowledge gap is fundamental to fostering a next generation of ecological and biophysical models in tropical regions, and thus to improving our understanding of how living organisms will respond to climate change. In this study, we present fine-scale estimates of understory air temperature (i.e., 15 cm above the ground) for tropical forests globally. We used a machine learning model trained with in situ temperature data collected between 2016 and 2021 by 180 microclimate sensors spread across three continents (Fig. 1a). The model was driven by satellite observations of forest structural and functional traits, topographic variables, and macroclimatic conditions retrieved from atmospheric reanalysis data. We produced estimates of understory temperature at 30 m spatial resolution that capture both diurnal and seasonal variability, and therefore the thermal ranges of the understory. Furthermore, we evaluated day-time and night-time temperature offsets (i.e., the difference between micro- and macroclimate temperatures) to quantify the capacity of tropical forests to buffer large-scale climate variability. Finally, we demonstrate that our estimates of understory temperature reveal spatial heterogeneity that is otherwise masked in macroclimate datasets.

Fig. 1: Study area, measurement sites, and modelling output.

Panel a depicts the locations of the sampling regions and the distribution of selected measurement points within each region. Panels b–d illustrate the spatial variation of annual mean modelled understory air temperature within a 10° × 10° block selected around the equator in South America (b), Africa (c), and Southeast Asia (d). Panels e–g present the monthly variation of modelled understory air temperature for the selected regions of South America (e), Africa (f), and Southeast Asia (g). Each line graph depicts the spatially averaged values for its region. The solid red line indicates the mean daily temperature, and the shaded region spans from night-time (lower bound) to day-time (upper bound) air temperature under the canopy.

Results

We produced pantropical estimates of daily average (Tdaily), day-time (Tdt), and night-time (Tnt) understory temperature at 30 m spatial resolution. As an example of the model output, Fig. 1b–d shows the spatial variation of annual mean Tdaily within a 10° × 10° area selected around the equator on each of the three continents. Monthly variations of temperature for each selected area are shown in Fig. 1e–g. In Central Amazonia, the average Tdaily, Tdt, and Tnt were 24.5 ± 0.5 °C (standard deviation over the 10° × 10° area), 26.1 ± 0.7 °C, and 23.3 ± 0.5 °C, respectively (Fig. 1e). Understory temperatures were slightly cooler in the central Congo basin than in Central Amazonia (Tdaily = 23.9 ± 0.6 °C, Tdt = 25.5 ± 0.8 °C, Tnt = 22.6 ± 0.6 °C) (Fig. 1f). The forests of Borneo showed higher spatial variability of Tdaily (Fig. 1g), largely due to strong topographic heterogeneity; however, their temporal variability was low compared to the other two regions (Tdaily = 23.9 ± 1.6 °C, Tdt = 25.3 ± 1.7 °C, and Tnt = 22.9 ± 1.7 °C) (Fig. 1g).

The temperature offset (ΔT), which represents the difference between modelled understory temperature and open-air temperature, also showed large spatial and temporal variability (Fig. 2). Southeast Asia showed a relatively stable daily mean offset (ΔTdaily) throughout the year, with only subtle seasonal and latitudinal changes. As an exception, some areas of Southeast Asia showed positive ΔTdaily (i.e., understory temperatures warmer than the macroclimate) during the dry season between May and September (Table S1). In contrast, ΔTdaily, as well as the day-time offset (ΔTdt), remained negative throughout the year across South America and Central Africa (Figs. 2 and S1). Forests in Africa had some of the highest intra-annual variability in ΔTdaily, whereas the forests of Southeast Asia had the lowest (Table S1).

Fig. 2: Spatial and temporal variation of mean daily temperature offset.

The offset (ΔTdaily) was calculated by subtracting open-air temperature (i.e., ERA5-Land) from modelled understory air temperature. Panels a–f show the pixel-level variation of ΔTdaily for two months of the year. To present monthly variation, six locations (each covering a 1° × 1° area) were randomly selected on both sides of the equator in South America (a), Africa (b), and Southeast Asia (c). Panels g–l depict the intra-annual fluctuations of ΔTdaily at the selected locations: panels g–i show the northern locations and panels j–l the southern ones. In the line graphs, the shaded region around the solid red line represents the spatial variation within the selected block at each location.

In South America, areas near the equator displayed little intra-annual fluctuation in ΔT. In the southern parts of the Amazon basin, however, a strong seasonal signal was observed, with ΔTdaily declining during the dry season from July to November. A similar pattern was observed in Africa, with seasonally stable offsets close to the equator and amplified seasonal signals at higher latitudes (e.g., beyond 5° S and 5° N). Overall, elevation was an important feature regulating the spatial patterns of ΔTdaily and ΔTdt, with larger (more negative) offsets often observed at higher elevations (Tables S1 and S2).

The night-time offset (ΔTnt) showed a more diverse spatiotemporal pattern, driven primarily by elevation and seasonality. In low-elevation forests of South America and Africa near the equator, ΔTnt remained negative throughout the year (Table S3). However, in northern parts of the Amazon forest (e.g., French Guiana), night-time understory temperatures were warmer than the macroclimate during the wet seasons (e.g., by 0.25 °C between May and July and by 0.37 °C between December and February) (Fig. S2g). In Borneo, ΔTnt averaged 0.87 °C during the wet season (November–March) (Fig. S2i). In mid-elevation forests of Eastern Indonesia (beyond 5° S), ΔTnt showed a significantly positive signal during the dry season (June–October) (Fig. S2l). Across all three continents, intra-annual variability of ΔTnt was notable in mid- and high-elevation forests located beyond 5° on either side of the equator (Table S5).

The spatial patterns of the diurnal understory temperature range (RT) in January and August, as well as the monthly values of RT at selected locations, are presented in Fig. 3. In South America, some forests north of the equator (e.g., in French Guiana) showed a bimodal seasonal pattern, with a first RT peak around March and a second around October (Fig. 3g); both peaks coincided with the onset of the wet seasons. The southern part of the Amazon basin had maximum RT during the local dry season, between August and September. A similar pattern was observed in Africa, with higher RT values during the dry seasons (Fig. 3h, k, Table S6), which alternate between the Northern and Southern Hemispheres. In Southeast Asia, forests in northern Borneo showed no pronounced RT peak, only slight fluctuations across months (Fig. 3i). The forests of Eastern Indonesia showed maximum RT in November, before the onset of the rainy season (December–March). Despite these differences in intra-annual patterns, average RT values on all continents fluctuated between 1.5 and 5 °C (Fig. 3, Table S6), whereas macroclimate RT ranged between 3 and 7.5 °C.

Fig. 3: Spatial and temporal variation of diurnal temperature range.

The temperature range (RT) was calculated by subtracting night-time understory air temperature from day-time understory air temperature. Panels a–f show the pixel-level variation of RT for two months of the year. To present monthly variation, six locations (each covering a 1° × 1° area) were randomly selected on both sides of the equator in South America (a), Africa (b), and Southeast Asia (c). Panels g–l depict the intra-annual fluctuations of RT at the selected locations: panels g–i show the northern locations and panels j–l the southern ones. In the line graphs, the shaded region around the solid red line represents the spatial variation within the selected block at each location.

The spatial heterogeneity of understory temperatures was assessed using empirical semivariograms fitted with exponential model functions (Fig. 4). The distance at which a semivariogram flattens defines the minimum distance (d) beyond which observations are no longer spatially autocorrelated. For understory temperature, d (dunder) was substantially shorter than for open-air temperature (dopen) on all continents, providing quantitative evidence that microclimate displays higher spatial heterogeneity than can be inferred from macroclimate data. These results highlight that accounting for the effects of vegetation biophysical characteristics and topographic features on temperature substantially helps reveal subtle spatial patterns of thermal traits across tropical forests.

Fig. 4: Semivariogram analysis between macroclimate and microclimate datasets.

Panels a–c and d–f present the spatial variability of open-air and understory temperatures, respectively. For the semivariogram analysis, a 5° × 5° area was selected on each of the three continents (i.e., as depicted in panels a–c, Central Amazonia in South America, the Congo basin in Africa, and Borneo in Southeast Asia). Panels g–i depict the semivariogram analysis for the selected regions of South America (g), Africa (h), and Southeast Asia (i). An exponential model was fitted to the experimental (sample) variogram values (shown as points in g–i) to define the sill (ɣ) and minimum distance (d) for each dataset.

Discussion

Our study provides the first global estimate of near-ground air temperature in tropical forest understories, laying a crucial foundation for quantifying the conditions experienced by many organisms in some of the most biodiverse places on Earth. The results reaffirm that currently available gridded macroclimate data fail to accurately portray the spatiotemporal patterns and magnitudes of understory temperatures19,34,35. We demonstrate that, although understory temperatures in tropical forests are often cooler on average than open-air temperatures, the characteristics of these differences vary substantially across continents, seasons, and times of day. Temperature offsets, as well as their seasonal fluctuations, were less pronounced near the equator.

At night, understory temperatures were often warmer than the macroclimate in some regions. This night-time warming (Fig. S2) is linked to shortwave energy absorbed by the canopy during the day and released as longwave radiation at night. The high heat capacity of forest biomass also causes stored energy to dissipate more slowly, keeping the understory warmer at night17,36,37: the more energy a forest can store within its canopy, the stronger the night-time warming. Furthermore, the retention of surface-emitted longwave radiation by the canopy also contributes to night-time warming15. Nevertheless, large parts of the tropical forest still show night-time cooling (Table S3), in line with previous findings13,17. Nocturnal transpiration38, which affects the night-time ambient energy balance39, is likely the main cause of this night-time cooling. Studies have reported high nocturnal transpiration rates under increased soil moisture40,41. Consistently, our results, especially at high latitudes, show pronounced night-time cooling during wet seasons, whereas the understory warms during dry-season nights (Table S3).

The day-time cooling inside forests can be attributed to the direct effect of biophysical factors on the partitioning of incoming solar radiation between latent and sensible heat32. Evapotranspiration (ET) transfers soil water into the atmosphere through the combined effects of plant transpiration and surface evaporation. Evaporating water absorbs latent heat from its surroundings, producing a local cooling effect under the canopy17,42,43. Studies have reported a positive relationship between leaf area index (LAI) and ET, which ultimately affects understory cooling44,45.
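The partitioning mentioned above is conventionally written as the surface energy balance. The simplified textbook form below (neglecting heat storage in biomass and canopy air, and the energy used in photosynthesis) is given for reference only; it is not an equation from the model used in this study:

```latex
R_n = H + \lambda E + G
```

Here R_n is net radiation, H the sensible heat flux, λE the latent heat flux (λ ≈ 2.45 MJ kg−1 is the latent heat of vaporisation near 20 °C and E the evapotranspiration rate), and G the ground heat flux. For a given R_n, a larger λE leaves less energy for H, which is why sustained evapotranspiration cools the understory air.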

Our results showed stronger day-time negative offsets in regions with well-defined dry seasons. These regions are mostly located beyond 5° on either side of the equator (Fig. S1, Table S2), for instance, the southern Amazon basin. Although dry seasons bring less rainfall, the complex root systems of tropical forests can access deep soil water to maintain ET rates46,47,48. Hence, because macroclimate temperatures tend to be higher during dry seasons while evaporative cooling in the understory is maintained, the offset in these areas is magnified.

Day-time and daily understory warming was also observed in certain regions during the dry season (Tables S1 and S2). However, these counterintuitive offsets could result from uncertainties in the model’s input data. For instance, the macroclimate data used in this study come from ERA5-Land, a reanalysis product with its own modelling uncertainties. To illustrate the uncertainty of the ERA5-Land temperature data, we compared monthly temperatures from weather stations with the corresponding ERA5-Land pixel values and reported the correlation and bias for each location (Fig. S12). Although the correlation between weather-station and ERA5-Land data is high, ERA5-Land underestimates temperature by 1–2 °C overall. To overcome this limitation in local applications of the dataset, ground observations from weather stations could be used to bias-correct the open-air temperature from the reanalysis data, and thus the temperature offsets reported in our study (Fig. S13). Incorporating new data from understory loggers installed under diverse conditions in future studies will also help overcome the sparse ground data available to this study (Table S5).
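The station comparison and bias correction suggested above could be sketched as follows. This is a minimal illustration that assumes a constant additive bias per location; the correction method is not specified in the text, and the function names and example values are hypothetical (illustrative numbers only, not the station data used in the study):

```python
import numpy as np


def station_agreement(t_station, t_era5):
    """Pearson correlation and mean bias (ERA5-Land minus station) at one site."""
    t_station, t_era5 = np.asarray(t_station, float), np.asarray(t_era5, float)
    r = np.corrcoef(t_station, t_era5)[0, 1]
    bias = float(np.mean(t_era5 - t_station))   # negative = ERA5-Land too cold
    return r, bias


def bias_corrected_offset(t_under, t_open_era5, bias):
    """ΔT = Tunder - Topen, recomputed after removing a constant bias from Topen."""
    return np.asarray(t_under) - (np.asarray(t_open_era5) - bias)


# Usage with illustrative monthly values: ERA5-Land reads 1.5 °C too cold
r, bias = station_agreement([25.0, 26.0, 27.0], [23.5, 24.5, 25.5])
delta_t = bias_corrected_offset(24.0, 23.5, bias)
```

Because ERA5-Land is biased low here, correcting Topen upwards makes the offset more negative (from +0.5 °C to −1.0 °C in this toy case).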

Our approach combined remote sensing data with machine learning methods, which allowed us to quantify the importance of biophysical and climatic variables in governing the spatiotemporal behaviour of understory temperatures. Topography (elevation), canopy structure (LAI and the Fraction of Absorbed Photosynthetically Active Radiation, FAPAR), and open-air temperature emerged as the key climatic and biophysical variables controlling fine-scale variability of understory microclimates across the pan-tropics (Fig. S6). Elevation exhibited a negative relationship, and open-air temperature a positive relationship, with microclimate temperature. Although higher LAI/FAPAR values are expected to lead to lower understory temperatures49, we observed a positive relationship. This partial dependence of the model reflects the characteristics of the training data, which were all located in areas with high vegetation density. The empirical relationship between LAI/FAPAR and microclimate reported by Hardwick et al.49 could only be reproduced if sensors were installed across a wider range of LAI conditions at similar elevations. Nevertheless, the ML approach adopted in this study captured the overall importance of canopy structure variables while coping with collinearity among the predictors.

Our model was calibrated under certain biophysical conditions, and predictions outside these conditions are likely to carry higher uncertainties, as machine learning approaches are known to perform poorly when extrapolating beyond the conditions seen during training. To minimize this problem, our training data were gathered across a large range of geographical settings, on all continents assessed in this study, and along different topographical gradients. For instance, our training data from East Africa included microclimate observations collected in tropical forests on Mt Kenya, along an elevational gradient from 1730 to 2450 m a.s.l. We also considered forests under different levels of disturbance, with sensors located in controlled fragmentation experiments in the Amazon50,51 and Southeast Asia52. Consequently, the vast majority of our estimates fell within the conditions represented in our training data (83% of all pixels had a degree of interpolation ≥90%), with only small areas, mainly high-elevation regions, being extrapolated (Fig. S9). Because these uncertainties were quantified and mapped, the degree of interpolation can be used to mask or downweight pixels with larger uncertainties when using the provided maps in ecological applications (Fig. S9).

We have demonstrated through semivariogram analysis that microclimate-informed temperature datasets can unveil spatially independent and heterogeneous habitat conditions. The results of this study provide scientists with more reliable temperature data to support regional, continental, or global assessments in tropical forests. This is a crucial advancement for ecological and global change research, as the discrepancies between macroclimate and microclimate temperatures can be substantial in the tropics, leading to biases and erroneous interpretations. For instance, microclimate-informed species distribution models28,29 have the potential to yield more robust insights into the processes underlying species vulnerability to climate change53. Microclimate can buffer exposure to climate change; nonetheless, microclimate variations arising from climate sensitivity can affect the ability of species to cope with it54. Furthermore, microclimatic variation affects the spatial patterns of adaptive genetic variation and thus the ability of a population to survive climate change55,56. Microclimate also controls the seasonal movements of species within an ecosystem and thus directly affects their distribution capacity and populations, especially in fragmented landscapes57,58. Understanding how these processes interact with microclimate to shape species’ cohorts59 and their exposure to climate change is essential for forecasting range dynamics60,61.

Methods

Study region and temperature data

This study covered the global tropical region between 23° 27′ N and 23° 27′ S. Within this region, we collected microclimate temperature time series with 180 TOMST TMS (Temperature-Moisture-Sensor) dataloggers62 installed at various locations on the three continents with tropical forests (Fig. 1, Table S5). The temperature data used in this study span a total of 8 years (i.e., from 2015 to 2022), but the record length differs for each measurement location, ranging from a minimum of 8 months to a maximum of 26 months (Table S5). The TMS loggers are designed to record near-surface soil, surface, and air temperature (°C) every 15 min. In this study, we focused on air temperature measurements, which represent conditions at 15 cm above the ground62. The air temperature data from the TMS loggers were averaged to hourly values for consistency with the temporal resolution of ERA5-Land data, and converted from UTC to local time. The local hourly mean temperatures were then aggregated to monthly values by averaging (a) all 24 hours of the day (Tdaily), (b) records between 6:00 am and 6:00 pm local time (day-time temperature, Tdt), and (c) records between 6:00 pm and 6:00 am local time (night-time temperature, Tnt).
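The aggregation steps above can be sketched in standard-library Python. This is a minimal illustration, not the authors' code; it assumes each logger's records arrive as (UTC timestamp, temperature) pairs and that the site's UTC offset is a fixed whole number of hours. The function name `monthly_temperatures` is hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def monthly_temperatures(records, utc_offset_hours):
    """Aggregate 15-min logger readings to monthly Tdaily, Tdt and Tnt (°C).

    records: iterable of (UTC datetime, air temperature) pairs.
    utc_offset_hours: the site's local offset from UTC, assumed constant.
    """
    shift = timedelta(hours=utc_offset_hours)

    # 1) 15-min readings -> hourly means, keyed by the local clock hour
    buckets = defaultdict(list)
    for t_utc, temp in records:
        local = t_utc + shift
        buckets[local.replace(minute=0, second=0, microsecond=0)].append(temp)
    hourly = {hour: mean(vals) for hour, vals in buckets.items()}

    # 2) hourly means -> monthly means; day-time 06:00-18:00, night-time 18:00-06:00
    months = defaultdict(lambda: {"Tdaily": [], "Tdt": [], "Tnt": []})
    for hour, temp in hourly.items():
        bins = months[(hour.year, hour.month)]
        bins["Tdaily"].append(temp)
        bins["Tdt" if 6 <= hour.hour < 18 else "Tnt"].append(temp)
    return {ym: {k: mean(v) for k, v in bins.items()} for ym, bins in months.items()}


# Usage: two synthetic days at a UTC-3 site, 25 °C by local day and 21 °C by local night
start = datetime(2020, 1, 1, 3, tzinfo=timezone.utc)   # 00:00 local time
records = []
for step in range(2 * 24 * 4):
    t = start + timedelta(minutes=15 * step)
    local_hour = (t - timedelta(hours=3)).hour
    records.append((t, 25.0 if 6 <= local_hour < 18 else 21.0))
example = monthly_temperatures(records, utc_offset_hours=-3)
```

Averaging all hourly values of a month gives the same Tdaily as averaging daily means only when every day is complete; gaps in the logger record would make the two differ.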

Explanatory variables

The biophysical variables were selected based on their documented influence on forest microclimate18,63, their spatial resolution, and their availability at global scale. In total, 9 biophysical variables (including climatic data) covering topography, forest phenology, and regional macroclimate were used in the study. Topographic layers were derived from the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) at 30 m spatial resolution. Three DEM-based topographic variables, i.e., slope (°), aspect (°), and elevation (m), were used in the model. Forest structural and functional attributes, represented by LAI (-), FAPAR (-), and canopy height (CH, m), were included to capture the interactions of forest cover with incoming solar radiation. LAI and FAPAR data at 300 m were obtained from the Copernicus Global Land Service (CGLS), and CH data at 25 m were obtained from the Global Land Analysis and Discovery (GLAD) laboratory64. The LAI and FAPAR data are based on observations from the Sentinel-3 OLCI and PROBA-V satellites65, whereas CH is based on the Global Ecosystem Dynamics Investigation (GEDI) sensor onboard the International Space Station. A nearest-neighbour interpolation approach was used to harmonize the spatial resolution across variables.
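A minimal sketch of nearest-neighbour harmonization, mapping a 300 m grid onto a 30 m grid. It assumes, hypothetically, that both grids share an upper-left origin and linear units; real rasters in geographic coordinates would be handled with georeferencing-aware tools, and the function name is illustrative:

```python
import numpy as np


def nearest_neighbour_resample(src, src_res, dst_res, dst_shape):
    """Resample a grid to a new cell size by nearest neighbour.

    Assumes both grids share the same upper-left origin and the same
    linear units (e.g. metres); georeferencing is ignored in this sketch.
    """
    # Centre of each target cell, expressed in source-cell index units
    rows = ((np.arange(dst_shape[0]) + 0.5) * dst_res / src_res).astype(int)
    cols = ((np.arange(dst_shape[1]) + 0.5) * dst_res / src_res).astype(int)
    # Clip to the source extent so edge cells reuse the last source cell
    rows = np.clip(rows, 0, src.shape[0] - 1)
    cols = np.clip(cols, 0, src.shape[1] - 1)
    return src[np.ix_(rows, cols)]


# Usage: a 2 x 2 LAI grid at 300 m mapped onto a 20 x 20 grid at 30 m
lai_300m = np.array([[1.0, 2.0], [3.0, 4.0]])
lai_30m = nearest_neighbour_resample(lai_300m, 300, 30, (20, 20))
```

Each 300 m LAI value is simply repeated over the 10 × 10 block of 30 m cells it contains; nearest neighbour never creates new values, which keeps the original data ranges intact.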

Hourly gridded ERA5-Land reanalysis data (2000–2021) at a spatial resolution of 0.1° × 0.1° were used as macroclimate predictors. Three climatic variables, air temperature at 2 m above the land surface (°C), total precipitation (mm), and surface net solar radiation (J m−2), were used as model predictors. Twenty-two-year average macroclimate conditions were used in the modelling to account for inter-annual variability. All hourly macroclimate variables were provided in UTC and converted to local time using longitudinal information. The local hourly macroclimate variables were then aggregated to monthly values following the same averaging scheme as the microclimate data. The locations of the microclimate sensors were used to extract the biophysical and macroclimate predictors for training the machine learning model. The Tree Canopy Cover (TCC) version 4 product for 2015 at 30 m was used to mask out non-forested areas in the region66, using a threshold of 40% TCC. The overall workflow for estimating understory air temperature is shown in Fig. S3.
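The local-time conversion and forest masking described above could look like the following sketch. Rounding longitude/15 to whole hours, the 06:00–18:00 day window (reused from the logger processing), and the function names are assumptions made for illustration:

```python
import numpy as np


def local_offset_hours(lon_deg):
    """Approximate UTC-to-local offset: 15° of longitude per hour, rounded to whole hours."""
    return int(round(lon_deg / 15.0))


def daily_day_night(hourly_utc, lon_deg):
    """Mean daily, day-time (06-18 h) and night-time (18-06 h) values from 24 hourly UTC means."""
    off = local_offset_hours(lon_deg)
    # np.roll gives local[h] = utc[(h - off) % 24], i.e. the value at local hour h
    local = np.roll(np.asarray(hourly_utc, dtype=float), off)
    night = np.concatenate([local[18:], local[:6]])
    return local.mean(), local[6:18].mean(), night.mean()


def mask_non_forest(values, tcc_percent, threshold=40.0):
    """Keep pixels with tree canopy cover >= threshold (%); set the rest to NaN."""
    return np.where(np.asarray(tcc_percent) >= threshold, values, np.nan)


# Usage: a site at 60° W (UTC-4 by this rule), value 1 during local day-time, 0 at night
off = local_offset_hours(-60.0)
hourly = [1.0 if 6 <= (u + off) % 24 < 18 else 0.0 for u in range(24)]
t_daily, t_dt, t_nt = daily_day_night(hourly, -60.0)
masked = mask_non_forest(np.array([25.0, 26.0]), np.array([80.0, 10.0]))
```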

Thermal traits: offset, range and spatial heterogeneity

We computed monthly temperature offsets (ΔT) from microclimate temperature (Tunder, i.e., understory air temperature at 15 cm above the ground, modelled in this study) and macroclimate temperature (Topen, i.e., open-air temperature at 2 m above the ground from ERA5-Land reanalysis) as ΔT = Tunder − Topen, to quantify the difference between microclimate and macroclimate across space and seasons. Positive ΔT values thus indicate a warmer forest microclimate, whereas negative values indicate a cooler forest microclimate. ΔT was calculated at daily (ΔTdaily), day-time (ΔTdt), and night-time (ΔTnt) level for each month. To better understand the thermal ranges of understory environments, we also estimated the temperature range (RT) from Tdt and Tnt as RT = Tdt − Tnt. Analysing RT helps characterize the thermal variation between day-time and night-time across space and seasons. For a detailed study of ΔT and RT across space and time, we divided the ΔT datasets into three classes based on elevation (low elevation, 0–500 m a.s.l.; mid elevation, 500–1000 m; high elevation, >1000 m) and six classes based on latitude (low, 0°–5°; mid, 5°–10°; and high, >10°; each north and south of the equator). In total, 18 spatial zones were generated for in-depth analysis of thermal traits (Fig. S14, Table S6).
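The offset, range, and zoning definitions above translate directly into array operations. In this sketch the class boundaries come from the text, but how values lying exactly on a boundary (e.g. 500 m or 5°) are assigned, and the zone numbering, are assumptions:

```python
import numpy as np

# Class edges from the text; assigning boundary values to the lower class is an assumption
ELEV_EDGES_M = [500.0, 1000.0]   # low 0-500, mid 500-1000, high >1000 m a.s.l.
LAT_EDGES_DEG = [5.0, 10.0]      # low 0-5°, mid 5-10°, high >10° in each hemisphere


def thermal_traits(t_under, t_open, t_dt, t_nt):
    """Offset ΔT = Tunder - Topen (positive = understory warmer) and range RT = Tdt - Tnt."""
    delta_t = np.asarray(t_under) - np.asarray(t_open)
    r_t = np.asarray(t_dt) - np.asarray(t_nt)
    return delta_t, r_t


def zone_id(elev_m, lat_deg):
    """Zone index 0-17: 3 elevation classes x 2 hemispheres x 3 latitude bands."""
    elev_cls = np.digitize(elev_m, ELEV_EDGES_M, right=True)           # 0, 1, 2
    lat_cls = np.digitize(np.abs(lat_deg), LAT_EDGES_DEG, right=True)  # 0, 1, 2
    south = (np.asarray(lat_deg) < 0).astype(int)                      # 0 = N, 1 = S
    return elev_cls * 6 + south * 3 + lat_cls
```

For example, a lowland pixel at 2° N falls in zone 0, and a pixel at 1500 m and 12° S falls in zone 17; the functions accept whole rasters as arrays, so the zonal statistics reduce to grouping ΔT and RT by `zone_id`.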

Finally, a semivariogram analysis was performed on both the modelled Tunder dataset and the ERA5-Land Topen dataset to quantify the spatial heterogeneity of each. For this purpose, a 5° × 5° block near the equator was selected on each continent. The analysis was done at 10 km spatial resolution to keep the pixel size consistent across datasets, and it quantifies each dataset's ability to reveal spatially independent thermal conditions. As a prerequisite, the datasets were detrended by subtracting the best-fit surface from the actual data67. A linear model of temperature as a function of latitude and longitude was used to define the best-fit surface for each dataset. Detrending removed the dominant large-scale physical gradient that was evident in both datasets and predictably influenced temperature values68. The detrending results for the two datasets (open-air and understory temperature) are shown in Fig. S4. The semivariogram analysis was performed on the residuals of both datasets. Furthermore, the behaviour of the experimental (sample) variograms was analysed using different combinations of distance and direction (Fig. S5). The semivariogram in Fig. 4 is based on a 400 km distance in the east–west direction (i.e., 90°, as shown in Fig. S5). An exponential model curve was used to define the minimum distance (also called the range) and the sill, which reflect the similarity or heterogeneity of a dataset69,70. The point on the fitted curve corresponding to the maximum semivariogram value defines the sill (on the y-axis) and the minimum distance (on the x-axis) for each dataset (Fig. 4). The minimum distance is the distance at which the dataset in question becomes spatially independent71.
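The detrending and variogram steps can be sketched with NumPy as below. This is an illustrative, omnidirectional version (the study used a directional east–west variogram up to 400 km); fitting the exponential model by a grid search over the range parameter, and reporting the "practical range" 3a where the model reaches about 95% of the sill, are assumptions and not necessarily how d was obtained in the study:

```python
import numpy as np


def detrend_linear(lat, lon, temp):
    """Residuals after removing the best-fit plane temp ≈ b0 + b1*lat + b2*lon."""
    design = np.column_stack([np.ones_like(lat), lat, lon])
    coef, *_ = np.linalg.lstsq(design, temp, rcond=None)
    return temp - design @ coef


def empirical_semivariogram(x, y, z, bin_edges):
    """Matheron estimator: gamma(h) = mean of 0.5 * (z_i - z_j)^2 over pairs in each lag bin.

    Omnidirectional for brevity; a directional variogram would also filter pairs by azimuth.
    """
    i, j = np.triu_indices(len(z), k=1)
    lag = np.hypot(x[i] - x[j], y[i] - y[j])
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    which = np.digitize(lag, bin_edges) - 1   # pairs outside the edges get -1 or n_bins
    lags, gammas = [], []
    for b in range(len(bin_edges) - 1):
        in_bin = which == b
        if in_bin.any():
            lags.append(lag[in_bin].mean())
            gammas.append(half_sq[in_bin].mean())
    return np.array(lags), np.array(gammas)


def fit_exponential(lags, gammas, candidate_a):
    """Fit gamma(h) = sill * (1 - exp(-h / a)); grid search over a, least-squares sill.

    Returns (sill, a, practical_range) with practical_range = 3a (~95% of the sill).
    """
    best = None
    for a in candidate_a:
        shape = 1.0 - np.exp(-lags / a)
        sill = (shape @ gammas) / (shape @ shape)
        sse = np.sum((gammas - sill * shape) ** 2)
        if best is None or sse < best[0]:
            best = (sse, sill, a)
    _, sill, a = best
    return sill, a, 3.0 * a


# Usage: recover the parameters of a synthetic exponential variogram (sill 2, a = 50 km)
h = np.linspace(5.0, 400.0, 40)
sill, a, d = fit_exponential(h, 2.0 * (1.0 - np.exp(-h / 50.0)), np.arange(10.0, 200.0, 1.0))
```

Under this convention, a shorter fitted range for the understory residuals than for the open-air residuals corresponds to the smaller dunder reported in the Results.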

Machine learning model

The aim of the study was to maximize predictive capacity within the biophysical domain covered by the training data. We selected a machine learning (ML) approach because it offers more predictive power than statistical models such as generalized linear models (GLMs) or generalized additive models (GAMs), which are better suited to inference about predictor effects72. We used a bootstrap aggregating (bagging) regression approach to model understory temperature from the nine macroclimatic and biophysical predictors. A bagging regression model trains multiple weak learners in parallel, each on a randomly resampled version of the training data, and generates the response for new data by aggregating the predictions of all learners in the ensemble73. Bagging reduces variance and helps avoid overfitting; it is less sensitive to outliers, can uncover nonlinear and complex relationships between the predictors and the response variable (Fig. S6), and can handle multicollinearity among the predictors (Fig. S7). Five-fold cross-validation was used to train the model and test its performance. Three hyperparameters of the ML model, namely (a) minimum leaf size (2–8), (b) number of learners (10–500), and (c) number of predictors to sample (1–8), were optimized using a grid search74. A separate ML regression model was trained for each temporal scale (i.e., mean monthly daily, mean monthly day-time, and mean monthly night-time temperature).
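To make the bagging and tuning procedure concrete, the sketch below implements bagged regression trees with per-split predictor sampling, plus a 5-fold cross-validated grid search, in plain NumPy. It is a simplified stand-in, not the authors' implementation: the tree-growing rule (SSE-minimising binary splits), the interpretation of "number of predictors to sample" as per-split sampling, and the small demo grid are assumptions:

```python
import numpy as np
from itertools import product


def grow_tree(X, y, min_leaf, n_pred, rng):
    """Grow a regression tree; each split searches n_pred randomly drawn predictors."""
    n = len(y)
    if n < 2 * min_leaf or y.max() == y.min():
        return float(y.mean())                                  # leaf: mean response
    best_sse, best_j, best_thr = np.inf, None, None
    for j in rng.choice(X.shape[1], size=n_pred, replace=False):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        cs, cq = np.cumsum(ys), np.cumsum(ys ** 2)
        k = np.arange(min_leaf, n - min_leaf + 1)               # left child = first k samples
        sse = (cq[k - 1] - cs[k - 1] ** 2 / k
               + (cq[-1] - cq[k - 1]) - (cs[-1] - cs[k - 1]) ** 2 / (n - k))
        sse[xs[k - 1] == xs[k]] = np.inf                        # no split between tied values
        i = int(np.argmin(sse))
        if sse[i] < best_sse:
            best_sse, best_j, best_thr = sse[i], j, 0.5 * (xs[k[i] - 1] + xs[k[i]])
    if best_j is None:
        return float(y.mean())
    left = X[:, best_j] <= best_thr
    return (best_j, best_thr,
            grow_tree(X[left], y[left], min_leaf, n_pred, rng),
            grow_tree(X[~left], y[~left], min_leaf, n_pred, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_bagged(X, y, n_learners, min_leaf, n_pred, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_learners):
        boot = rng.integers(0, len(y), size=len(y))             # bootstrap resample
        trees.append(grow_tree(X[boot], y[boot], min_leaf, n_pred, rng))
    return trees


def predict_bagged(trees, X):
    return np.array([np.mean([predict_tree(t, row) for t in trees]) for row in X])


def grid_search_cv(X, y, grid, n_folds=5, seed=0):
    """Return the hyperparameters with the lowest k-fold cross-validated RMSE."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    results = {}
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        sq_err = []
        for test in folds:
            train = np.setdiff1d(np.arange(len(y)), test)
            trees = fit_bagged(X[train], y[train], seed=seed, **params)
            sq_err.extend((predict_bagged(trees, X[test]) - y[test]) ** 2)
        results[values] = float(np.sqrt(np.mean(sq_err)))
    best = min(results, key=results.get)
    return dict(zip(grid, best)), results[best]


# Usage on synthetic data with nine predictors, two of which carry signal
rng = np.random.default_rng(42)
X_demo = rng.uniform(size=(120, 9))
y_demo = 20 + 5 * X_demo[:, 0] - 3 * X_demo[:, 1] + rng.normal(0, 0.2, 120)
best_params, cv_rmse = grid_search_cv(
    X_demo, y_demo, {"n_learners": [10, 20], "min_leaf": [2, 8], "n_pred": [3, 9]})
```

The text's grid spanned 10–500 learners; the demo grid is kept small so that the pure-Python tree growth runs in a few seconds.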

Spatial evaluation of model

ML models are known to be less accurate when extrapolating beyond the conditions covered by their training data75, so the extent of extrapolation should be quantified to indicate a model's spatial certainty76. When applied at large spatial scales, ML models generally encounter input data that fall outside the domain covered by the training data, so a fraction of the predictions are extrapolations. To quantify the model's spatial certainty, we performed the spatial assessment outlined by van den Hoogen et al.76, which quantifies the degree of interpolation and extrapolation at pixel level. This assessment was done at the monthly level and helped us identify regions that fell outside the bounds of the training data. For this assessment, the monthly training data points and the pixels of the composite raster were transformed into the same principal component (PC) space76. For our dataset, the first 6 PC axes explained ~93% of the variation. Combining these 6 PC axes yielded 15 bivariate spaces: PC1 × PC2, PC1 × PC3, PC1 × PC4, …, PC5 × PC6. For each of these 15 combinations, every pixel in the composite raster (Fig. S8a) was scored one if it fell within, or zero if it fell outside, the convex hull enclosing the training data in that PC space (Fig. S8b, c). Pixels inside the convex hull were classified as interpolated (Fig. S8d, red points) and pixels outside as extrapolated (Fig. S8d, blue points). The average over all 15 combinations gives the degree of interpolation for each month. Finally, the 12 monthly maps of interpolation/extrapolation were averaged to give an overall picture of each model's spatial accuracy (Fig. S9).
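The convex-hull scoring can be sketched as follows. The sketch assumes predictors are standardized before PCA (the text does not say) and uses Andrew's monotone-chain algorithm for the 2-D hulls; function names are hypothetical, and the result is the per-month score before the 12 monthly maps are averaged:

```python
import numpy as np
from itertools import combinations


def _convex_hull(points):
    """Andrew's monotone chain: hull vertices of 2-D points in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])


def _inside_hull(hull, pts, eps=1e-9):
    """True where pts lie inside or on the boundary of a counter-clockwise convex polygon."""
    edge = np.roll(hull, -1, axis=0) - hull                  # (m, 2) edge vectors
    rel = pts[None, :, :] - hull[:, None, :]                 # (m, n, 2) vertex -> point
    cross = edge[:, None, 0] * rel[..., 1] - edge[:, None, 1] * rel[..., 0]
    return np.all(cross >= -eps, axis=0)                     # left of (or on) every edge


def degree_of_interpolation(train, pixels, n_pc=6):
    """Share of PC-pair spaces in which each pixel falls inside the training hull."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    _, _, vt = np.linalg.svd((train - mu) / sd, full_matrices=False)
    pc_train = (train - mu) / sd @ vt[:n_pc].T
    pc_pix = (pixels - mu) / sd @ vt[:n_pc].T
    scores = [_inside_hull(_convex_hull(pc_train[:, [a, b]]), pc_pix[:, [a, b]])
              for a, b in combinations(range(n_pc), 2)]      # 15 pairs for n_pc = 6
    return np.mean(scores, axis=0)                           # 1 = interpolated in every pair


# Usage: 200 synthetic training sites with 6 predictors; score the training
# centroid (well inside) and a point far outside the sampled conditions
rng = np.random.default_rng(1)
train_demo = rng.normal(size=(200, 6))
pixels_demo = np.vstack([train_demo.mean(axis=0), train_demo.mean(axis=0) + 100.0])
scores_demo = degree_of_interpolation(train_demo, pixels_demo)
```

A threshold on this score (for instance the ≥90% level quoted in the Discussion) is one way to mask or downweight extrapolated pixels.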

Finally, to account for the possibility that spatial autocorrelation inflates model accuracy, we performed a spatial leave-one-out cross-validation to obtain more conservative accuracy metrics for each ML model76,77. In this analysis, a test location was selected and a buffer zone was established around it. Data points outside the buffer radius were used to train the model, and the test location was used to validate the model prediction. This was repeated for each of the 180 TMS data points. Because spatial autocorrelation is expected close to the validation point, the procedure was repeated with increasing buffer radii, each time removing from the training data the points that fell within the buffer zone. This allowed us to assess the influence of spatial autocorrelation on the evaluation metrics of each model. Stabilization of the accuracy metrics with increasing buffer radius indicated robust model accuracy estimates (Fig. S10). In addition, an independent validation of modelled Tdaily, Tdt, and Tnt was carried out against new independent ground measurements. More details on the data used for this blind validation are provided in the supplementary text.
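A sketch of the buffered leave-one-out procedure, assuming great-circle (haversine) distances between sites. An ordinary least-squares model stands in for the bagged ensemble purely to keep the example short; any function that fits on training rows and returns a predictor could be passed as `fit` instead. All names are hypothetical:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (inputs in degrees; arrays broadcast)."""
    p1, l1, p2, l2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((p2 - p1) / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin((l2 - l1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def fit_ols(X, y):
    """Stand-in model: ordinary least squares with intercept; returns a predict function."""
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def buffered_loo_rmse(X, y, lat, lon, radii_km, fit=fit_ols):
    """Spatial leave-one-out RMSE for each buffer radius."""
    results = {}
    for radius in radii_km:
        errors = []
        for i in range(len(y)):
            dist = haversine_km(lat[i], lon[i], lat, lon)
            train = dist > radius            # drops the test site and neighbours within the buffer
            if train.sum() <= X.shape[1] + 1:
                continue                     # too few sites left to fit the model
            model = fit(X[train], y[train])
            errors.append(model(X[i:i + 1])[0] - y[i])
        results[radius] = float(np.sqrt(np.mean(np.square(errors)))) if errors else float("nan")
    return results


# Usage on synthetic sites whose response is exactly linear in the predictors
rng = np.random.default_rng(7)
lat_demo, lon_demo = rng.uniform(-10, 10, 60), rng.uniform(-70, -50, 60)
X_demo = rng.normal(size=(60, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5])
loo_demo = buffered_loo_rmse(X_demo, y_demo, lat_demo, lon_demo, [0.0, 100.0, 500.0])
```

With real data, plotting the RMSE against the buffer radius shows where the metrics stabilize, which is the behaviour summarized in Fig. S10.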