Patterns of tropical forest understory temperatures

Temperature is a fundamental driver of species distribution and ecosystem functioning. Yet, our knowledge of the microclimatic conditions experienced by organisms inside tropical forests remains limited. This is because ecological studies often rely on coarse-gridded temperature estimates representing the conditions at 2 m height in an open-air environment (i.e., macroclimate). In this study, we present a high-resolution pantropical estimate of near-ground (15 cm above the surface) temperatures inside forests. We quantify diurnal and seasonal variability, thus revealing both spatial and temporal microclimate patterns. We find that on average, understory near-ground temperatures are 1.6 °C cooler than the open-air temperatures. The diurnal temperature range is on average 1.7 °C lower inside the forests, in comparison to open-air conditions. More importantly, we demonstrate a substantial spatial variability in the microclimate characteristics of tropical forests. This variability is regulated by a combination of large-scale climate conditions, vegetation structure and topography, and hence could not be captured by existing macroclimate grids. Our results thus contribute to quantifying the actual thermal ranges experienced by organisms inside tropical forests and provide new insights into how these limits may be affected by climate change and ecosystem disturbances.


Model evaluation
The bagging regression models omit on average 37% of the observations for each decision tree while generating bootstrap replicas of the dataset. These observations, called out-of-bag (OOB) observations, are used to evaluate model performance. Our bootstrapped regression models for estimating monthly understory air temperature at daily average (Tdaily), day-time (Tdt), and night-time (Tnt) durations performed well, with coefficients of determination (R²) ranging from 0.93 to 0.96, root mean square errors (RMSE) from 0.48 to 0.65 °C, and mean absolute errors (MAE) from 0.75 to 1.0 °C. The Tdaily model has the best predictive performance, and the Tdt model is the least accurate. The spatial leave-one-out cross-validation also demonstrated good prediction efficiency, with accuracy parameters stabilizing beyond a buffer size of 40 km. This analysis gives more conservative estimates of the performance of each model; their variation with buffer size is shown in Fig. S10.
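The two evaluation ideas above can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the site coordinates, and the use of great-circle distance for the buffer are assumptions made for illustration. The first function shows why roughly 37% of observations end up out-of-bag when a bootstrap replica is drawn with replacement; the second selects the training sites that remain when one site is held out with a buffer around it.

```python
import math
import random

def oob_fraction(n_obs, n_trees, seed=0):
    """Average share of observations left out of each bootstrap replica."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_trees):
        # Draw n_obs indices with replacement; unique indices are "in-bag".
        drawn = {rng.randrange(n_obs) for _ in range(n_obs)}
        shares.append(1 - len(drawn) / n_obs)
    return sum(shares) / n_trees

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed distance measure for the buffer)."""
    radius = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def buffered_training_sites(sites, held_out, buffer_km):
    """Sites kept for training when `held_out` is validated with a buffer."""
    lat0, lon0 = sites[held_out]
    return [name for name, (lat, lon) in sites.items()
            if name != held_out
            and haversine_km(lat0, lon0, lat, lon) > buffer_km]

# Hypothetical site coordinates (lat, lon), for illustration only.
sites = {"A": (0.0, 0.0), "B": (0.0, 0.1), "C": (0.0, 1.0), "D": (5.0, 5.0)}
share = oob_fraction(n_obs=2000, n_trees=50)
kept = buffered_training_sites(sites, "A", buffer_km=40.0)
```

In a full analysis, a model would be refitted on the retained sites for each held-out location and each buffer size, and the accuracy metrics would be tracked as the buffer grows.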

Independent validation of modelling results
We received additional data from four TOMST TMS loggers installed in the Congo Basin. These temperature observations were not used in training or testing the model. We use these new ground data for an independent assessment of the modelling results (i.e., Tdaily, Tdt, and Tnt). Overall, we observed R² ≥ 0.53, RMSE ≤ 0.72 °C, and MAE ≤ 0.79 °C for all three datasets. The scatter plot comparison of measured and modelled temperature is shown in Fig. S11. Notably, the RMSE and MAE values are well within the range depicted in Fig. S10. However, for these validation points, we observed an overall overestimation of 0.78 ± 0.03 °C.
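As a hedged illustration of how such validation statistics can be computed from paired logger and model values, the sketch below uses textbook definitions. The function name and the sample temperatures are hypothetical, and R² is computed here as one minus the ratio of residual to total sum of squares, which may differ from the definition used in this study (for example, a squared Pearson correlation).

```python
import math

def validation_metrics(observed, modelled):
    """R2, RMSE, MAE and mean bias (modelled minus observed) for paired series."""
    n = len(observed)
    residuals = [m - o for o, m in zip(observed, modelled)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "bias": sum(residuals) / n,  # positive values mean overestimation
    }

# Hypothetical monthly temperatures (°C), for illustration only.
observed = [24.0, 25.0, 26.0, 27.0]
modelled = [24.5, 25.5, 26.5, 27.5]
metrics = validation_metrics(observed, modelled)
```

A constant positive bias, as in this toy example, lowers R² and raises RMSE and MAE even when the modelled series tracks the observed one perfectly.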

Supplementary Figures
Fig. S1: Spatial and temporal variation of day-time temperature offsets. The offset (ΔTdt) was calculated by subtracting open-air temperature (i.e., ERA5-Land) from modelled understory air temperature. Panels a-f show the pixel-level variations of ΔTdt for two months of the year. To present monthly variation, six locations (each a 1° × 1° area) were randomly selected on both sides of the equator in South America (a), Africa (b), and Southeast Asia (c). Panels g-l depict the intra-annual fluctuations of ΔTdt at the selected locations. Panels g-i represent monthly ΔTdt variation for the selected locations in the North, while panels j-l represent ΔTdt variation in the South. In the line graphs, the shaded region around the solid (red) line represents the spatial variation within the selected block of each location.

Fig. S10: Spatial leave-one-out cross-validation. This approach validates a model on data from one distinct location and trains the model on the remaining data. This is repeated for each of our 180 locations. Because of potential spatial autocorrelation close to the validation location, the process is repeated with an increasing buffer around the validation location, each time excluding from the training data the data points that fall within the defined buffer zone. This method allows assessing whether the validation parameters stabilize, an indication of limited spatial autocorrelation.

For this comparison, we selected active weather stations from the Global Historical Climatology Network - Daily (GHCN-Daily) 2 and used the data within our study period (2000-2021). The weather station data were averaged to monthly scale and compared with the corresponding ERA5-Land pixel temperatures. Data from a total of 432 weather stations across the tropics were used in this comparison. Fig. S12a shows the spatial distribution of statistically significant (p < 0.05) values of the Pearson correlation (r). Only 10 reanalysis pixels showed r ≤ 0.5. Moreover, for each location, pixel-wise biases were calculated, and the spatial variability is shown in Fig. S12b. Overall, ERA5-Land underestimates temperature by 1-2 °C relative to the weather stations.
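The mean bias was calculated using the formula: $\mathrm{MB} = \frac{1}{n}\sum_{i=1}^{n}\left(T_{\mathrm{ERA5},i} - T_{\mathrm{WS},i}\right)$, where $n$ is the total number of paired observations of ERA5-Land ($T_{\mathrm{ERA5}}$) and weather station ($T_{\mathrm{WS}}$) temperature.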
It is important to remember that weather station data have their own constraints. Even if we assume that weather station data are free from systematic and reporting errors (which is likely not the case for GHCN-Daily weather data, as reported by Menne et al. 79), a weather station only records the weather conditions of the specific environment where it is installed. The stations used in our comparative analysis are mostly situated in urban areas. In contrast, an ERA5-Land pixel depicts the average climate conditions across a 10 × 10 km area. We employed a simple delta change approach 3,4 to bias-correct the ERA5-Land data and the temperature offset magnitudes. More detailed monthly-level corrections using other approaches, such as multiple linear regression or quantile mapping, can also be adopted.
To illustrate the approach, we used data from seven weather stations in the countries where TMS loggers were installed (Fig. S13a). Monthly average temperatures 5 were compared with the corresponding ERA5-Land pixel, and the mean bias (change factor) was calculated (Fig. S13b). The biased ERA5-Land temperature was then adjusted using the calculated mean bias to estimate the bias-corrected reanalysis temperatures (Fig. S13c). For example, the ERA5-Land pixel values in Mexico (MX) (Fig. S12b) were on average 1.07 °C lower than the weather station data. Therefore, 1.07 °C was added to the uncorrected ERA5-Land pixel values (red line, Fig. S13c) to obtain the bias-corrected ERA5-Land temperature (green line, Fig. S13c). This corrected ERA5-Land temperature was then used to calculate the actual temperature offsets shown in Fig. S13d. The understory temperature shown in Fig. S13c is from a nearby forest area.
The estimated mean bias can also be added directly to the uncorrected temperature offsets (i.e., those reported in this study) to derive bias-corrected temperature offsets. For example, the estimated mean bias for MX is -1.07 °C. This bias can be added directly to the uncorrected temperature offsets (orange line, Fig. S13d) to obtain the bias-corrected temperature offsets (light green line, Fig. S13d).
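The delta change correction described above can be sketched in a few lines. This is an illustrative sketch rather than the code used in this study: the function names are hypothetical, the mean bias is assumed to be ERA5-Land minus station temperature (consistent with the negative MX value), and the monthly values are invented so that the bias equals the -1.07 °C of the MX example.

```python
def mean_bias(era5, station):
    """Mean of (ERA5-Land minus station) over paired monthly values."""
    return sum(e - s for e, s in zip(era5, station)) / len(era5)

def correct_era5(era5, bias):
    """Remove the mean bias from the reanalysis series (delta change)."""
    return [e - bias for e in era5]

def correct_offsets(offsets, bias):
    """Add the mean bias to offsets defined as understory minus ERA5-Land."""
    return [o + bias for o in offsets]

# Hypothetical monthly values (°C), chosen so the bias matches the MX example.
station = [25.00, 26.00, 27.00]
era5 = [23.93, 24.93, 25.93]
understory = [24.50, 25.20, 26.10]

bias = mean_bias(era5, station)                 # about -1.07
era5_corrected = correct_era5(era5, bias)       # ERA5-Land shifted up by 1.07
offsets_raw = [u - e for u, e in zip(understory, era5)]
offsets_corrected = correct_offsets(offsets_raw, bias)
```

Both routes give the same result: subtracting the corrected ERA5-Land series from the understory series equals adding the (negative) mean bias to the uncorrected offsets.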
Given our limited access to worldwide weather station data, especially the lack of access to average day-time and night-time temperature conditions, the end-users of the provided understory temperatures can perform local bias-correction using their national or regional weather station archives.

Supplementary Tables
Table S1. The intra-annual variations of daily temperature offsets (ΔTdaily) (°C) for each spatial zone in three continents. For details of spatial zones, see Table S6 and Fig. S14. The positive offset values indicating warmer understory climate are in red and negative offset values are in black.

Fig. S2: Spatial and temporal variation of night-time temperature offset. The offset (ΔTnt) was calculated by subtracting night-time open-air temperature (i.e., ERA5-Land) from modelled understory air temperature. Panels a-f show the pixel-level variations of ΔTnt for two months of the year. To present monthly variation, six locations (each a 1° × 1° area) were randomly selected on both sides of the equator in South America (a), Africa (b), and Southeast Asia (c). Panels g-l depict the intra-annual fluctuations of ΔTnt at the selected locations. Panels g-i represent monthly ΔTnt variation for the selected locations in the North, while panels j-l represent ΔTnt variation in the South. In the line graphs, the shaded region around the solid (red) line represents the spatial variation within the selected block of each location.

Fig. S4: Detrending process for semivariogram analysis. Panels a and b are for open-air and understory temperatures, respectively. The detrending was done by subtracting the best-fit surface from the actual temperature data. The top row in each panel represents detrending of the selected area in Africa, the middle row the area in South America, and the last row the area in Southeast Asia. Detrending the temperature surface generates residual data that were further used in the semivariogram analysis.
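The detrending step can be illustrated with a minimal sketch. The order of the fitted surface is not stated here, so the example assumes a first-order (planar) trend z = a + b·x + c·y fitted by ordinary least squares; the function names and the synthetic grid are hypothetical.

```python
def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y; returns (a, b, c)."""
    rows = [(1.0, x, y) for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Normal equations (A^T A) beta = A^T z, stored as an augmented 3x4 matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * z for r, z in zip(rows, zs))] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back-substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        beta[i] = (m[i][3] - sum(m[i][j] * beta[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(beta)

def detrend(points):
    """Subtract the fitted plane; returns (x, y, residual) tuples."""
    a, b, c = fit_plane(points)
    return [(x, y, z - (a + b * x + c * y)) for x, y, z in points]

# Synthetic temperature surface: planar trend plus a small alternating signal.
grid_points = [(x, y, 1 + 2 * x + 3 * y + 0.1 * (-1) ** (x + y))
               for x in range(4) for y in range(4)]
coefficients = fit_plane(grid_points)
residuals = detrend(grid_points)
```

The residuals retain only the small-scale signal, which is the quantity a semivariogram analysis would then describe.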

Fig. S5: Sample variogram behaviour under different combinations of distance and direction. Panels a and b show the semivariogram (ɣ) values for open-air and understory data, respectively, under different combinations of distance and direction. The ɣ values for the selected areas of Africa, South America, and Southeast Asia are shown in the top, middle, and last rows of each panel, respectively. The horizontal (east-west) direction is depicted as 90°, whereas 0° represents the vertical (north-south) direction.
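For readers unfamiliar with directional semivariograms, the sketch below computes the standard empirical estimator, γ(h) = Σ(z_i − z_j)² / (2N(h)), over point pairs separated by approximately lag h in a given direction. It follows the caption's convention (0° north-south, 90° east-west), but the function name, tolerances, and test grid are assumptions and do not reproduce the analysis in this study.

```python
import math

def semivariogram(points, lag, lag_tol, direction_deg=None, angle_tol=22.5):
    """Empirical semivariance for pairs at distance lag +/- lag_tol.

    points: (x, y, z) tuples with x pointing east and y pointing north.
    direction_deg: None for omnidirectional; 0 = north-south, 90 = east-west.
    """
    sq_sum, count = 0.0, 0
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dx, dy = xj - xi, yj - yi
            if abs(math.hypot(dx, dy) - lag) > lag_tol:
                continue
            if direction_deg is not None:
                # Azimuth clockwise from north, folded into [0, 180).
                azimuth = math.degrees(math.atan2(dx, dy)) % 180.0
                diff = abs(azimuth - direction_deg) % 180.0
                if min(diff, 180.0 - diff) > angle_tol:
                    continue
            sq_sum += (zi - zj) ** 2
            count += 1
    return sq_sum / (2 * count) if count else float("nan")

# Synthetic residual field that varies only in the east-west direction.
field = [(x, y, float(x)) for x in range(3) for y in range(3)]
gamma_ew = semivariogram(field, lag=1.0, lag_tol=0.1, direction_deg=90.0)
gamma_ns = semivariogram(field, lag=1.0, lag_tol=0.1, direction_deg=0.0)
```

In this toy field the east-west semivariance is nonzero while the north-south semivariance is zero, which is the kind of anisotropy that a direction-by-distance display makes visible.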

Fig. S6: Partial dependence plot. The plots in panel a indicate how much the response variable is affected by a certain predictor value, after accounting for the average effects of all other variables in the model. These partial dependence plots are for the model used for simulating day-time understory air temperature. The relative importance of each predictor in the model is reported in brackets. Gray backgrounds depict the distribution of the sample plots along that variable. The overall importance of each predictor in each model is shown in panel b.
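The idea behind a partial dependence curve can be sketched as follows: one predictor is fixed at each grid value in every sample row, and the model predictions are averaged over the rows. The function below is a generic illustration; `toy_model`, its coefficients, and the predictor names are hypothetical stand-ins and not the model used in this study.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when column `feature` is set to each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature] = value  # override one predictor, keep the rest
            predictions.append(predict(modified))
        curve.append(sum(predictions) / len(predictions))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a trained temperature model (°C)."""
    canopy_cover, elevation_km = row
    return 28.0 - 3.0 * canopy_cover - 5.0 * elevation_km

# Hypothetical sample rows: (canopy cover fraction, elevation in km).
rows = [(0.2, 0.1), (0.8, 0.5), (0.5, 0.9)]
curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 0.5, 1.0])
```

Because the other predictors are averaged out, the curve isolates how the prediction responds to the chosen predictor alone.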

Fig. S7: Correlation matrix of explanatory variables. The correlation matrix contains the Pearson correlation coefficients (r) between all pairs of predictor variables. Note that the random-forest-based machine learning model used here is capable of handling multicollinearity between predictor variables.

Fig. S8: Classification strategy for spatially explicit model error. Visual representation of the adopted interpolation-extrapolation classification strategy for composite pixels in semi-multivariate space. After transforming the predictor pixel values (a) and training data (b) into principal component (PC) space, composite pixels falling within the convex hull encompassing the sampled data (c) are classified as interpolated, and those outside as extrapolated (d). This process is repeated for all bivariate combinations of the selected PC axes. The final image is the proportion of covariates that are classified as interpolated. In our study dataset, the first six PC axes explained >93% of the variation represented by the dataset.
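The convex hull classification can be sketched as below, assuming the predictor values have already been transformed into PC scores (the PCA step is omitted). The function names and the toy scores are hypothetical, and the sketch returns the share of bivariate PC combinations in which a pixel falls inside the training hull, which is one plausible reading of the proportion described in the caption.

```python
from itertools import combinations

def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, p):
    """True if p lies inside or on a counter-clockwise hull (3+ vertices)."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

def interpolation_share(training_scores, pixel_scores):
    """Share of bivariate PC combinations where the pixel is interpolated."""
    pairs = list(combinations(range(len(pixel_scores)), 2))
    inside = 0
    for i, j in pairs:
        hull = convex_hull([(s[i], s[j]) for s in training_scores])
        if inside_hull(hull, (pixel_scores[i], pixel_scores[j])):
            inside += 1
    return inside / len(pairs)

# Hypothetical PC scores: training data at the corners of a unit cube.
training = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
share_inside = interpolation_share(training, (0.5, 0.5, 0.5))
share_partial = interpolation_share(training, (0.5, 0.5, 2.0))
```

A pixel within the sampled range on every axis scores 1, while a pixel outside the range on the third axis is interpolated in only one of the three bivariate combinations.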

Fig. S9: Spatially explicit model errors. Pixel values closer to one indicate that the values of the predictor variables fell within the range of data covered by the in-situ measurement locations used to train the model. Low values indicate that the model extrapolated for many of the covariates at the specific location, as the predictor values were not covered in the training dataset.

Fig. S11: Independent validation of modelling results. Panel a shows the locations of the regions from which microclimate sensor data were used in modelling the pantropical microclimate dataset (green points) and the region used for independent validation of the modelling results (yellow points). Panel b shows the monthly-level scatter plot comparison of measured and simulated understory temperature for the four loggers A, B, C, and D.

Fig. S12: Comparison of monthly temperatures from weather stations and their corresponding ERA5-Land pixel.

Fig. S14: Map showing the extent of the different spatial zones used in this study to thoroughly examine the modelling results. The description of each zone number can be found in Table S6.

Table S2. The intra-annual variations of day-time temperature offset (ΔTdt) (°C) for each spatial zone in three continents. For details of spatial zones, see Table S6 and Fig. S14. The positive offset values indicating warmer understory climate are in red and negative offset values are in black.

Table S3. The intra-annual variations of night-time temperature offset (ΔTnt) (°C) for each spatial zone in three continents. The positive offset values indicating warmer understory climate are in red and negative offset values are in black.

Table S4. The intra-annual variations of temperature range RT for each spatial zone in the three continents. RT values larger than 3 °C are highlighted in red.

Table S5. Number of TMS data loggers installed in each country and the duration of their temperature records used in this study.

Table S6. The definition of each spatial zone and its characteristics. The table also lists the forest area percentage within each zone of the respective continent. The spatial extent of each zone is shown in Fig. S14.