The global biogeography of tree leaf form and habit

Understanding what controls global leaf type variation in trees is crucial for comprehending their role in terrestrial ecosystems, including carbon, water and nutrient dynamics. Yet our understanding of the factors influencing forest leaf types remains incomplete, leaving us uncertain about the global proportions of needle-leaved, broadleaved, evergreen and deciduous trees. To address these gaps, we conducted a global, ground-sourced assessment of forest leaf-type variation by integrating forest inventory data with comprehensive leaf form (broadleaf vs needle-leaf) and habit (evergreen vs deciduous) records. We found that global variation in leaf habit is primarily driven by isothermality and soil characteristics, while leaf form is predominantly driven by temperature. Given these relationships, we estimate that 38% of global tree individuals are needle-leaved evergreen, 29% are broadleaved evergreen, 27% are broadleaved deciduous and 5% are needle-leaved deciduous. The aboveground biomass distribution among these tree types is approximately 21% (126.4 Gt), 54% (335.7 Gt), 22% (136.2 Gt) and 3% (18.7 Gt), respectively. We further project that, depending on future emissions pathways, 17–34% of forested areas will experience climate conditions by the end of the century that currently support a different forest type, highlighting the intensification of climatic stress on existing forests. By quantifying the distribution of tree leaf types and their corresponding biomass, and identifying regions where climate change will exert greatest pressure on current leaf types, our results can help improve predictions of future terrestrial ecosystem functioning and carbon cycling.


The global biogeography of tree leaf form and habit
Understanding what controls global leaf type variation in trees is crucial for comprehending their role in terrestrial ecosystems, including carbon, water and nutrient dynamics.Yet our understanding of the factors influencing forest leaf types remains incomplete, leaving us uncertain about the global proportions of needle-leaved, broadleaved, evergreen and deciduous trees.To address these gaps, we conducted a global, ground-sourced assessment of forest leaf-type variation by integrating forest inventory data with comprehensive leaf form (broadleaf vs needle-leaf) and habit (evergreen vs deciduous) records.We found that global variation in leaf habit is primarily driven by isothermality and soil characteristics, while leaf form is predominantly driven by temperature.Given these relationships, we estimate that 38% of global tree individuals are needle-leaved evergreen, 29% are broadleaved evergreen, 27% are broadleaved deciduous and 5% are needle-leaved deciduous.The aboveground biomass distribution among these tree types is approximately 21% (126.4Gt), 54% (335.7 Gt), 22% (136.2Gt) and 3% (18.7 Gt), respectively.We further project that, depending on future emissions pathways, 17-34% of forested areas will experience climate conditions by the end of the century that currently support a different forest type, highlighting the intensification of climatic stress on existing forests.By quantifying the distribution of tree leaf types and their corresponding biomass, and identifying regions where climate change will exert greatest pressure on current leaf types, our results can help improve predictions of future terrestrial ecosystem functioning and carbon cycling.
Forest ecosystems, which contain 80-90% of global terrestrial plant biomass 1,2 and a large proportion of terrestrial biodiversity 3 , regulate global biogeochemical cycles, and provide critical ecosystem services 4 .Leaves mediate forest energy and carbon inputs via photosynthesis, respiration, transpiration 5,6 and litterfall 7,8 , thereby regulating ecosystem structure and function, and water, nutrient and carbon cycles [9][10][11] .Leaves of trees are highly diverse but can be broadly classified into four major types on the basis of leaf habit (evergreen vs deciduous) and form (broadleaved vs needle-leaved).These characteristics are linked to a vast array of functional traits associated with resource-use strategies and strongly depend on local growing conditions [12][13][14][15] .Therefore, understanding variation in leaf types along environmental gradients is critical to predicting global biogeochemical cycles and ecosystem functioning in a changing world.Yet, we still lack a global, quantitative understanding of forest leaf habit and form, informed by field-based observations.Deciduous tree species evolved to tolerate seasonal climates and maximize the use of a short growing season 16,17 .They usually have higher photosynthetic rates 18 than evergreen species and reduce transpiratory water loss due to respiration by shedding their leaves during unfavourable seasons 11 .Evergreen trees with longer leaf lifespans, by contrast, tend to have greater leaf construction costs 19 and lower nutrient cycling rates 20 .Growing season water-use strategy commonly differs between broadleaved and needle-leaved species 21 , with needle-leaved species Letter https://doi.org/10.1038/s41477-023-01543-5trees, 7% contain exclusively broadleaved deciduous trees, 12% contain exclusively needle-leaved evergreen trees, and the remaining 68% contain a mixture of leaf habits and forms (Fig. 1b).
Our random forest model predicted 75% of the global variation of forest leaf-type classes (10-fold cross-validation R 2 BC , see Methods).We also ran spatially buffered leave-one-out cross-validation (LOO-CV) to account for the potential effect of spatial autocorrelation on model evaluation statistics, which resulted in a coefficient of determination (R 2 ) of 0.56 at a buffer radius of 300 km (see Methods and Supplementary Fig. 1).Within each class, our model explained 90%, 59%, 75% and 29% (10-fold cross-validation R 2 ) of the global variation in the proportion of broadleaved evergreen, broadleaved deciduous, needle-leaved evergreen and needle-leaved deciduous trees within forests, respectively.These predictive relationships were then used to upscale the observations across the global extent of forest coverage (Fig. 2).
To evaluate model robustness, we tested its performance on an independent validation dataset containing 3,895 sites across the globe 40 , resulting in an R 2 of 0.47 (see Methods section 'Cross-validation using external data').In addition, we compared our model output with annual land cover maps from the European Space Agency's Climate Change Initiative (ESA CCI LC) 41 .Across all leaf types, our model showed high correlations with the ESA CCI LC outputs, with an R 2 BC of 0.61.Within each leaf type class, our model explained 78% of the variation in the proportion of broadleaved evergreen trees, 31% of broadleaved deciduous tree proportions, 64% of needle-leaved evergreen tree proportions and 19% of needle-leaved deciduous tree proportions (R 2 ).
While considerable uncertainties exist for individual predictions at the pixel level, these uncertainties rapidly decrease as the model is projected to a larger area (<5% at 250 km 2 ; Supplementary Fig. 2).Our model shows high confidence in tropical and boreal forests, whereas predictive confidence is lower in mixed forests and ecotones between different types of forest (Supplementary Figs. 3 and 4).For example, models for broadleaved evergreen and deciduous species share low predictive confidence in central African savanna regions.Similarly, low predictive confidence can be found across eastern Russian mixed forests.The low predictive confidence for these regions can be attributed to low sample size and mixed occurrence of broadleaved evergreen and deciduous species, as well as differences in the year of observation across forest survey plots, which may lead to larger uncertainties in ecotones where forest types can shift in relatively short time periods.

Global environmental drivers of forest leaf-type variation
To assess the relative importance of environmental features on variation in leaf types, we ran random forest models including a range of environmental variables (see Methods).We combined these environmental factors into three groups (climate, soil and topography).
often showing conservative strategies 22 , such as lower stomatal conductance 23 and higher hydraulic safety margins 24 , resulting in low photosynthesis rates 25 .A spatially explicit understanding of tree leaf types is therefore critical for estimating the sensitivity and resilience of forests to future climate and soil conditions [26][27][28] , and understanding the ecological consequences of such changes.
Theoretical models 29 and remote sensing observations 30,31 have shown general trends in how climate and soil conditions affect the geographic occurrence of broadleaved and needle-leaved tree species at regional and global scales 32 .These relationships form the foundation of dynamic global vegetation models 31,[33][34][35][36][37] .Yet, the relative importance of various environmental characteristics on leaf habit and form remains to be determined.Furthermore, until now, these vegetation models have lacked the ground data needed to build tree-density-based 'bottom-up' models.Such models are crucial for validating satellite-derived trends on a global scale and for providing a comprehensive, high-resolution depiction of forest leaf-type variation across environmental gradients.
Here we analyse the global distribution of needle-leaved, broadleaved, evergreen and deciduous tree species, by integrating ground-sourced information from 9,781 standardized forest inventory plots in the Global Forest Biodiversity initiative (GFBi) 38 database with species-level leaf habit (evergreen vs deciduous) and leaf form (broadleaf vs needle-leaf) information accessed from the TRY plant trait database 39 (Fig. 1a).The 9,781 forest inventory plots represented a subsample of the full GFBi data of >1 million records to ensure an equal representation of different forest biomes across the globe and avoid modelling bias caused by uneven spatial sampling (see Methods).Using information on both the occurrences of individual trees per plot and the basal-area weighted occurrences of each individual, we calculated plot-level leaf-type proportions (1) on the basis of the leaf type of each individual tree (tree-based leaf type) and ( 2) by weighting each tree by its basal area (area-based leaf type) (see Methods section 'Tree leaf-type data').The first estimate allowed us to map the tree densities of each leaf type, while the second estimate allowed us to map the leaf types represented by the largest trees in a plot.To interpolate patterns across the globe, we combined our plot-level forest leaf information with 58 environmental variables, representing vegetation characteristics, climate, topography, vegetation, soil conditions and human-related features.We also tested the relative importance of 29 commonly studied variables on leaf-type variation and characterized the relationships between environmental features and leaf type.

Mapping global forest leaf types
To characterize the variation in forest leaf type, we first summarized the proportion of evergreen vs deciduous (leaf habit) and broadleaved vs needle-leaved (leaf form) individuals within each plot (Fig. 1a).Across all 9,781 forest plots, 13% contain exclusively broadleaved evergreen To test for the relative importance of climate, soil and topographic characteristics, we ran a principal component analysis (PCA) for each of these variable groups and selected the first six principal components, which explained ≥90% of the total variation across all variables.Our analyses show that climate and soil characteristics jointly determine the global leaf-type distribution (Fig. 3a).With respect to variation in leaf habit, temperature fluctuations, that is, isothermality and temperature seasonality, were the most important variables (Fig. 3c).Yet, the entire combination of soil features (first six principal components of soil variables) was as important as climate for predicting leaf habit in our random forest model (Fig. 3a), suggesting that soil characteristics play an important role in the global distribution of evergreen 'vs' deciduous species.Especially soil texture, in combination with pH, appears to affect global variation in tree leaf habit.Acidic soils, commonly found in tropical and boreal regions, inhibit nutrient (N and P) supply by reducing cation availability and limiting tree growth rates 16 .This might explain why broadleaved deciduous species that require high nutrient supply are less abundant in regions with acidic soil.Broadleaved evergreen species, by contrast, may better cope with nutrient poor, acidic soils 42 .Similarly, needle-leaved evergreen species that can maintain growth even under low nutrient supply are more abundant in regions with acidic soil 16 .The high tannin and phenol contents of needle leaves further contribute to the acidification of soils 16 , probably creating a positive feedback towards coniferous dominance.Overall, our results point towards a feedback between tree leaf habits and soil conditions, highlighting the connection between physical soil features and soil water 43 and nutrient 44 availability.Variation in leaf form was best predicted by climate variables (Fig. 3b), with the most important variable being temperature of the coldest quarter (Fig. 3d).By contrast to leaf habit, soil, topographic and vegetation features were less important in driving variation in leaf form, indicating adaptation to extreme climates, cold winters or Letter https://doi.org/10.1038/s41477-023-01543-5extended dry seasons, as a major determinant of leaf form.This supports previous studies indicating that diverse leaf forms evolved to adapt to different climates 45,46 .

Computation of tree densities with different leaf types
To quantify the proportions of different leaf types across global forests, we combined a global tree density distribution map 47 with our individual-based leaf type models (see Methods and Supplementary Fig. 5).At the global scale, we estimate that of the ∼3 trillion adult trees presently existing, 29.1% (95% CI = 27.5-30.7%)are broadleaved evergreen, 27.1% (23.8-30.6%)are broadleaved deciduous, 38.4% (35.2-41.6%)are needle-leaved evergreen and 5.4% (4.3-6.6%) are needle-leaved deciduous (Fig. 4 and Supplementary Fig. 6).Even though needle-leaved tree species comprise less than 2% of the world's estimated 73,000 tree species 48 , this small fraction of diversity represents around 44% of individual trees on Earth.Of the 1.15 trillion needle-leaved evergreen trees growing worldwide, the majority (64.7%) are found in boreal regions, while 23.1%, 8.9%, 2.8% are found in temperate, arid and tropical regions, respectively (Fig. 4).In contrast, of the 0.87 trillion broadleaved evergreen trees growing worldwide, the majority exists in tropical (57.2%) and arid (29.8%) regions, with 6.3% and 4.3% existing in temperate and boreal regions, respectively (Fig. 4).Broadleaved deciduous trees show the widest range of occurrences.Of the 0.81 trillion broadleaved deciduous trees, 34.5% are found in boreal regions, 28.8% in temperate regions, 22.5% in arid regions and 11.5% in tropical regions (Fig. 4).We further estimate that there are 0.16 trillion needle-leaved deciduous trees across the globe, the vast majority of which grow in boreal regions.
Using our basal-area-based model of leaf types, we were able to estimate the biomass contribution of each leaf type within individual forest pixels by integrating our data with a recently published aboveground biomass map 49 .Our analysis revealed that broadleaved evergreen trees store the largest proportion of global forest biomass, accounting for 54.4% (335.7 Gt) out of the total biomass of 617 Gt.Broadleaved deciduous trees contribute 22.1% (136.2Gt), needle-leaved evergreen trees contribute 20.5% (126.4Gt) and needle-leaved deciduous trees contribute 4% (18.7 Gt) (Supplementary Fig. 7).Interestingly, despite there being 42% more evergreen needle-leaved trees compared with broadleaved evergreen trees, their biomass contribution is 62% (209.3Gt) lower.This distribution of biomass across different leaf types provides valuable insights into the carbon storage capacity of diverse forest ecosystems.

Climatic risk assessment of future leaf types
Climate change will strongly affect the functioning of terrestrial ecosystems by altering growth, mortality and reproduction of trees and their interactions with leaf form and habit 16 .Our models allowed us to highlight areas of potential risk by identifying the regions where future climates will shift to conditions that currently support leaf types different from the prevailing ones.In these regions, trees are likely to experience more climatic stress in the future.To assess the  Letter https://doi.org/10.1038/s41477-023-01543-5 extent and distribution of future changes in forest leaf-type climate envelopes, we projected our leaf-type models into the future using three climate change scenarios (low-emission scenario (SSP1-RCP2.6),business-as-usual scenario (SSP3-RCP7) and high-emission scenario (SSP5-RCP8.5)) 50.To do so, we used our random forest models of present leaf type distributions and replaced all climate variables reflecting the 1979-2013 climate (see Supplementary Fig. 1) with climate model projections for 2071-2100 while keeping soil, topographic, vegetative and anthropogenic characteristics constant.The results suggest that forests will experience substantial shifts in leaf-type climate conditions.Depending on future emissions pathways, 17 to 34% of future forested regions are projected to experience a climate by the end of the century that currently supports leaf types different from the prevailing ones (Fig. 5 and Supplementary Fig. 8; see Supplementary Fig. 9 for an alternative definition of forest types).The climate conditions that have historically supported evergreen forests are declining as global conditions shift towards those that have historically supported more deciduous forests, and this appears to be the case for both broadleaved and needle-leaved species (Supplementary Fig. 10).Specifically, 7-20% of the broadleaved evergreen forests are likely to experience changes towards conditions that currently support deciduous species (Fig. 5).Similarly, 29-67% of the needle-leaved evergreen forests will experience changes towards climate conditions that currently support mixed or deciduous forests (Fig. 5 and Supplementary Figs.8-12).If these climate projections are realized, plants in those regions must either tolerate more stressful environmental conditions or shift their distributions, causing changes in forest composition 51 .Previous studies predicting ecoregion shifts have also shown a heightened vulnerability to changing climate conditions, surpassing even the susceptibility of leaf types 52 .
It is important to acknowledge that linking forest leaf types to climate alone cannot fully capture the complex interactions of other influential factors, including CO 2 concentration 53,54 and nutrient availability 26 .Consequently, the analysis presented in this study does not project actual changes in forest leaf types.Instead, its focus is on identifying regions where future climates will shift to conditions that currently support different leaf types than what is currently observed.Different species may exhibit varying tolerance thresholds and responses to CO 2 fertilization, leading to divergent outcomes.Moreover, it is possible that species sharing the same leaf habit or form but with broader climatic tolerance ranges could replace the present species, potentially mitigating leaf-type changes.
Our analysis serves as a risk assessment, highlighting regions where climate poses a potential threat to the existing forest composition.Further research is necessary to gain a deeper understanding of the intricate interactions between climatic changes, elevated atmospheric CO 2 concentrations and nutrient availability.These interactions play a pivotal role in determining essential aspects of forest ecology, such as germination rates, seedling survival, growth rates and tree mortality, which ultimately shape forest composition.Nonetheless, the analysis underscores the substantial and rapid changes in climatic conditions that forests are already experiencing and will continue to experience even more profoundly in the future, on timescales ranging from decades to centuries.

Methodological considerations
Our models successfully captured a substantial portion of the observed spatial variation in forest leaf types, with an overall explained variation of 75%.However, the accuracy of our predictions varied across different leaf types.Among them, our model achieved the highest accuracy in predicting the spatial distribution of broadleaved evergreen tree species, with an explained variation of 90%.Conversely, our model explained only 29% of the variation in needle-leaved deciduous trees, which can be attributed to the limited availability of data from plots containing needle-leaved deciduous species.To address the uncertainties associated with our predictions, we employed a subsampling approach, running 100 independent models.This allowed us to assess the range of variability in our predictions and identify areas with lower predictive confidence.The resulting maps of model uncertainty (Supplementary Fig. 3) highlight regions where caution should be exercised when interpreting our predictions.

Broadleaved
While our study supports many of the mechanisms identified in previous research, our correlative analyses do not necessarily establish causal relationships.To map leaf-ype variation across the globe on the basis of the relationships with environmental features, we used global layers.This approach was necessary as point-level predictor variables cannot be used to interpolate predictions across the globe.While the majority of these layers effectively capture local variations, soil layers may not fully reflect point-level soil observations due to the inherent spatial heterogeneity of soil conditions.In addition, soil layers are typically derived from interpolation methods using environmental information and may thus be correlated with climate variables.While our random forest model predictions are not affected by multicollinearity, this could impact the quantification of variable importance related to environmental, soil, topographic and vegetation features.
Incorporating local point observations into the model training was not feasible because the ground-based forest inventory data we used did not include field-measured environmental and soil characteristics.
Although we examined additional point-level observations from the World Soil Information Service (WOSIS) dataset (see next paragraph), these observations often did not align spatially with the forest inventory plots, resulting in a reduced sample size (>80% reduction) and geographic bias (Fig. 1 and Supplementary Fig. 14).Moreover, training the model using point observations while relying on soil layers for model prediction could introduce further uncertainties and biases.To avoid these limitations, we conducted both model training and predictions using the same soil layers.
Nevertheless, we conducted analyses to determine the agreement between results based on global soil layers and point-level soil observations.We matched our forest inventory dataset 55 with the WOSIS dataset 56 , which contains local point observations of soil features.These analyses indicated that: (1) model predictions remained consistent when using point observations instead of global layers of soil characteristics (97-99% similarity in predictions), (2) the global soil layers exhibited good agreement with point observations for most soil variables, particularly the four most important variables (R 2 = 0.42-0.62;Fig. 3c and Supplementary Fig. 13) and (3) the inferred importance of soil features in leaf type variation remained similar (<5% difference) when using point observations instead of soil layers as predictors (Supplementary Fig. 14).These analyses underscore the crucial role of soil features in determining global leaf-type variation.Letter https://doi.org/10.1038/s41477-023-01543-5

Conclusions
By characterizing the environmental controls of forest leaf-type variation and integrating this information with tree density data, our analysis reveals that 38% of global tree individuals are needle-leaved evergreen, 29% are broadleaved evergreen, 27% are broadleaved deciduous and 5% are needle-leaved deciduous.We further show predictable global patterns in forest leaf varieties, with the relative abundance of leaf forms mainly correlating with temperature variation.In contrast, leaf habit is jointly controlled by climate and soil characteristics.These global relationships between environmental factors and forest leaf types largely agree with local experimental and modelling studies [57][58][59] , which also highlight the dual role of climate and soil conditions in driving variation in forest types.Our analysis of the spatial variation of forest leaf types refines our understanding of forest composition and structure 60,61 at a global scale.While satellite-derived approaches have been foundational for the characterization of spatial variation in canopy structure 62 , our bottom-up model, derived from empirical inventory data, allowed us to create models of forest leaf type at the individual tree level to provide novel insights into forest composition.This can help benchmark satellite-derived models of forest structure and inform ecological models of plant productivity, biogeochemical cycling, carbon storage and species distribution 31,63,64 .By identifying the main environmental characteristics that determine habitat suitability, such as annual mean temperature, climate seasonality and water and nutrient supplies, our baseline estimates of leaf-type densities are also critical for projecting population-and community-level tree demographics under current and future climate change.Ultimately, these insights can help make informed decisions to guide global efforts to conserve, restore and sustainably manage forest ecosystems that are so vital for the wellbeing of all organisms on Earth.

Data collection
Forest inventory data.To obtain empirical information on tree occurrences, we extracted a total of 1.1 million forest inventory plots with more than 50 million individual occurrence records, covering all continents except Antarctica from the GFBi dataset 38 .To avoid modelling bias caused by uneven spatial sampling across biomes, we downsampled the dataset so that the relative proportion of plots in each biome in the dataset approximately matched the proportion of forested area within each biome 65 .This was done by retaining all tropical forest plots in GFBi (n = 11,367) and randomly downsampling the remaining biomes in proportion to this number.Individuals with stem diameters <10 cm were excluded as the focus was on adult trees, and only plots with ≥10 adult individuals were included in the final analysis.For each plot, the dataset contains information on the location (coordinates), the year(s) when the inventory took place, stem diameter at breast height (DBH) and species identity of each individual.For plots with time series data, only the most recent observation year was included in the analysis.The average year of observation across all plots was 2005.This resulted in a total of 9,781 plots with a median size of ∼500 m 2 and 817,091 individual tree records, with 20.3% of the plots in the boreal biome (vs 21% of forested land globally), 20.4% in temperate biomes (vs 22%), 54% in tropical biomes (vs 50%) and 3% in Mediterranean woodland, tundra, xeric shrubland or mangrove biomes (vs 7% globally).
Tree leaf-type data.Information on leaf habit (evergreen vs deciduous) and leaf type (broadleaved vs needle-leaved) was extracted from the TRY plant trait database 39 .For each species and genus, we assigned the most common leaf habit across all TRY observations, treating leaf habit as a binary variable of whether it is evergreen or not.Species names were standardized using the Taxonstand R package 66 .We then assigned leaf-type information to the individuals included in the GFBi dataset using species-level information or genus-level information in case species-level information was not available.Plots in which <50% of all individuals had a leaf-type record in TRY were excluded.Out of 10,274 species recorded globally in the GFBi dataset, leaf-type records could be assigned to 8,642 tropical tree species, 453 temperate tree species, 46 boreal tree species and 1,124 tree species in dry areas.Next, we calculated the proportion of each leaf-type combination (evergreen broadleaved, deciduous broadleaved, needle-leaved evergreen or needle-leaved deciduous) within a plot, either by dividing the sum of individuals featuring the respective leaf type by the total number of individuals within the plot (individual-based leaf type) or by dividing the basal area featuring the respective leaf type by the total basal area of all trees in the plot (area-based leaf type).
Environmental covariates.In total, 58 global environmental layers reflecting climate 50 , topography, vegetation, anthropogenic and soil 67 (at 0 cm to 100 cm depth) characteristics were used as covariates in our analyses (Supplementary Fig. 1).Climate variables reflect the average climate between 1979-2013.All layers were standardized to 30-arc-second resolution (∼1 km 2 at the equator).

Geospatial modelling
Random forest modelling.To predict the occurrence probabilities of the four forest leaf types (broadleaved evergreen, broadleaved deciduous, needle-leaved evergreen and needle-leaved deciduous), we ran random forest models in Google Earth Engine 68 .In random forest, unlike traditional regression, correlation among variables does not affect model accuracy.Indeed, the ability to use many correlated predictors is one of the key benefits of machine learning models 69 .When variables are correlated, the effect of these variables is 'shared' across the trees in the random forest.Because random forest does not estimate coefficients as in regression, this correlation does not hinder model fit or performance, but rather complicates efforts to quantify variable importance, which is also shared across correlated variables.Thus, by including numerous variables, even if correlated, we can improve our predictive power of the model to accurately quantify current carbon.
To run the models, we set the output mode to 'MULTIPROBABIL-ITY' and randomly sampled 10 individuals from each plot 100 times, weighting individuals by their basal area to model area-based leaf types.The following equation was used to transform DBH to basal area: . To model individual-based leaf types, individuals were sampled without weighting them.This resulted in 100 training datasets, each containing 98,330 rows.After extracting pixel values from 58 environmental covariates, we then modelled leaf types for each training dataset with a random forest model, using the 58 environmental covariates.The correlation between our point-level response variable (leaf type) and spatially contiguous covariates then allowed us to map the global distribution of leaf types.
To train global models of forest leaf types, we first ran a grid-search procedure, exploring the results of a suite of random forest models with different hyperparameters.The hyperparameters that were allowed to vary were the number of random trees (10, 20, 50, 100 and 250), the number of variables sampled at each split (1, 2, 4, 5, 8, 10, 15,  20 and 30) and the minimum sample size at the end of the nodes (1, 2, 5, 10, 15, 20 and 30); subsampling rate was constant at 0.632.To quantify predictive accuracy, we used the Bhattacharyya coefficient to compare the predicted and observed probabilities for each pixel, as is commonly used in image processing and feature extraction [70][71][72][73] .Because four probability classes within a pixel are not independent, we cannot use standard loss functions to estimate predictive accuracy.Instead, we used the Bhattacharyya coefficient, given by ∑ n i=0 √O i × P i , which quantifies the similarity between two vectors, O and P, with n categories.The Bhattacharyya coefficient falls within (0,1), equalling one only when P i = O i for all i within a pixel (that is, when the Letter https://doi.org/10.1038/s41477-023-01543-5predictions exactly match the observed) and zero when the vectors are completely disjoint.To evaluate overall model performance, we then adopted a similar approach to that of ref. 74 for multinomial data 75 , using the Bhattacharyya coefficients to calculate a pseudo-R 2 (R 2 BC ) (equations 1∼5) on the basis of 10-fold cross-validation: in which and where, MAE is the mean absolute error, O i is the observed coverage of leaf type i, P i is the predicted coverage of leaf type i (on out-of-fit data via cross-validation), Ō i is the average coverage of leaf type i across all the observations and n is the number of forest leaf types (here, n = 4).Note that the summation terms in equations ( 3) and ( 5) are the Bhattacharyya coefficients between the observed multinomial distribution and the predicted distribution (equation 3) and the average distribution (equation 5).Thus, equation ( 3) is the predictive loss term for each pixel, with the MAE model in equation ( 2) giving the average across all pixels, which equals zero only when the predictions perfectly match the observations.Similarly, equation ( 5) is the loss term for each pixel when using group-level means, with MAE mean in equation (4) giving the average loss across all pixels.By comparing MAE model to MAE mean , we follow the standard approach for computing R 2 by quantifying performances relative to human predictions, with R 2 = 1 − MAE model /MAE mean equalling 1 only when our predictions are perfect (MAE model = 0) and R 2 being ≤0 when our predictions are equal to or worse than the mean.Importantly, as suggested in ref. 74, equation ( 1) is estimated using out-of-fit cross-validated data, where the predicted values are estimated by omitting the corresponding observed values from the training data, with the resulting pseudo-R 2 used to assess our four-element model output.
To create the final maps (Fig. 2 and Supplementary Fig. 5), we used the random forest model for each training dataset with the optimal suite of hyperparameters based on the R 2 BC from the grid search.Extrapolation of our predictions across global forest areas resulted in 100 four-band global layers, with each band representing the global probability of one forest leaf type.We averaged the predictions from these 100 model outputs by taking the mean for the final map.We calculated the 95% confidence intervals across the 100 model layers 76 to represent sampling uncertainty.
As an alternative mapping approach, we used an independent tree-based classification and regression trees (CART) model 77 (Supplementary Fig. 15).This approach was used to test whether model performance depends on model type.If the two models (random forest and CART) show similar results, this indicates that predictions are not biased by model selection.Using the same independent 'Tallo' dataset 40 used for testing the robustness of the random forest model, the CART model had an explanatory power of 0.46, which is similar to the R 2 BC of 0.47 of the random forest model.When directly comparing both models, the CART model showed 87% similarity (R 2 BC ) with the random forest model.This suggests that our maps and predictions do not depend on the type of model, and we report the random forest model results throughout the main text since it showed slightly higher accuracy 55 .
Cross-validation using external data.We tested for the performance and correlation between the predictions of the area-based and individual-based random forest models.When using the same independent 'Tallo' dataset 40 for testing the robustness of the random forest models, the tree-based and area-based models had an explanatory power of 46% and 47%, respectively.When directly comparing both models, the area-based model showed 89% similarity (R 2 BC ) with the individual-based model, showing that both metrics result in similar predictions of leaf-type variation.
To further evaluate the performance of our models, we additionally compared the model predictions with satellite-derived leaf type estimates from annual land cover maps from the ESA CCI LC 41 .We used land cover layers for the years 2000, 2005, 2010 and 2015, each with a spatial resolution of 300 m × 300 m.To assign each pixel to forest leaf-type classes, we recalculated the leaf-type proportions for each layer as these represented leaf-type proportions across all ecosystem types, including grasslands.For example, we recalculated forest leaf-type proportions for pixels with 30% broadleaf deciduous forest cover, 20% needle-leaf evergreen forest cover, 10% needle-leaf deciduous forest cover and 40% non-forest cover by dividing each forest cover percentage by the total area covered by forest, resulting in 50% broadleaved deciduous, 33.3% needle-leaved evergreen and 16.7% needle-leaved deciduous.We then calculated the average values across the four years for each pixel and compared the results with our model outputs.Our area-based models explained 61% of the spatial variation in the ESA CCI LC models, with an explanatory power of 78% for broadleaved evergreen leaf-type proportions, 31% for broadleaved deciduous, 64% for needle-leaved evergreen and 19% for needle-leaved deciduous.
To generate global layers of soil features, the Soil Grids dataset relies on machine learning models informed by global, spatially explicit information on various climate variables.This global interpolation of soil information using climate data may thus lead to an overestimation of the covariation between climate and soil layers while reducing small-scale heterogeneity in soil features.To assess whether this potential caveat affects our results, we used point-level soil measurements from the WOSIS dataset, including clay content, silt content, pH and sand content.To spatially match this dataset with the full GFBi dataset containing more than 1.1 million plots across the globe 55 , we used the 'geosphere' 78 R package.We then selected the nearest soil observation that fell within 250 m or 1,000 m of the centre of each forest plot.This resulted in a spatial match between soil measurements and forest plots in 146 cases for the 250 m radius and in 1,893 cases for the 1,000 m radius (Supplementary Fig. 14a).To test whether model performance and predictions change when using point observations of soil features instead of global layers, we then trained random forest models using either WOSIS or Soil Grids soil data along with information on climate, topography, human and vegetation characteristics (Supplementary Fig. 14b-d).For both the 250 m and 1,000 m buffer radii, the models showed a high degree of agreement ( R 2 BC = 0.99 ) between model predictions.
In a second step, we then tested whether the use of point observations vs global layers of soil features affects the estimated importance of variables in driving leaf-type variation.The analysis showed a slight reduction in the importance of soil variables of 5-6% when using point observations rather than Soil Grids data (Supplementary Fig. 13), which is probably driven by the slightly lower covariation of soil point data with climate variables (Supplementary Fig. 16).Nevertheless, the results remain similar, showing that this difference is unlikely to affect the conclusions of our study.

Letter
https://doi.org/10.1038/s41477-023-01543-5 Interpolation vs extrapolation in model predictions.To evaluate how well our training dataset represents the full multivariate environmental covariate space, we performed a principal-component-analysis-based approach following refs.76,79.We projected the covariate composite into the same space using the centreing values, scaling values and eigenvectors from the principal component analysis of the training data.We created the convex hulls for each of the bivariate combinations from the top principal components and classified whether each pixel falls in or outside each of these convex hulls.We used 24 principal components with 276 combinations for all covariates for the sampling dataset.This analysis showed that 99.2% of land pixels (778,975,911 of 785,150,461) cover at least 90% of the environmental variables present in our training data locations (Supplementary Fig. 17).

LOO-CV.
To account for the potential effect of spatial autocorrelation in model residuals on model validation statistics, we ran spatially buffered LOO-CV for a series of buffer radii from 10 m to 500 km.In LOO-CV, each observation is predicted on the basis of a model that includes all data outside the respective buffer radius.This results in 9,781 (=total number of observations) separate models for each buffer radius.Model performance was evaluated on the basis of R 2 BC .To assess the range of spatial autocorrelation, we ran semi-variograms for random cross-validation and LOO-CV model residuals in each forest type, showing that regardless of buffer radius or validation type, our residuals show weak spatial autocorrelation (Supplementary Fig. 1).Nevertheless, when eliminating any potential effects of spatial autocorrelation on model performance by applying large buffer radii of 300 km and 500 km, the out-of-sample R 2 BC remained high (0.56 and 0.53, respectively).

Global tree density and biomass calculation of leaf types
Tree leaf-type densities were estimated by integrating a map of the global tree density distribution 47 with our individual-based forest leaf-type maps.For each pixel, we multiplied tree density values with modelled forest leaf-type proportions to obtain the pixel-level stem density of each leaf type.We then summed up these pixel-level abundances across the globe to estimate the total abundances of each forest leaf type.To obtain the total number of trees of each leaf type for the major forest types, we defined tropical forests as pixels falling in the biomes tropical and subtropical moist broadleaf forest (WWF 80 biome 1), tropical and subtropical coniferous forest (biome 3) and mangroves (biome 14).Temperate forests were defined as pixels in the biomes temperate broadleaf and mixed forest (biome 4) and temperate coniferous forest (biome 5).Boreal forests were defined as pixels in the biomes boreal forest or taiga (biome 6), montane grasslands and shrublands (biome 10) and tundra (biome 11).Dry forests were defined as pixels in the biomes tropical and subtropical dry broadleaf forest (biome 2), tropical and subtropical grasslands, savannas and shrubland (biome 7), temperate grasslands, savannas and shrubland (biome 8), Mediterranean forests, woodlands and scrub (biome 12), and deserts and xeric shrubland (biome 13).
Forest biomass for each leaf type was computed by incorporating a map of global forest biomass 49 with our area-based leaf-type models.We calculated the absolute biomass by scaling biomass density with tree canopy cover 81 and pixel area within each pixel.This absolute biomass per pixel was then partitioned by leaf-type proportions, derived from our area-based models.By summing up the pixel-level biomass across the globe, we were able to approximate the total amount of biomass stored in each of the leaf types.

Environmental drivers of forest leaf-type variations
To evaluate the relative importance of environmental features on forest leaf-type variation, out of the total 58 environmental covariates, we tested the effects of 29 commonly used environmental characteristics using random forest models (see Supplementary Fig. 1).We separated the environmental features into four major groups: climate, soil, topography and vegetation.To equally represent each of the three groups in the model and minimize collinearity, we selected the first six principal components from climate, soil and topography groups.These six principal components explained ≥90% of the total variation across all group variables.We included all the three vegetation characteristics, which are forest age, tree density and canopy height, into the analysis without computing the principal components.We then computed the variance inflation factors (VIFs) across all 21 principal components using the R package HH 82 .All VIFs were lower than 4, suggesting sufficient independence among covariates.The principal components were then used as predictors in random forest models with different combinations of hyperparameters (that is, 1 to 12 samples per split and a minimum sample size at the end of the nodes of 1 to 10), and we selected the ten best models with the highest coefficient of determining variation (R 2 ).Variable importance was determined by calculating the relative influence of each variable, expressed by the variable's attributed reduction in squared error.To quantify the importance of individual environmental factors, we used the same combinations of hyperparameters.Based on R 2 , the ten best random forest models were again chosen to explore the relative importance of each factor.The random forest models were run via the R package h2o 83 .

Forest types and their future climate envelopes
To assess the extent and distribution of future changes in forest leaf-type climate envelopes, we projected our leaf-type models into the future using CMIP6 climate models for the time interval of 2071-2100 and three emission scenarios (SSP1-RCP2.6,SSP3-RCP7 and SSP5-RCP8.5) 50.The CMIP6 climate models were extracted following the ISIMIP3b protocol and included five models (gfdl-esm4, ukesm1-0-II, mpi-esm1-2-hr, ipsl-cm6a-lr and mri-esm2-0).We projected the global forest leaf-type distribution for each emission scenario on the basis of each of the five climate models using our random forest models.To do so, we used our 100 random forest models of present leaf-type distributions and replaced the climate variables ('bioclim' variables from the CHELSA dataset, marked with hashtags in Supplementary Fig. 1) with CMIP6 climate model projections while keeping soil, topographic, vegetative and anthropogenic characteristics constant.For each emission scenario and CMIP6 model, we aggregated the 100 layers by taking the mean.We then aggregated the five CMIP6 model projections by taking the mean.For both evergreen vs deciduous and broadleaf vs needle-leaf proportions, we then subtracted the present predictions by the averaged model projections for the three climate change scenarios.We then summarized the predictions for each of the three climate change scenarios across latitude, aggregating predictions for each half degree.This allowed us to identify areas where future climates will shift to conditions that currently support leaf types different from the prevailing ones (Fig. 5 and Supplementary Fig. 9).To do so, we first obtained information on the forest type that currently is most abundant in each pixel (using area-based leaf-type proportions).Pixels in which > 60% of the forest area was covered by a single leaf type were assigned to that respective leaf type.Pixels in which none of the leaf types covered > 60% of the forest area were categorized as mixed forest (Fig. 5).The analysis was also conducted for a forest-area threshold of 80% to ensure that the results are not driven by the choice of the threshold (Supplementary Fig. 9).Following these definitions, we categorized forest pixels into groups using present and future climate scenarios.By scaling pixels by canopy cover, we could then calculate the total areas in which the climate is expected to shift to conditions that currently support a different forest type.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Fig. 1 |
Fig. 1 | Global coverage of forest inventory locations (GFBi data) and plot-level leaf-type proportions.a, A total of 9,781 forest inventory plots (green points) were used for geospatial modelling of forest leaf types.b, Number of plots in relation to their proportion of evergreen vs deciduous and broadleaved vs needleleaved individuals.

Fig. 2 |
Fig. 2 | The global distribution of forest leaf types.a, The global distribution of tree leaf type as predicted by a random forest model built from area-based leaftype data (see Methods).Pixels are coloured in the red, green and blue spectrum according to the percentage of total tree basal area occupied by broadleaved evergreen, broadleaved deciduous and needle-leaved tree types, as indicated by the ternary plot.Needle-leaved evergreen and needle-leaved deciduous forests

Fig. 3 |
Fig. 3 | Variable importance of environmental covariates on forest leaf-type variation.a,b, Cumulative importance of the first six principal components of climate, soil, topographic and vegetation covariates in the variation of leaf habit (a) and leaf form (b). c,d, Variable importance of selected environmental features on variation in leaf habit (c) and leaf form (d). Bars in c and d represent the mean ± 95% CI; relative importance based on the 10 best random forest models (n = 10; see Methods).Area-based leaf-type proportions were used to represent forest (plot-level) leaf-type variation.

Fig. 4 |
Fig. 4 | The global proportion of evergreen broadleaved, deciduous broadleaved, needle-leaved evergreen and needle-leaved deciduous trees.The relative proportions of trees that occur within tropical, temperate, boreal and arid regions are shown as separate pie charts for each leaf type.

Fig. 5 |
Fig. 5 | Forested areas where future climates may no longer support prevailing leaf types.If a pixel's forest area was predominantly (>60%) covered by one leaf type, it was classified as that specific leaf type.Pixels where no leaf type exceeded 60% coverage were classified as mixed forest.To determine the relative proportion of each leaf type per plot, we considered the basal area of individual trees (area-based leaf type).Coloured pixels on the map indicate areas that, by the end of the century (2071-2100), will face climate conditions that currently support a different forest type.The future climate conditions were represented using three climate change scenarios: low-emission (SSP1-RCP2.6;a,b), business-as-usual (SSP3-RCP7; c,d) and high-emission (SSP5-RCP8.5;e,f) for the period 2071-2100.Panels a, c and e show the present forest types.In contrast, panels b, d and f show the type of forest expected under the projected future climate of each respective pixel.