Global variation in the fraction of leaf nitrogen allocated to photosynthesis

Plants invest a considerable amount of leaf nitrogen in the photosynthetic enzyme ribulose-1,5-bisphosphate carboxylase-oxygenase (RuBisCO), forming a strong coupling of nitrogen and photosynthetic capacity. Variability in the nitrogen-photosynthesis relationship indicates different nitrogen use strategies of plants (i.e., the fraction of leaf nitrogen allocated to RuBisCO; fLNR). However, the reason for this variability remains unclear, as widely different nitrogen use strategies are adopted in photosynthesis models. Here, we use a comprehensive database of in situ observations, a remote sensing product of leaf chlorophyll, and ancillary climate and soil data to examine the global distribution of fLNR using a random forest model. We find global fLNR is 18.2 ± 6.2%, with its variation largely driven by negative dependence on leaf mass per area and positive dependence on leaf phosphorus. Some climate and soil factors (i.e., light, atmospheric dryness, soil pH, and sand) have considerable positive influences on fLNR regionally. This study provides insight into the nitrogen-photosynthesis relationship of plants globally and an improved understanding of the global distribution of photosynthetic potential.

1. Most of the traits maps provided in the literature used a much higher number of observations. Probably that is one of the reasons why no explicit maps of Vcmax have been provided so far. Because of this a much more detailed analysis of the results is needed.
*Ploton, P., Mortier, F., Réjou-Méchain, M., Barbier, N., Picard, N., Rossi, V., ... & Pélissier, R. (2020). Spatial validation reveals poor predictive performance of large-scale ecological mapping models. Nature Communications, 11(1), 1-11.
3. This is also linked with my first concern. I think that your RF model could be overfitted. Have you tested if the convex hull of your input data is really representative of the whole planet where you are predicting? Some box plots of your training data against the box plots of the rest of the planet should show that easily.
4. Have you compared the predicted maps with other sophisticated regression approaches (ANN, GPs, SVR...) to check if your results depend too much on the selected method (RF)? What about the accuracies, do you think that other methods could improve them?
5. I also think that your uncertainty maps could be too optimistic. Unless you are completely sure that your model is not extrapolating, the variance of the RF model could be not reliable.
6. There are a variety of trait maps that could be also used in your work (including the Butler et al. ones). In fact, they present significant differences among them that could affect your results a lot. A comparison of the effect of them could be also very useful.

R1C1:
This study presents a very thorough, interesting and relevant analysis of global variation in fLNR and Vcmax. The results are highly relevant for global ecologists and modellers and are based on a very robust analysis.
Authors: We appreciate the positive comments from the reviewer. We are excited about the prospect of providing a useful data-driven map of fLNR and Vcmax to improve the modelling of global vegetation dynamics.
R1C2: I have no major criticisms of the study, as it is very well written, clearly presented and the analysis seems very robust. However, I am missing one key thing in the paper. I am convinced that the global maps (especially the Vcmax map) will be widely used and cited by the global vegetation modelling community. I therefore recommend that the authors add a few sentences of "guidance" for modellers using this map, and it would be good to provide a "Vcmax uncertainty" map as well. Currently only fLNR uncertainty maps are provided. A map representing the uncertainty on the Vcmax map would be very helpful for modellers who want to extract a Vcmax value for a specific region out of the map…
Authors: Thank you for the suggestion. Indeed, it was a missed opportunity to provide guidance on the map to facilitate community use. We have added a paragraph in the Discussion and a new figure (Fig. S8) to do so (L424-L431).
"In addition, the data-driven map provides a direct constraint on the spatial variations of vegetation photosynthetic capacity. We release the map and its associated uncertainty (i.e., one standard deviation of estimates from bagged trees in the RF) to facilitate large-scale ecological and modelling studies (Fig. S8). Since the remote sensing retrieval of leaf chlorophyll content is for top leaves (Croft et al., 2020), and the phenological stages of trait samples are often not available, the Vcmax25 estimated in our study can be interpreted as a multi-year average for top canopy leaves. The seasonality and within-canopy variations of Vcmax25 are not accounted for in the map."

R1C3:
The very high values for fLNR and Vcmax that you find in the Sahel compare very well to recent observations in the Sahel by Sibret et al. 2021. I think this is a great illustration of the validity of the map, worth mentioning in the discussion, for example when referring to arid environments (line 282) and the relation with leaf nitrogen. Related to that, I found it a pity that no savanna-type PFT was included in the PFT classification.
Authors: Thank you for directing us to this new paper. We have added Sibret et al. 2021 in our discussion to support the high Vcmax25 and fLNR we found in the Sahel (L284-290).
"Several studies have reported that plants in arid environments (i.e., high VPD) tend to have a higher Amax and LNC (Sibret et al., 2021; Wright et al., 2003)…"
"Our results show that other than Amax and LNC, fLNR also increases with VPD, consistent with a recent study reporting higher nutrient use efficiency for plants in semi-arid ecosystems of the African Sahel (Sibret et al., 2021)."
In this study, we used the land cover map produced by the European Space Agency (ESA) CCI project. The ESA CCI land cover map does not have a savanna-type PFT (see details here: http://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf). We used the ESA CCI map to parameterize the algorithm that derives leaf chlorophyll content from remote sensing (Croft et al., 2020). As leaf chlorophyll is the most important predictor in our study (Fig. S5a), it is reasonable to use the same land cover map in this data-driven study. One study comparing MODIS and ESA CCI land cover suggested that "the Sahel and Savannah region…are characterized by approximately equal probabilities for multiple LC classes such as shrubland, grassland and bare/sparse vegetation" (Tsendbazar et al., 2017); therefore, the fLNR and Vcmax of savannah should be similar to the values of grassland and shrubland.

R1C4:
The authors present the 7 models as "7 competing hypotheses" (e.g. on line 240). I think this is not correct because M1& M2 and M3&M4 are actually based on the same hypothesis. So I think it is more correct to talk about 7 models representing 4 (or 5) hypotheses. Authors: Thanks for pointing this out. We have revised the sentence to say "seven models based on 5 competing hypotheses".
For the rest I only have a few minor comments and suggestions: R1C5: Line 114: the authors introduce leaf chlorophyll content as one of the 20 'environmental factors'. I would suggest to change the wording of that sentence, because the reader will expect mainly 'external' (abiotic) factors when you introduce them as such. Authors: Motivated by your suggestion, we have revised the sentence to "We examined the relative importance of 20 environmental (i.e., biotic and abiotic) factors in estimating in-situ observations of …" to highlight the inclusion of both biotic and abiotic factors in our analysis.
We then listed the factors in the sequence of their relative importance identified by the prediction of variable importance by permutation analysis.

R1C9:
Line 312: when you mention equation 1 here, it is good to refer to the methods section
Authors: This has been corrected as suggested. We added "(see Methods)" after "Equation 1".
R1C10: Line 313: alfa25 is mentioned for the first time here, but is only introduced properly on line 315.
Authors: Thanks. We have updated the paragraph to define α25 and fNR upon their first appearances.

R2C1:
Overall the paper looks at the fraction of leaf nitrogen that is allocated to rubisco globally. The authors use a novel approach by combining several datasets, remote sensing products and models. Given the interest in recent years in remote sensing of leaf nitrogen, this is a timely paper. Especially since it has been debated how much canopy nitrogen actually contributes to photosynthesis. While I very much appreciate the paper, and think it is a valuable contribution to the literature, there are some questions to the authors that need to be addressed. Authors: We appreciate the constructive comments from the reviewer, and have taken them on board to improve the manuscript.
Specific comments: R2C2: p. 3, line 67: The authors mention that empirical 'nutrient based' VCmax models use PFT specific linear equations to calculate VCmax from LNC, but how do they get LNC? From data? From RS, or an internal model result? This matters quite a bit, and data is not very much available.
Authors: We apologize for the confusion. We used a published global leaf nitrogen content (LNC; g/m²) map, EB17 (Butler et al., 2017), in our analysis, which is upscaled from in-situ observations using plant functional maps, climate and soil variables. The method for upscaling and the validation of the LNC estimate are provided in EB17. We mentioned the use of this published data in L106 in the Introduction and added a new paragraph in the Methods to introduce the LNC map (L457-470, please see our response to R2C5). The EB17 LNC data is publicly accessible at https://github.com/abhirupdatta/global_maps_of_plant_traits (accessed on Jun 2nd, 2021). We added this link in our data availability section.
Please kindly refer to our response to R2C5, where we describe the selection and use of LNC in detail.

R2C3: p. 5, line 100: the references of 'remote sensing of leaf traits' are not the newest references, this field has been very active in the last years.
Authors: We now cite more recent papers here per your suggestion (Asner et al., 2015; Croft et al., 2020; Moreno-Martínez et al., 2018; Serbin et al., 2015), and would be happy to add more if we missed any.

These maps differ, and how does this influence the results of the paper? Also, the Butler map is given in N concentration, while the authors use N area based. How did the authors recalculate? Using SLA? And if this is the case, does this influence the relationship with LMA the authors find later in the paper?
Authors: We apologize for the confusion. We have added a new section (L457-470) in the Methods to describe in detail how we obtained LNC from EB17 (Butler et al., 2017). Additionally, motivated by the comment from the reviewer, we have also included two alternative leaf nitrogen maps, AMM18 (Moreno-Martínez et al., 2018) and CB20 (Boonman et al., 2020), in our study.
"In our study, the LNC data we used to derive fLNR were acquired from EB17 (Butler et al., 2017). EB17 provides data-driven estimates of global mass-based leaf nitrogen content (LNCm; unit: mg/g), specific leaf area (SLA; unit: m²/kg) and their associated uncertainties. We estimated LNC using the equation LNC = LNCm/SLA (unit: g/m²). The uncertainty of LNC was propagated from the uncertainties of LNCm and SLA generated from 1000 bootstrapping tests. We note there are two alternative global LNC maps available: AMM18 (Moreno-Martínez et al., 2018) and CB20 (Boonman et al., 2020). Comparing the three LNC maps (Fig. S9), we found the spatial patterns of EB17 and AMM18 are similar to each other, with EB17 reporting a relatively larger spatial gradient of LNC. CB20 LNC shows less evident spatial variation than EB17 and AMM18, and it has the lowest R² in validation among the three products (Table S3). Since EB17 has the highest R² in validation among the three, and the weighting strategy of EB17 (i.e., unweighted by species abundance within a pixel) is close to the weighting strategy we adopted for Vcmax25 estimation, we chose the EB17 LNC map as the principal LNC in our analysis, but have also examined the impacts of using alternative LNC maps on our results (Fig. S10)."

(Boonman et al., 2020). Each product has been validated in its respective study (Table S3). To examine the uncertainty incurred by the choice of LNC maps, we calculated fLNR using each of the three LNC maps. The three resulting fLNR maps show similar spatial patterns (Fig. S10), with the spatial correlation coefficients (r) between them ranging from 0.57 to 0.71 (p < 0.01). Examining the sensitivities of fLNR to environmental variables, we found the fLNR based on EB17 and AMM18 demonstrated similar sensitivities: fLNR is most sensitive to LMA, LPC, VPD, PAR and soil pH. Meanwhile, the fLNR based on CB20 is most sensitive to soil pH, LMA, VPD, air temperature, and soil sand percentage. Noting that the CB20 LNC map has a lower R² in its cross-validation compared to that of EB17 and AMM18 (Table S3), we have more confidence in the fLNR maps based on EB17 and AMM18. In our study, we used EB17 as the principal LNC map as it demonstrated the highest R² in validation (Table S3), and the weighting strategy of EB17 (i.e., unweighted by species abundance within a pixel) is close to the weighting strategy we adopted for Vcmax25."

Figure S10. Three global fLNR maps estimated using alternative leaf nitrogen content maps (Boonman et al., 2020; Butler et al., 2017; Moreno-Martínez et al., 2018). (a,b,c) the spatial variation of fLNR, (d,e,f) the spatial correlations between the fLNR maps and (g,h,i) the sensitivities of fLNR to environmental factors.
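The spatial correlation coefficients (r) reported between the fLNR maps can be illustrated with a minimal sketch. It assumes a Pearson correlation over pixels that are valid in both maps; the response does not state which estimator was used, and the pixel values and the `pearson_r` helper below are hypothetical.

```python
import math

def pearson_r(map_a, map_b):
    """Pearson correlation over pixels valid in both maps (None marks missing)."""
    pairs = [(a, b) for a, b in zip(map_a, map_b) if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical flattened fLNR maps derived from two different LNC products.
flnr_a = [0.10, 0.15, None, 0.20, 0.25]
flnr_b = [0.12, 0.18, 0.30, None, 0.30]
r = pearson_r(flnr_a, flnr_b)  # only the three pixels valid in both maps are used
```

Masking pixels that are missing in either map before correlating keeps the comparison restricted to the common spatial domain of the two products.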
The calculation of fLNR from LNC (which is LNCm/SLA, or equivalently LNCm*LMA) has limited influence on the sensitivity of fLNR to LMA. The reason is that for the fLNR sensitivity analysis, we examined the additive effect of LMA on fLNR, while for the calculation of fLNR, LMA is essentially in the denominator of the equation. Therefore, most of the spatial variation in fLNR originates from variations in the numerator, Vcmax25 (which was derived from remotely sensed leaf chlorophyll content, plant functional types, precipitation and soil pH, not LMA). This is also seen in the stronger spatial correlations between fLNR maps (Fig. S10d,e,f) than between the LNC maps (Fig. S9d,e,f).
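As a worked illustration of the conversion mentioned above (LNC = LNCm/SLA, equivalently LNCm*LMA), the sketch below checks the units and propagates uncertainty. The uncertainty step uses Gaussian Monte Carlo draws with independent errors as a stand-in for the 1000 bootstrapping tests described in the response; that substitution, the function names and all numbers are assumptions made for illustration.

```python
import random
import statistics

def area_based_lnc(lncm_mg_per_g, sla_m2_per_kg):
    """Convert mass-based leaf N (mg/g) to area-based LNC (g/m2)."""
    # (mg N / g leaf) / (m2 / kg leaf) = (mg * kg) / (g * m2) = g N / m2
    return lncm_mg_per_g / sla_m2_per_kg

def monte_carlo_lnc_sd(lncm, lncm_sd, sla, sla_sd, n_draws=1000, seed=0):
    """Spread of LNC when LNCm and SLA are perturbed within their uncertainties.

    Assumes independent Gaussian errors (not stated in the source).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        sla_draw = rng.gauss(sla, sla_sd)
        if sla_draw <= 0:  # skip non-physical SLA draws
            continue
        draws.append(area_based_lnc(rng.gauss(lncm, lncm_sd), sla_draw))
    return statistics.stdev(draws)

# Hypothetical pixel: LNCm = 20 mg/g, SLA = 10 m2/kg, each with 10% uncertainty.
lnc = area_based_lnc(20.0, 10.0)                  # 2.0 g/m2
lnc_via_lma = 20.0 * (1.0 / 10.0)                 # same value using LMA = 1/SLA (kg/m2)
lnc_sd = monte_carlo_lnc_sd(20.0, 2.0, 10.0, 1.0)
```

Because LMA enters LNC multiplicatively, it ends up in the denominator of any quantity computed as a ratio to LNC, which is the point the response makes about the fLNR calculation.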

R2C6:
In page 7, line 143-145 the authors do evaluate the Vcmax models used, but I miss such an evaluation for the LNC input maps.
Authors: We used published global LNC maps, each of which was validated in its respective study. We have included their validation results in Table S3 (please see our response to R2C5).

R2C7: p. 7, line 138-139. Interesting to see that the fLNR in boreal and tropical zones is lower than in temperate zones, very unexpected given the N limitation in boreal and tropical zones, and the high N deposition in temperate zones. Can the authors explain this further?
Authors: That is an interesting point and we are glad to discuss it further.
In this study, we explore the spatial sensitivities of fLNR to various biotic and abiotic factors. First, we did not see a strong response of fLNR to changes in soil N over space (Fig. 3b). While we acknowledge that some studies have suggested that soil N addition/deposition influences photosynthesis, the significance of N deposition for photosynthesis depends on biome type, N deposition load (Fleischer et al., 2013), and time after the deposition (Liang et al., 2020). As we focus on the spatial variation of fLNR in this study, such localized and time-dependent changes can be minimized in the average global pattern we see.
Second, it has recently been suggested that leaf photosynthetic capacity is decoupled from soil N supply, as N supply is generally enough to support leaf-level photosynthesis (Peng et al., 2021). The changes in leaf photosynthesis are likely incurred by increasing CO2 or acclimation to local climate. Please also note that the discussed influences of N deposition are mostly on LNC and photosynthetic capacity; we are not sure whether any studies show that N deposition can influence fLNR.
Third, it is very likely that the influence of N deposition on soil N, if any, has already been included in the soil N map we used. SoilGrids uses ground-measured soil profiles from the WoSIS database for its global upscaling. Among the soil profiles from WoSIS, at least 47.7% (maximum 87.7%) of samples were collected between 1980 and 2019 (Batjes et al., 2020). We would expect that the strong N deposition during this period (Zhang et al., 2017) has been implicitly incorporated into the soil N map, and our study found no evident soil N impacts on fLNR (Fig. 3).
We have added the discussion above on nitrogen deposition (L316-325) in the manuscript.
Finally, while we are unsure that fLNR should be indicative of soil N availability, our study suggests that the lower fLNR in boreal and tropical ecosystems is due to the coordination of leaf traits along the leaf economic spectrum (Fig. 3b-c). With heavier investment of nutrients in the structural build-up of evergreen leaves, less nitrogen is available for photosynthesis (i.e., lower fLNR) in some tropical and boreal regions.
Authors: Please see our response to R2C7.
R2C9: P. 10, line 185-187. EM5 captured the response of fLNR to leaf traits very well, but is not mentioned?
Authors: We have modified the statement to "the optimal models (i.e., EO and LUNA) and EM5 captured the response of fLNR to leaf traits PC1 well…" following the reviewer's suggestion.
The numbers we reported here indicate the overall quantitative effect of each type of factor (i.e., leaf traits, climate and soil) on the spatial variation of fLNR. They help us understand, in a relative sense, which process governs the spatial variation of fLNR.

R2C10
In the section following the statement, we disassembled these numbers into the sensitivities of fLNR to individual biotic and abiotic factors. For example, we disassembled the total effect from "climate" on fLNR into the effects of "temperature", "precipitation", "PAR", "VPD" and "SWC" on fLNR (Fig. 3b).
R2C13: p. 12, line 224. Where can I find these tropical forests? (see also p. 14, line 272-273)
Authors: We examined the spatial sensitivities of fLNR to environmental variables. Therefore, the places that have lower leaf phosphorus content (LPC) are likely to have a stronger LPC limitation effect on fLNR (Fig. R2), e.g., western Congo basin forests and central Amazon forests.
Figure R2. The leaf phosphorus content (LPC) of evergreen broadleaf forests in the tropics. Note that the range of tropical LPC (0 to 0.12 g/m²) is much lower than the global range (0 to 0.40 g/m²).
"In addition, we note that the productivity of some grasslands (Dong et al., 2019; Fay et al., 2015) and boreal forests (Braun et al., 2010; Giesler et al., 2002) has also been reported to be limited by phosphorus availability; however, we did not detect strong positive dependence of fLNR on LPC globally for these ecosystems in our study. The difference potentially suggests that the phosphorus limitation of grasslands and boreal forests is not as prevalent as that for tropical forests and mixed forests (though some mixed forests are in the boreal region)."
R2C15: p. 15, line 285. Could there also be a link with fire? More fire > loss of N > stimulate plants to use nitrogen more efficiently?
Authors: We feel the link between fire, N availability and plants is outside the scope of our study and prefer to refrain from speculative discussion here (e.g., low-intensity fire can increase nitrogen availability in some studies (Schoch and Binkley, 1986)). Beyond the reasons we provided in the response to R2C7 that soil N has no evident impact on fLNR, we are unsure whether fire, often sporadic in nature, can shape soil N supply at the global scale.

R2C14
R2C16: p. 16, line 302-304. The link of LNC and pH only holds within a certain range after which the soil gets too acid. Authors: Thank you for this point. We only examined the impact of pH on fLNR, not LNC, in our study (Figure 3). We would be happy to check out the paper if this is a study we missed.

R2C17:
p. 17, line 332-333. The authors only look at the internal uncertainty of the LNC map, but not how this map compares to other more recent maps (see references above). Authors: Thank you. We fully agree and have included two more LNC datasets (Boonman et al., 2020;Moreno-Martínez et al., 2018) in our analysis. Please see our response to R2C5.
R2C18: p. 18, line 347-248. While the optimal VCmax models performed well, EM5 also performed well (very similar to the optimal models). This should be discussed.
Authors: We agree that EM5 performed well in showing a large spatial variation of Vcmax25 and capturing the influence of other leaf traits on fLNR. We have added this statement in our results (please see our response to R2C9). However, EM5 did not capture the impacts of climate and soil well (Fig. 2) and thus results in very different spatial patterns of Vcmax25 and fLNR from those of the optimality model and the RF (Fig. S1 and S2).

R3C1:
The received manuscript entitled "Global variation in the fraction of leaf nitrogen allocated to photosynthesis" aims to use a comprehensive database of in-situ observations and a novel remote sensing product of leaf chlorophyll, in combination with climate and soil characteristics, to examine the global distribution of fLNR using a random forest model (RF). Their estimates have been compared with some other parametric approaches. Moreover, the authors carried out different analyses to provide insights into the nitrogen-photosynthesis relationship of plants globally and an improved understanding of the global distribution of photosynthetic potential. Authors: Thank you for reviewing our paper and for your very useful suggestions.

R3C2:
Although I appreciate the effort of the authors and I think it is a very interesting topic. I really see important flaws in the manuscript that need to be clearly addressed before publication. Especially because most of the insights that the authors are providing rely on the validity of the provided maps which, in my opinion, need further validation work. Authors: We appreciate the constructive comments. We have used your comments to improve our study, including extensive additional validation efforts. Please see details below.
R3C3: 1. Most of the traits maps provided in the literature used a much higher number of observations. Probably that is one of the reasons why no explicit maps of Vcmax have been provided so far. Because of this a much more detailed analysis of the results is needed.
*Ploton, P., Mortier, F., Réjou-Méchain, M., Barbier, N., Picard, N., Rossi, V., ... & Pélissier, R. (2020). Spatial validation reveals poor predictive performance of large-scale ecological mapping models. Nature Communications, 11(1), 1-11.
Authors: We agree with the reviewer that the production of a data-driven Vcmax25 map has been challenging. In recent years, with the deluge of public trait data and new remote sensing data, we are reaching a point of having enough information to infer global Vcmax25. We are confident in this because 1) we have amassed a considerable number of observations (n = 8610) from multiple sources and 2) we have a new dataset of global leaf chlorophyll content, which is known to relate to Vcmax25 (Croft et al., 2017). We nevertheless appreciate the suggestion to examine the map in more detail, and have added more validation analyses in this revised version:
1. We added a spatial cross-validation. In addition to the out-of-bag (OOB) test we did in the original manuscript, we conducted a spatial cross-validation to validate the Vcmax25 map, following the paper recommended by the reviewer (Ploton et al., 2020). We removed the validation data points within 1.5 degrees (~150 km) of the training samples to remove the influence of spatial autocorrelation on extrapolation.
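The buffer-based filtering described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of plain degree distance, and the sample coordinates are assumptions, while the 80/20 split and the 1.5-degree buffer come from the response.

```python
import random

def split_with_spatial_buffer(samples, buffer_deg=1.5, train_frac=0.8, seed=0):
    """Random split, then drop validation points close to any training point.

    `samples` is a list of (lat, lon) tuples. Distance is measured in plain
    degrees, which is an assumption; the source does not specify the metric.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    train, valid = shuffled[:n_train], shuffled[n_train:]

    def too_close(point):
        lat, lon = point
        for t_lat, t_lon in train:
            d_lon = abs(lon - t_lon)
            d_lon = min(d_lon, 360.0 - d_lon)  # wrap across the date line
            if ((lat - t_lat) ** 2 + d_lon ** 2) ** 0.5 < buffer_deg:
                return True
        return False

    spatial_valid = [p for p in valid if not too_close(p)]
    return train, valid, spatial_valid

# Hypothetical sites on a coarse grid: every site is far from every other one,
# so the buffer removes nothing from the conventional validation set.
grid = [(lat, lon) for lat in (-40.0, -20.0, 0.0, 20.0, 40.0)
        for lon in (-100.0, 0.0, 100.0, 150.0)]
train, valid, spatial_valid = split_with_spatial_buffer(grid)
```

When sites are tightly clustered, the same function can leave very few validation points, which is one plausible reason spatial cross-validation scores vary more between repetitions than conventional ones.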
We added the following statement in the Methods (L522-532):
"We trained the RF using the top 4 predictors and used both conventional and spatial cross-validation to examine the reliability of the RF. We used 80% of samples for training and 20% for validation. The spatial cross-validation is similar to conventional cross-validation, but we removed the validation data points within 1.5 degrees (~150 km) of the training samples to avoid spatial autocorrelation (Ploton et al., 2020). The spatial cross-validation showed an R² = 0.52 ± 0.36 and an RMSE of 39.9 ± 14.5 μmol m⁻² s⁻¹, while the conventional cross-validation showed an R² = 0.52 ± 0.01 and an RMSE of 20.7 ± 0.3 μmol m⁻² s⁻¹ (Fig. S11). The accuracy of the conventional cross-validation is comparable to previous trait upscaling studies (Boonman et al., 2020; Butler et al., 2017; Moreno-Martínez et al., 2018; Van Bodegom et al., 2014)."

Authors: Thanks for the question. We fully agree that finding representative values of Vcmax25 for each pixel (community-level trait value) is challenging, as we know Vcmax25 within an ecosystem not only changes with species/genus, but also changes with season and with the location of the leaves in the canopy. However, this high-level heterogeneity information is often lacking in trait databases.
We have indeed tested using species- and genus-abundance-weighted trait values to train the RF for Vcmax25 estimation; however, several issues arose in the process: 1) the abundance weights for Vcmax25 are not comparable with those of key predictors/covariates (e.g., LNC, LMA and LPC), because they have different numbers of observations across the globe; 2) using community-weighted average values leads to fewer samples for training (from n = 8610 to n = 429 as we aggregated these trait values to the community level) and therefore makes the RF more susceptible to overfitting.
To address these challenges, we used all traits in the random forest (RF) model without specifying their weights. We expect the RF can identify the general gradient in samples by using a large number of samples to help trees in the RF make the right assignments. We understand such a strategy generates uncertainties, as the estimates vary between runs when different chunks of data are sampled, but it also allows us to quantify the uncertainty of Vcmax25 by running a large ensemble of trees. We added a statement to acknowledge this in our Methods section (L490-495): "We did not apply species/genus abundance weights to trait values for training and validation purposes, because 1) the available species abundance information at the community level for Vcmax25 was not the same as that for other leaf traits (i.e., leaf nitrogen content) that are potential predictors; 2) we had a much smaller number of samples for training if we aggregated trait values to the community level (from n = 8610 to n = 429), which led to a greater risk of overfitting the RF model."
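The idea of quantifying uncertainty from a large ensemble of trees can be illustrated with a minimal sketch. Elsewhere in this response the released uncertainty is described as one standard deviation of estimates from bagged trees in the RF; the helper name, the choice of a population standard deviation, and the per-tree values below are assumptions for illustration only.

```python
import statistics

def ensemble_mean_and_sd(tree_predictions):
    """Return (mean, standard deviation) across per-tree predictions for one pixel."""
    mean = statistics.fmean(tree_predictions)
    # Population SD across ensemble members; the source does not say whether
    # a sample or population estimator was used (assumption).
    sd = statistics.pstdev(tree_predictions)
    return mean, sd

# Hypothetical Vcmax25 predictions (umol m-2 s-1) from five bagged trees for one pixel.
pixel_trees = [50.0, 55.0, 60.0, 65.0, 70.0]
mean, sd = ensemble_mean_and_sd(pixel_trees)
```

Applied pixel by pixel, the mean would give the mapped estimate and the standard deviation the accompanying uncertainty layer.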

R3C5:
3. This is also linked with my first concern. I think that your RF model could be overfitted.
Authors: Thanks for mentioning that. Yes, we used ANN in the early stage of our study (Fig. R3). We were concerned that ANN was subject to serious overfitting, as its performance was sensitive to the selection of variables and to the number of nodes and layers of the neural network. Though the mean values of Vcmax25 from the ANN estimates were similar to those of the RF (Fig. R3 vs Fig. S8), the uncertainty of Vcmax25 from ANN was substantially larger due to overfitting. As the study advanced, we decided to use RF for two reasons: 1) RF is good at dealing with categorical variables (James et al., 2013; Moreno-Martínez et al., 2018), and for some key inputs such as plant functional types and Köppen climate zones, we can directly use these variables for training, while methods like ANN have to convert them to continuous variables or divide samples by categories; 2) the tree-based RF method is compatible with the prediction of variable importance by permutation, a widely used method to select key variables for machine learning based on their importance (e.g., Terrer et al., 2019). Using a reduced number of variables can help avoid overfitting, especially when the number of candidate predictors in our study is not small (n = 20) compared to the number of samples (n = 8610). We added a new section in the manuscript (L485-490) based on the statement above to justify our choice of the RF.
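Variable importance by permutation, mentioned above as the basis for selecting predictors, can be sketched generically: shuffle one predictor at a time and record how much the prediction error grows. The sketch below is model-agnostic and is not the out-of-bag variant a random forest implementation may use; the toy model, data and function names are hypothetical.

```python
import random

def mean_squared_error(model, rows, targets):
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by its shuffled value.
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, permuted, targets) - baseline)
    return importances

# Toy stand-in for a fitted model: it depends on feature 0 only.
def toy_model(row):
    return 2.0 * row[0]

rows = [(float(i), float((i * 7) % 10)) for i in range(10)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=2)
```

In this toy case shuffling the unused feature leaves the error unchanged, so its importance is zero, while shuffling the used feature increases the error. Ranking predictors this way and keeping only the top few is one route to the reduced predictor set the response describes.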
As several recent studies have suggested that RF is a robust method for the extrapolation of leaf traits (Boonman et al., 2020; Moreno-Martínez et al., 2018), we did not intend to examine a whole set of machine learning methods here. One comparison study has suggested that RF performs at least as well as generalized linear models (GLM) and generalized additive models (GAM), if not better (Boonman et al., 2020).

R3C7: 5. I also think that your uncertainty maps could be too optimistic. Unless you are completely sure that your model is not extrapolating, the variance of the RF model could be not reliable.
Authors: Thank you for the comment. Please see our response to R3C3 on cross-validation and the representativeness of samples, which hopefully can alleviate your concerns on the uncertainty of extrapolation.
In addition to the reasons above, the new remote sensing leaf chlorophyll content (Chl) dataset provides a novel and direct constraint on the spatial variation of leaf photosynthetic capacity (Croft et al., 2017;Luo et al., 2019). Our importance analysis suggested that Chl is the most important predictor in our random forest model (Fig. S5a). The use of this independent and observational Chl map can help us reduce the uncertainty incurred from extrapolation.
R3C8: 6. There are a variety of trait maps that could be also used in your work (including the Butler et al. ones). In fact, they present significant differences among them that could affect your results a lot. A comparison of the effect of them could be also very useful.
Authors: Thank you for bringing this up. We fully agree that the uncertainty incurred by the use of different LNC datasets may influence our results. We have therefore added a new analysis and a new section in the Discussion to examine whether the choice of leaf nitrogen maps would influence our results (L359-373).
"The choice of LNC map is another source of uncertainty in the derivation of fLNR. There are several global LNC maps available other than the EB17 (Butler et al., 2017) we used, namely AMM18 (Moreno-Martínez et al., 2018) and CB20 (Boonman et al., 2020). Each product has been validated in its respective study (Table S3). To examine the uncertainty incurred by the choice of LNC maps, we calculated fLNR using each of the three LNC maps. The three resultant fLNR maps show similar spatial patterns (Fig. S10), with the spatial correlation coefficients (r) between them ranging from 0.57 to 0.71 (p < 0.01). Examining the sensitivities of fLNR to environmental variables, we found the fLNR based on EB17 and AMM18 demonstrated similar sensitivities: fLNR is most sensitive to LMA, LPC, VPD, PAR and soil pH. Meanwhile, the fLNR based on CB20 is most sensitive to soil pH, LMA, VPD, air temperature and soil sand percentage. Noting that the CB20 LNC map has a lower R² in its cross-validation compared to that of EB17 and AMM18 (Table S3), we have more confidence in the fLNR maps based on EB17 and AMM18. In our study, we used EB17 as the principal LNC map as it demonstrated the highest R² in validation."
Figure S10. Three global fLNR maps estimated using alternative leaf nitrogen content maps (Boonman et al., 2020; Butler et al., 2017; Moreno-Martínez et al., 2018). (a,b,c) the spatial variation of fLNR, (d,e,f) the spatial correlations between the fLNR maps and (g,h,i) the sensitivities of fLNR to environmental factors.