A high-resolution canopy height model of the Earth

The worldwide variation in vegetation height is fundamental to the global carbon cycle and central to the functioning of ecosystems and their biodiversity. Geospatially explicit and, ideally, highly resolved information is required to manage terrestrial ecosystems, mitigate climate change and prevent biodiversity loss. Here we present a comprehensive global canopy height map at 10 m ground sampling distance for the year 2020. We have developed a probabilistic deep learning model that fuses sparse height data from the Global Ecosystem Dynamics Investigation (GEDI) space-borne LiDAR mission with dense optical satellite images from Sentinel-2. This model retrieves canopy-top height from Sentinel-2 images anywhere on Earth and quantifies the uncertainty in these estimates. Our approach improves the retrieval of tall canopies with typically high carbon stocks. According to our map, only 5% of the global landmass is covered by trees taller than 30 m. Further, we find that only 34% of these tall canopies are located within protected areas. Thus, the approach can serve ongoing efforts in forest conservation and has the potential to foster advances in climate, carbon and biodiversity modelling.


Introduction
As our society depends on a multitude of terrestrial ecosystem services (Manning et al., 2018), the conservation of the Earth's forests has become a priority on the global political agenda (United Nations).To ensure sustainable development through biodiversity conservation and climate change mitigation, the United Nations have formulated global forest goals that include maintaining and enhancing global carbon stocks and increasing forest cover by 3% between 2017 and 2030 (United Nations).Yet global demand for commodities is driving deforestation, impeding progress towards these ambitious goals (Hoang and Kanemoto, 2021).Earth observation and satellite remote sensing play a key role in this context, as they provide the data to monitor the quality of forested area at global scale (Hansen et al., 2013).However, to measure progress in terms of carbon and biodiversity conservation, novel approaches are needed that go beyond detecting forest cover and can provide consistent information about morphological traits predictive of carbon stock and biodiversity (Skidmore et al., 2021), at global scale.One key vegetation characteristic is canopy height (Skidmore et al., 2021;Jucker et al., 2017).
In this work, we describe a deep learning framework to map canopy top height globally with high resolution, using publicly available optical satellite images as input.We deploy that model to compute the first global canopy top height product with 10 m ground sampling distance, based on Sentinel-2 optical images for the year 2020.That global map and underlying source code and models, are made publicly available to support conservation efforts as well as science in disciplines such as climate, carbon, and biodiversity modelling. 1apping canopy height in a consistent fashion at global scale is key to understand terrestrial ecosystem functions, which are dominated by vegetation height and vegetation structure (Migliavacca et al., 2021).Canopy top height is an important indicator of biomass and the associated, global aboveground carbon stock (Duncanson et al., 2022).At high spatial resolution, canopy height models (CHM) directly characterize habitat heterogeneity (Tuanmu and Jetz, 2015), which is why canopy height has been ranked as a high-priority biodiversity variable to be observed from space (Skidmore et al., 2021).Furthermore, forests buffer microclimate temperatures under the canopy (De Frenne et al., 2019).While it has been shown that in the tropics higher canopies provide a stronger dampening effect on microclimate extremes (Jucker et al., 2018), targeted studies are needed to see if such relationships also hold true at global scale (De Frenne et al., 2019).Thus, a homogeneous high-resolution CHM has the potential to advance the modelling of climate impact on terrestrial ecosystems, and may assist forest management to bolster microclimate buffering, as a mitigation service to protect biodiversity under a warmer climate (De Frenne et al., 2019).
Given forests' central relevance to life on our planet, several new space missions have been developed to measure vegetation structure and biomass.A key mission is NASA's GEDI campaign, which has been collecting full-waveform LIDAR data explicitly for the purpose of measuring vertical forest structure globally, between 51.6 • north and south (Dubayah et al., 2020).GEDI has unique potential to advance our understanding of the global carbon stock, but its geographical range, and also its spatial and temporal resolutions, are limited.The mission length, initially set to two years, does not allow for continuous forest monitoring into the future.Moreover, GEDI is a sampling mission expected to cover at most 4% of the land surface.By design, the collected samples sparsely cover the surface of the Earth, which restricts the resolution of wall-to-wall mission products to a 1 km grid (Dubayah et al., 2020).In contrast, satellite missions such as Sentinel-2 or Landsat, which have been designed for a broader range of Earth observation needs, deliver freely accessible archives of optical images that are not as tailored to vegetation structure, but offer longer-term global coverage at high spatial and temporal resolution.Sensor fusion between GEDI and multi-spectral optical imagery has the potential to overcome the limitations of each individual data source (Valbuena et al., 2020).
However, estimating forest characteristics like canopy height or biomass from optical images is a challenging task (Gibbs et al., 2007), as the physical relationships between spectral signatures and vertical forest structure are complex and not well understood (Rodríguez-Veiga et al., 2017).Given the vast amount of data collected by the GEDI mission, we circumvent this lack of mechanistic understanding by harnessing supervised machine learning, in particular end-to-end deep learning.From millions of data examples, our model learns to extract patterns and features from raw satellite images which are predictive of high-resolution vegetation structure.By fusing GEDI observations with Sentinel-2 images, our approach enhances the spatial and temporal resolution of the CHM, and extends its geographic range to the sub-arctic and arctic regions outside of GEDI's coverage.While retrieval of vegetation parameters with deep learning has been demonstrated regionally, and up to country scale (Lang et al., 2019;Becker et al., 2021;Lang et al., 2021b;Rodríguez et al., 2021), we scale it up and process the entire global landmass.This step presents a technical challenge but is crucial to enable operational deployment and ensure consistent, globally homogeneous data.

Deep learning approach
Deep learning is revolutionizing fields ranging from medicine (Jumper et al., 2021) to weather forecasting (Ravuri et al., 2021) and has great potential to advance environmental monitoring (Reichstein et al., 2019;Tuia et al., 2022), but its application to global remote sensing is technically challenging due to the large data volume (Reichstein et al., 2019;Kattenborn et al., 2021).Cloud platforms like Google Earth Engine (Gorelick et al., 2017) simplify the analysis of satellite data but provide a limited set of traditional machine learning tools that depend on manual feature design.To use them, one must sacrifice some flexibility in terms of methods, in return for easy access to large data archives and compute power.In particular, canopy height estimation with existing standard tools tends to struggle with the underestimation of tall canopies, as the height estimates saturate around 25 to 30 m (Hansen et al., 2016;Potapov et al., 2021;Healey et al., 2020).This is a fairly severe limitation in regions dominated by tall canopies, like tropical forests, and deteriorates downstream carbon stock estimation, since tall trees have especially high biomass (Jucker et al., 2017).A further restriction of prior large-scale CHM projects is that they rely on local calibration, which hampers their use in locations without nearby reference data (Potapov et al., 2021;Healey et al., 2020).Technically, existing mapping schemes aggregate reflectance data over time, but perform pure pixel-to-pixel mapping without regard to local context and image texture.
Here, we extend previous regional deep learning methods (Lang et al., 2019;Becker et al., 2021;Lang et al., 2021b) to a global scale.These methods have been shown to mitigate the saturation of tall canopies by exploiting texture, while not depending on temporal features (Lang et al., 2019).In more detail, our approach employs an ensemble (Lakshminarayanan et al., 2017) of deep, (fully) convolutional neural networks (CNN), each of which takes as input a Sentinel-2 optical image and transforms it into a dense canopy height map with the same ground sampling distance (GSD) of 10 m (Lang et al., 2019).See Fig. 1a.Our unified, global model is trained with sparse supervision, using reference heights at globally distributed GEDI footprints, derived from the raw waveforms (Lang et al., 2022).A dataset of 600 million samples is constructed by extracting Sentinel-2 image patches of 15×15 pixels An important goal of our work are low estimation errors for tall vegetation.To that end we extend the CNN model in three ways (Fig. 1b).First, we equip the model with the ability to learn geographical priors, by feeding it geographical coordinates (in a suitable cyclic encoding) as additional input channels (Fig. 1a).Second, we employ a fine-tuning strategy where the sample loss is re-weighted inversely proportional to the sample frequency per 1-meter height interval, so as to counter the bias in the reference data towards low canopies (which reflects the long-tailed world-wide height distribution, where low vegetation dominates, and high values are comparatively rare).Finally, we train an ensemble of CNNs and aggregate predictions from repeated observations of the same location, which reduces the underestimation of tall canopies even further.The combination of all three measures yields the best performance.The average root mean square error (aRMSE) of the height estimates, balanced across all 5-meter height intervals, is 7.3 m, and the average mean error (aME, i.e. bias) is -1.8 m (Fig. 1b).The overall RMSE over all validation samples (without height balancing) is 6.0 m, with a bias of 1.3 m (Fig. 1c).The latter is due to a slight overestimation of low canopy heights, and is the price we pay for improving the performance on rare tall canopies (Fig. 1b).
A biome-level analysis based on the 14 biomes defined by The Nature Conservancy 2 shows how the bias varies across biomes (Fig. 1d,Extended Data Fig. A.5).The model is able to correctly identify bare soil in deserts with zero height, with marginal error and no bias.The bias is also low in montane, temperate, and tropical grasslands, as well as in Mediterranean and tropical dry broadleaf forest, but higher in flooded grasslands.The most severe overestimation, on average ≈2.5 m is observed for mangroves, tundra and tropical coniferous forests.The highest spread of residuals is observed in tropical and temperate coniferous forests as well as in the tundra, where we note that the latter is significantly underrepresented in the dataset, as GEDI's range does not extend beyond 51.6 • north.Furthermore, the GEDI reference data in these tundra regions (southern part of Kamchatka Mountain and Forest Tundra & Trans-Baikal Bald Mountain Tundra) appears rather noisy and contaminated with a significant number of outliers (Extended Data Fig. A.5).
We additionally report an evaluation of our final model against independent reference data from NASA's LVIS airborne LIDAR campaigns (Blair, 2018) (Extended Data Fig. A.2), which were designed to deliver canopy top heights comparable to GEDI (Dubayah et al., 2020).Across all 12 available LVIS areas, our model yields an RMSE of 7.8 m and a ME of 0.6 m, i.e., the overall RMSE is slightly higher than the one for held-out GEDI samples, whereas the bias is lower.When evaluating only on the areas outside (north of) the GEDI range, we obtain 4.7 m RMSE and 1.6 m ME, whereas only the remaining areas within the GEDI range yield 8.8 m RMSE and 0.2 m ME (recall that the magnitude of the error correlates with the height).In other words, our estimates agree well with LVIS even outside of the geographical range of the training data.In the appendix Appendix A.1, we additionally compare against another global canopy height product based on Landsat image composites (Potapov et al., 2021).

Modelling predictive uncertainty
While deep learning models often exhibit high predictive skill and produce estimates with low overall error, the uncertainty of those estimates is typically not known, or unrealistically low (i.e., the model is over-confident) (Guo et al., 2017).But reliable uncertainty quantification is crucial to inform downstream investigations and decisions based on the map content (Duncanson et al., 2022), e.g., it can indicate which estimates are too uncertain and should be disregarded (Lang et al., 2022).To afford users of our CHM a trustworthy, spatially explicit estimate of the map's uncertainty, we integrate probabilistic deep learning techniques.These methods are specifically designed to quantify also the predictive uncertainty, taking into account on the one hand the inevitable noise in the input data; and on the other hand the uncertainty of the model parameters, resulting from the lack of reference data for certain conditions (Kendall and Gal, 2017).In particular, we independently train five deep CNNs that have identical structure, but are initialized with different random weights.The spread of the predictions made by such a model ensemble (Lakshminarayanan et al., 2017) for the same pixel is an effective way to estimate model uncertainty (a.k.a.epistemic uncertainty), even with small ensemble size (Ovadia et al., 2019).Each individual CNN is trained by maximizing the Gaussian likelihood, rather than minimizing the more widely used squared error.Consequently, each CNN outputs not only a point-estimate per pixel, but also an associated variance that quantifies the uncertainty of that estimate (a.k.a.its aleatoric uncertainty) (Kendall and Gal, 2017).
During inference, we process images from ten different dates (satellite overpasses) within a year at every location to obtain full coverage, and exploit redundancy for pixels with multiple cloud-free observations.Each image is processed with a randomly selected CNN within the ensemble, which reduces computational overhead and can be interpreted as natural test-time augmentation, known to improve the calibration of uncertainty estimates with deep ensembles (Ashukha et al., 2020).
Finally, we use the estimated aleatoric uncertainties to merge redundant predictions from different imaging dates, by weighted averaging proportional to the inverse variance.While inverse variance weighting is known to yield the lowest expected error (Strutz, 2010), we observe a deterioration of the uncertainty calibration for low values (<4 m standard deviation).We also note that uncertainty calibration varies per biome (Extended Data Fig. A.6), so it may be advisable to re-calibrate in post-processing depending on the intended application and region of interest.Despite these observations, the estimated predictive uncertainty correlates well with the empirical estimation error and can therefore be used to filter out inaccurate predictions, thus lowering the overall error at the cost of reduced completeness (Extended Data Fig. A.4). E.g., by filtering out the 20% most uncertain canopy height estimates, over- all RMSE is reduced by 13% (from 6.0 m to 5.2 m) and the bias is reduced by 23% (from 1.3 m to 1.0 m).

Global canopy height mapping
The model has been deployed globally on the Sentinel-2 image archive for the year 2020 to produce a global map of canopy top height.To cover the global landmass (≈1.3×10 12 pixels at the GSD of Sentinel-2), a total of ≈160 terabyte of Sentinel-2 image data are selected for processing.This required ≈27,000 GPU-hours (≈3 GPU-years) of computation time, parallelized on a highperformance cluster to obtain the global map in 10 days real time.
The new wall-to-wall canopy height product at 10 m GSD makes it possible to gain insights into the fine-grained distribution of canopy heights anywhere on Earth, as well as the associated uncertainty (Fig. 2).Three example locations (A-C in Fig. 2) demonstrate the level of canopy detail that the map reveals, ranging from harvesting patterns from forestry in Washington State, US (A), through gallery forests along permanent rivers and ground water in the forest-savanna of Northern Cameroon (B), to dense tropical broadleaf forest in, Borneo, Malaysia (C).Also at large scale, the predictive uncertainty is positively correlated with the estimated canopy height (Fig. 2b).Still, some regions with comparatively low canopy height, like Alaska, northwestern Canada, and Tibet exhibit high predictive uncertainty.The two former lie outside of the GEDI coverage, so the higher uncertainty is likely due to local characteristics that the model has not encountered during training.The latter indicates that, also within the GEDI range, there are environments that are more challenging for the model, e.g., due to globally rare ecosystem characteristics not encountered elsewhere, or frequent cloud cover.
Our new dataset enables a full, worldwide assessment of coverage of the global landmass with vegetation.Doing this for a range of thresholds recovers an estimate of the global canopy height distribution (for the year 2020, Fig. 3a and Extended Data Fig.A.7a).We find that an area of 53.6×10 6 km 2 (41% of the global landmass) is covered by vegetation with >5 m canopy height, 39.6×10 6 km 2 (30%) by vegetation >10 m, and 6.7×10 6 km 2 (5%) by vegetation >30 m.See Fig. 3b.
We see that protected areas (according to the World Database on Protected Areas, WDPA (UNEP-WCMC and IUCN, 2021)) tend to contain higher vegetation compared to the global average (Fig. 3a).Furthermore, 34% of all canopies >30 m fall into protected areas (Fig. 3a).See Extended Data Fig.A.7b for examples of protected areas that show good agreement with mapped canopy height patterns.This analysis highlights the relevance of the new dataset for ecological and conservation purposes.For instance, canopy height and its spatial homogeneity can serve as an ecological indicator to identify forest areas with high integrity and conservation value.That task requires both dense area coverage at reasonable resolution, and a high saturation level to locate very tall vegetation.
Finally, our new map makes it possible to analyze the exhaustive distributions of canopy heights at the biome level, revealing characteristic frequency distributions and trends within different types of terrestrial ecosystems (Fig. 3b).While, for instance, the canopy heights of tropical moist broadleaf forests follow a bimodal distribution with a mode >30 m, mangroves have a unimodal distribution with a large spread and heights ranging up to >40 m.Notably, our model has learned to predict reasonable canopy heights in tundra regions, despite scarce and noisy reference data for that ecosystem.

Discussion
There are at least two major downstream applications that the new high-resolution canopy height dataset can help to advance at global scale, namely biomass and biodiversity modelling.Furthermore, our model can support monitoring of forest disturbances.Canopy top height is a key indicator to study the global aboveground carbon stock stored in form of biomass (Duncanson et al., 2022).On a local scale, we compare our canopy height map with dense aboveground carbon density data (Asner et al., 2021) that was produced by a targeted airborne LIDAR campaign in Sabah, northern Borneo (Asner et al., 2018) (Fig. 4a).We observe that for natural tropical forests, the spatial patterns agree well and that our canopy height estimates from Sentinel-2 are predictive of carbon density even in tropical regions, with canopy heights up to 65 m.Notably, the relationship between carbon and canopy height is sensitive up to ≈60 m.
We further demonstrate that our model can be deployed annually to map canopy height change over time, e.g., to derive changes in carbon stock and estimate carbon emissions caused by global land-use changes, at present mainly deforestation (Friedlingstein et al., 2021).Annual canopy height maps are computed for a region in northern California where wildfires have destroyed large areas in 2020 (Fig. 4b).Our automated change map corresponds well with the mapped fire extent from the California Department of Forestry and Fire Protection3 , while at the same time the annual maps are consistent in areas not affected by the fires, where no changes are expected.Such high-resolution change data may potentially help to reduce the high uncertainty of emission estimates that are reported in the annual Global Carbon Budget (Friedlingstein et al., 2021).It is worth mentioning that the presented approach yields reliable estimates based on single cloud-free Sentinel-2 images.Thus, its potential for monitoring changes in canopy height is not limited to annual maps, but to the availability of cloud-free images, that are taken at least every 5 days globally.This high update frequency makes it relevant for e.g.real-time deforestation alert systems, even in regions with frequent cloud cover.
A second line of potential applications includes biodiversity modelling, as the high spatial resolution of the canopy height model brings about the possibility to characterize habitat structure and vegetation heterogeneity, known to promote a number of ecosystem services (Felipe-Lucia et al., 2018) and to be predictive of biodiversity (MacArthur and MacArthur, 1961;Tews et al., 2004).The relationship between heterogeneity and species diversity is founded in niche theory (MacArthur and MacArthur, 1961;Tews et al., 2004), which suggests that heterogeneous areas provide more ecological niches for different species to co-exist.Our dense map makes it possible to study second-order homogeneity (Tuanmu and Jetz, 2015) (which is not easily possible with sparse data like GEDI), and down to a length scale of 10 to 20 m.
Technically, our wall-to-wall map makes it a lot easier for scientists to intersect sparse sample data, e.g., field plots, with canopy height.To make full use of scarce field data in biomass or biodiversity research, dense complementary maps are a lot more useful: when pairing sparse field samples with other sparsely sampled data, the chances of finding enough overlap are exceedingly low; whereas pairing them with low-resolution maps risks biases due to the large scale difference and associated spatial averaging.
We note that, despite the nominal 10 m ground sampling distance of our global map, the effective spatial resolution at which individual vegetation features can be identified is lower.As a consequence of the GEDI reference data used to train the model, each map pixel effectively indicates the largest canopy top height within a GEDI footprint (≈25 m diameter) centred at the pixel.Two subtle reasons further impact the effective resolution, compared to a map learned from dense, local reference data (e.g., airborne LIDAR (Lang et al., 2019)): the sparse supervision means that the model never explicitly sees high-frequency variations of the canopy height between adjacent pixels; and small misalignments caused by the geolocation uncertainty of the GEDI footprints (Dubayah et al., 2020;Roy et al., 2021) introduce further noise.
While at present we do not see this as severely limiting the utility of the map, in the future one could consider extending the method with techniques for guided superresolution (de Lutio et al., 2019), in order to better preserve small features visible in the raw Sentinel-2 images, like canopy gaps.
Regarding map quality, besides minor artifacts in regions with persistent cloud cover, we observe tiling artifacts at high latitudes in the northern hemisphere.The systematic inconsistencies at tile borders point at degradation of the absolute height estimates, possibly caused by a lack of training data for particular, locally unique vegetation structures.Interestingly, it appears that a significant part of these errors are constant offsets between the tiles.Further investigations are needed to ascertain how strongly differential quantities like homogeneity are affected by those artifacts.

Conclusion
We have developed a deep learning method for canopy height retrieval from Sentinel-2 optical images.That method has made it possible to produce the first globally consistent, wall-to-wall canopy top height map of the entire Earth, with 10 m ground sampling distance.Besides Sentinel-2, the GEDI LIDAR mission also plays a key role, as the source of sparse, but extremely well-distributed reference data at global scale.Compared to previous work that maps canopy height at global scales (Potapov et al., 2021), our model substantially reduces the overall underestimation bias for tall canopies.Our model does not require local calibration, and can therefore be deployed anywhere on Earth, including regions outside the GEDI coverage.Moreover, it also delivers globally homogeneous estimates for the predictive uncertainties of the retrieved canopy heights.As our method, once trained, only needs image data, maps can be updated annually, opening up the possibility to track the progress of commitments, made under the United Nation's global forest goals, to enhance carbon stock and forest cover by 2030 (United Nations).At the same time, the longer the GEDI mission will collect on-orbit data, the denser the reference data for our approach will become, which can be expected to diminish the predictive uncertainty of its predictions.
As a possible future extension, our model could be extended to map other vegetation characteristics (Becker et al., 2021) at global scale.In particular, it appears feasible to densely map biomass, by retraining with GEDI L4A biomass data (Duncanson et al., 2022), or by adding additional data from planned future space missions (Le Toan et al., 2018).
While deep learning technology for remote sensing is continuously being refined by focusing on improved performance at regional scale, its operational utility has been limited by the fact that it often could not be scaled up to global coverage.Our work demonstrates that, if one has a way of collecting globally well-distributed reference data, modern deep learning can be scaled up and employed for global vegetation analysis from satellite images.We hope that our example may serve as a foundation on which new, scalable approaches can be built that harness the potential of deep learning for global environmental monitoring; and that it inspires machine learning researchers to contribute to environmental and conservation challenges.

Data
This work builds on data from two ongoing space missions: the Copernicus Sentinel-2 mission operated by the European Space Agency (ESA) and NASA' Global Ecosystem Dynamics Investigation (GEDI).The Sentinel-2 multispectral sensor delivers optical images covering the global landmass with a revisit time of at most 5 days on the equator.We use the atmospherically corrected L2A product, consisting of 12 bands ranging from the Visible (RGB) and Near Infra-Red (NIR) to the Short Wave Infra-Red (SWIR).While the RGB and NIR bands have 10 m GSD, the other bands have a 20 m or 60 m GSD.For our purposes all bands are upsampled with cubic interpolation to obtain a data-cube with 10 m ground sampling distance.The GEDI mission is a space-based full-waveform LIDAR mounted on the International Space Station, and measures vertical forest structure at sample locations with 25 m footprint, distributed between 51.6 • north and south.We use footprint-level canopy top height data derived from these waveforms (Lang et al., 2022(Lang et al., , 2021a) ) as sparse reference data.The canopy top height is defined as RH98, the relative height at which 98% of the energy has been returned, and was derived from GEDI L1B waveforms collected between April and August in the years 2019 and 2020.
To train the deep learning model, a global training dataset has been constructed within the GEDI range, by combining the GEDI data and the Sentinel-2 imagery.For every Sentinel-2 tile, we select the image with the least cloud coverage between May and September 2020.Image patches of 15×15 pixels (i.e., 150 m ×150 m on the ground) are extracted from these images at every GEDI footprint location.The GEDI data is rastered to the Sentinel-2 pixel grid by setting the canopy height reference value of the pixel that corresponds to the center of the GEDI footprint.Locations for which the image patch is cloudy or snow-covered are filtered out from the dataset.To correct noise injected by the geolocation uncertainty of the GEDI footprints, we use the Sentinel-2 L2A scene classification and assign 0 m canopy height to footprints located in the categories "not vegetated" or "water".This procedure also addresses the slight positive bias due to slope in the GEDI reference data (Lang et al., 2022).Overall, the resulting dataset contains 600×10 6 samples globally distributed within the GEDI range.All samples located within 20% of the Sentinel-2 tiles in that range (each 100 × 100 km) are set aside as validation data (See Extended Data Fig. A.1).
A second evaluation is carried out w.r.t.canopy top heights independently derived from airborne LIDAR data acquired by NASA's Land, Vegetation, and Ice Sensor (LVIS).LVIS is a large-footprint full-waveform LIDAR, from which the LVIS L2 height metric RH98 is rastered to the Sentinel-2 10 m grid.Locations where LVIS data is available are visualized in Extended Data Fig.A.2.

Deep fully convolutional neural network
Our model is based on the fully convolutional neural network architecture proposed in (Lang et al., 2019).The architecture employs a series of residual blocks with separable convolutions (Chollet, 2017), without any downsampling within the network.The sequence of learnable 3×3 convolutional filters is able to extract not only spectral, but also textural features.To speed-up the model for deployment at global scale we reduce its size, setting the number of blocks to 8 and the number of filters per block to 256.This speeds up the forward pass by a factor of ≈17 compared to the original, larger model.In our tests, the smaller version did not cause higher errors in an early phase of training.When trained long enough, a larger model with higher capacity may be able to reach lower prediction errors, but the higher computational cost of inference would limit its utility for repeated, operational use.The model takes the 12 bands from Sentinel-2 L2A product and the cyclic encoded geographical coordinates per pixel as input, for a total of 15 input channels.Its output are two channels with the same spatial dimension as the input, one for the mean height and one for its variance.See Fig. 1.Since the architecture is fully convolutional, it can process arbitrarily sized input image patches, which is useful when deploying it at large scale.

Model training with sparse supervision
Formally, canopy height retrieval is a pixel-wise regression task.We train the regression model end-to-end in supervised fashion, which means that the model learns to transform raw image data into spectral and textural features predictive of canopy height, and there is no need to manually design feature extractors.We train the convolutional neural network with sparse supervision, i.e., by selectively minimizing the loss (Eq. 1) only at pixel locations for which there is a GEDI reference value.Before feeding the 15-channel data cube to the CNN, each channel is normalized to be standard normal, using the channel statistics from the training set.The reference canopy heights are normalized in the same way, a common preprocessing step to improve the numerical stability of the training.Each neural network is trained for 5,000,000 iterations with a batch size of 64, using the ADAM optimizer (Kingma and Ba, 2015).The base learning rate is initially set to 0.0001 and then reduced by factor 0.1 after 2'000'000 iterations and again after 3,500,000 iterations.This schedule was found to stabilizes the uncertainty estimation.

Modelling the predictive uncertainty
Modelling uncertainty in deep neural networks is challenging due to their strong non-linearity, but crucial to build trustworthy models.The approach followed in this work accounts for two sources of uncertainty, namely the data (aleatoric) and the model (epistemic) uncertainty (Kendall and Gal, 2017).The uncertainty in the data, resulting from noise in the input and reference data, is modeled by minimizing the Gaussian negative log likelihood (Eq. 1) as a loss function (Kendall and Gal, 2017).This corresponds to independently representing the model output at every pixel i as a conditional Gaussian probability distribution over possible canopy heights, given the input data, and estimating the mean μ and variance σ2 of that distribution.
To account for the model uncertainty, which in highcapacity neural network models can be interpreted as the model's lack of knowledge about patterns not adequately represented in the training data, we train an ensemble (Lakshminarayanan et al., 2017) of 5 CNNs from scratch, i.e., each time starting the training from a different randomly initialized set of model weights (learnable parameters).At inference time we process images from T different acquisition dates (here T = 10) for every location, to obtain full coverage and to exploit redundancy in the case of repeated cloud-free observations of a pixel.Each image is processed with one CNN picked randomly from the ensemble.This procedure incurs no additional computational cost compared to processing all images with the same CNN.It can be interpreted as a natural variant of test-time augmentation, which has been demonstrated to improve the calibration of uncertainty estimates from deep ensembles in the domain of computer vision (Ashukha et al., 2020).Finally, the per-image estimates are merged into a final map by averaging with inverse-variance weighting (Eq.3).If the variance estimates of all ensemble members are well calibrated, this results in the lowest expected error (Strutz, 2010).Thus, the variance of the final perpixel estimate is computed with the weighted version of the law of total variance (Eq.4) (Kendall and Gal, 2017).For readability we omit the pixel index i.

Correction for imbalanced height distribution
We find that the underestimation bias on tall canopies is partially due to the imbalanced distribution of reference labels (and canopy heights overall), where large height values occur rarely.To mitigate it, we fine-tune the converged model with a cost-sensitive version of the loss function.A softened version of inverse sample-frequency weighting is used to re-weight the influence of individual samples on the loss (Eq.5).To establish the frequency distribution of the continuous canopy height values in the training, we bin them into 1 m height intervals, and in each of the resulting K bins count the number of samples N k .Empirically, for our task the moderated reweighting with the square root of the inverse frequency works better (leaving all other hyperparameters unchanged).Moreover, we do not fine-tune all model parameters, but only the final regression layer that computes mean height (see Fig. 1a).We observe that the uncertainty calibration is preserved when fine-tuning only the regression weights for the mean ("S2+geo balanced: mean" in Extended Data Fig.A.3), whereas fine-tuning also the regression of the variance decalibrates the uncertainty estimation ("S2+geo balanced: mean&var").The fine-tuning is run for 750,000 iterations per network.

Evaluation metrics
Several metrics are employed to measure prediction performance: the root mean square error (RMSE, Eq. 6) of the predicted heights, their mean absolute error (MAE, Eq. 8), and their mean error (ME, Eq. 7).The latter quantifies systematic height bias, where a negative ME indicates underestimation, i.e., predictions that are systematically lower than the reference values.

RMSE
We also report balanced versions of these metrics, where the respective error is computed separately in each 5 m height interval and then averaged across all intervals.They are abbreviated as aRMSE, aMAE, and aME (Fig. 1b).
For the estimated predictive uncertainties, there are by definition no reference values.A common scheme to evaluate their calibration is to produce calibration plots (Guo et al., 2017;Laves et al., 2020) that show how well the uncertainties correlate with the empirical error.As this correlation holds only in expectation, both the uncertainties and the empirical errors at the test samples must be binned into K equally sized intervals.In each bin B k , the average of the predicted uncertainties is then compared against the actual average deviation between the predicted height and the reference data.Based on the calibration plots, it is further possible to derive a scalar error metric for the uncertainty calibration, the uncertainty calibration error (UCE) (Eq.9) (Laves et al., 2020).Again, we additionally report a balanced version, the average uncertainty calibration error (AUCE) (Eq.10), where each bin has the same weight independent of the number N k of samples in it.

UCE =
In our case err(B k ) represents the RMSE of the samples falling into bin B k , and the bin uncertainty uncert(B k ) is defined as the root mean variance (RMV): with ûi = σ2 i when evaluating the calibration of a single CNN, and ûi = Var(ŷ i ) when evaluating the calibration of the ensemble.We refer to the RMV as the predictive standard deviation in our calibration plots (Extended Data Fig. A.3,Extended Data Fig. A.6).

Global map computation
Sentinel-2 imagery is organized in 100 km×100 km tiles, a total of 18,011 tiles cover the entire landmass of the Earth, excluding Antarctica.However, depending on the ground tracks of the satellites, some tiles are covered by multiple orbits, whereas in general no more than 2 orbits are need to get full coverage.To optimize computational overhead, we select the relevant orbits per tile by using those with the smallest number of empty pixels, according to the metadata.For every tile and relevant orbit the 10 images with least cloud cover between May and September 2020 are selected for processing.
While it only takes ≈2 minutes to process a single image tile with the CNN on a GeForce RTX 2080 Ti GPU, downloading the image from the Amazon Web Service S3 bucket takes about 1 minute, and loading the data into memory takes about 4 minutes.To process a full tile with all 10 images per orbit takes between 1 and 2.5 hours, depending on the number of relevant orbits (1 or 2).
We only apply minimal post-processing and mask out built-up areas, snow, ice and permanent water bodies according to the ESA World Cover classification (Zanaga et al., 2021), setting their canopy height to "no data" (value: 255).The canopy height product is released in the form of 3 • × 3 • tiles in geographic longitude/latitude, following the format of the recent ESA World Cover product.This choice shall simplify the integration of our map into existing workflows, as well as the intersection of the two products.Note that the statistics in the present paper were not computed from those tiles, but in Gall-Peters orthographic equal-area projection with 10 m GSD, for exact correspondence between pixel counts and surface areas.

Energy and carbon emission footprint
The presented map has been computed on a GPU cluster located in Switzerland.Carbon accounting for electricity is a complex endeavour, due to differences how electricity is produced and distributed.To put the power consumption needed to produce global maps with our method into context, we estimate carbon emissions for two scenarios, where the computation is run on AWS in two different locations, EU (Stockholm) and US East (Ohio).With ≈250 W to power one of our GPUs, we get a total energy consumption of 250 W×27000 h = 6750 kWh for the global map.The conversion to emissions highly depends on the carbon efficiency of the local power grid.For EU (Stockholm) we obtain an estimated 338 kg CO 2equivalent, whereas for US East (Ohio) we obtain 3848 kg CO 2 -equivalent, a difference by a factor >10.While the former is comparable to driving an average car from Los Angeles to San Francisco and back (1,360 km), the latter corresponds to a round trip from Seattle (US) to San José, Costa Rica (15,500 km).These estimates were conducted using the Machine Learning Impact calculator (Lacoste et al., 2019).For the carbon footprint of the current map (not produced with AWS), we estimate ≈729 kgCO 2 eq, using an average of 108 gCO 2 eq/kWh for Switzerland, as reported for the year 2017 (Rüdisüli et al., 2022).

Data and code availability
The global canopy height map for 2020 will be released for download and will be available in the Google Earth Engine.
All links, source code, and the trained models used to generate the map will be released upon publication via the project page: langnico.github.io/globalcanopyheight/.The global map can be explored interactively in this browser application: nlang.users.earthengine.app/view/global-canopy-height-2020.

Figure 1 :
Figure 1: Model overview and global model evaluation on held-out GEDI reference data.a) Illustration of the model training process with sparse supervision from GEDI LIDAR.The CNN takes the Sentinel-2 image and encoded geographical coordinates as an input to estimate dense canopy top height as well as its predictive uncertainty (variance).b) Residual analysis w.r.t.canopy height intervals, and ablation study of model components.Negative residuals indicate that estimates are lower than reference values.c) Confusion plot for the final model ensemble, showing good agreement between predictions from Sentinel-2 and GEDI reference.d) Biome-level analysis of final ensemble estimates: GEDI reference height, residuals, and number of samples per biome.

Figure 2 :
Figure 2: Global canopy height map for the year 2020, visualized in Equal-Earth projection.The underlying data product, estimated from Sentinel-2 imagery, has 10 m ground sampling distance.a) Canopy top height.b) Predictive standard deviation of canopy top height estimates.

Figure 3 :
Figure 3: Global canopy height distributions of the entire landmass, protected areas, and biomes.a) Frequency distribution and cumulative distribution for the entire global landmass and within protected areas (according to WDPA (UNEP-WCMC and IUCN, 2021)), and fraction of vegetation above a certain height that is protected.b) Biome-level frequency distribution of canopy heights according to 14 terrestrial ecosystems defined by The Nature Conservancy.Urban areas and croplands (based on ESA World Cover (Zanaga et al., 2021)) have been excluded.

Figure 4 :
Figure 4: Examples for potential applications.a) Biomass and carbon stock mapping.In Sabah, northern Borneo, canopy height estimated from Sentinel-2 optical images correlates strongly with aboveground carbon density measured by a targeted airborne LIDAR campaign (Asner et al., 2018) (Lat: 5.3812, Lon: 117.0748).b) Monitoring environmental damages.In 2020, wildfires destroyed large areas of forests in northern California.The difference between annual canopy height maps is in good agreement with the wildfire extent mapped by the California Department of Forestry and Fire Protection (Lat: 40.1342,Lon: -123.5201).
Figure A.1: Geographical error analysis on held-out GEDI validation data at 1 degree resolution (≈111 km on the equator).a) Root mean square error (RMSE).b) Mean absolute error (MAE).c) Mean error (ME), where negative ME mean an underestimation bias when the predictions are lower than the reference values.
Figure A.2: Evaluation w.r.t.canopy top height (RH98) derived from independent LVIS airborne LIDAR data (Blair, 2018).a) Locations of LVIS LIDAR campaigns.a-c) Confusion plots showing the relationship between LVIS reference data and predictions from Sentinel-2 for a) All available LVIS areas, b) Only regions within the GEDI range, and c) Only regions north of GEDI.

Figure A. 3 :
Figure A.3: Calibration plot showing the relationship between the estimated predictive uncertainty and the empirical error.

Figure A. 4 :
Figure A.4: Improvement of overall error metrics when filtering out the most uncertain canopy height predictions with the help of the estimated predictive uncertainty.

Figure
Figure A.5: Biome-level confusion plots, showing the relationship between GEDI reference data and predictions from Sentinel-2.

Figure A. 6 :
Figure A.6: Biome-level calibration plots, showing the relationship between the estimated predictive uncertainty and the empirical error.