Evaporative water loss of 1.42 million global lakes

The evaporative loss from global lakes (natural and artificial) is a critical component of the terrestrial water and energy balance. However, the evaporation volume of these water bodies—from the spatial distribution to the long-term trend—is as of yet unknown. Here, using satellite observations and modeling tools, we quantified the evaporation volume from 1.42 million global lakes from 1985 to 2018. We find that the long-term average lake evaporation is 1500 ± 150 km3 year−1 and it has increased at a rate of 3.12 km3 year−1. The trend attributions include an increasing evaporation rate (58%), decreasing lake ice coverage (23%), and increasing lake surface area (19%). While only accounting for 5% of the global lake storage capacity, artificial lakes (i.e., reservoirs) contribute 16% to the evaporation volume. Our results underline the importance of using evaporation volume, rather than evaporation rate, as the primary index for assessing climatic impacts on lake systems.

C overing about 5 million km 2 of the Earth's land area, lakes (natural and artificial) are key components of global ecological and hydrological systems [1][2][3][4] . Lakes support aquatic and terrestrial biodiversity, and are an important water resource for humans 5,6 . Due to their large open water areas-and the strong vapor pressure gradient at the water-atmosphere interface -lakes can lose a massive volume of water through evaporation (i.e., latent heat flux) 7,8 . The dynamics of lake evaporative water loss depend on water area and evaporation rate, both of which vary by geographical location and are sensitive to the manifestations of a complex changing environment 9 . For instance, evaporation rate can be altered by warming temperatures 10,11 and by elevated solar radiation 8 , while open water areas can increase from shrinking lake ice cover 12 or decrease from extreme drought conditions 13 . Thus, it is crucial to understand the spatiotemporal changes and drivers of evaporative water loss from lakes for better aquatic ecosystem and water resources management.
However, due to a dearth of reliable, globally consistent, and locally practical datasets, evaporative water loss has not been quantified at a global scale. In the past, the accuracy of open water area and evaporation rate estimations has been hindered by various challenges such as cloud contamination of satellite images 14,15 , lake heat storage quantification 16 , and lake ice duration modeling 17,18 . The existing global studies have mostly focused on evaporation rate changes (solely) 11,16 , and not on the overall evaporation volume. However, without factoring in the lake area dynamics and the lake ice freeze/thaw cycles, the evaporation rate alone cannot represent the magnitude of lake water loss. These approaches are thus inadequate for lake water/energy balance assessment and water resources management. Several local to regional studies have provided more reliable estimates 7,8,19 , but extrapolating these results globally is inappropriate due to the large spatial heterogeneity of evaporation rate and water area. Consequently, the roles of global lakes within climate systems cannot be fully evaluated, as evaporation is the sole process linking energy exchanges and water cycles.
Here, we present the first global lake evaporation volume (GLEV) dataset, which contains the monthly evaporative water loss information of 1.42 million lakes (≥10 5 m 2 ) from 1985 to 2018. These lakes include both natural lakes and artificial lakes (referred to as "reservoirs" hereafter). For each lake and each month, the evaporation volume (V E ) was calculated as a function of the evaporation rate (E lake ), the lake surface area (A s ), and the fraction of ice duration (f d,ice ). In particular, the heat storage changes for lakes-which have been typically overlooked due to the complexities and difficulties in considering them-were quantified to improve the accuracy of the evaporation rate estimation 8 . The monthly lake surface areas were reconstructed using a Landsat-based global surface water dataset 14 , and the monthly fractions of ice duration were modeled using air temperature and freeze/thaw lag information 18 (see Methods). The lake open water areas (A o )-which exclude the lake ice coverwere calculated as the product of lake surface area (A s ) and the fraction of open water duration (i.e., 1-f d,ice ). To assess the longterm trend of each variable, the monthly results were aggregated to annual bases in the subsequent calculations.

Results
Spatial heterogeneity of lake evaporation volume. In total, the annual V E of global lakes (excluding the Caspian Sea) from 1985 to 2018 is estimated to be 1500 ± 150 km 3 (Fig. 1), which is 15.4% higher than the previous model-based estimate (i.e., 1300 km 3 ) 20 . The spatial distribution of V E (Fig. 1a) is mostly linked to the A o distribution, but is also affected by E lake (Supplementary Fig. 1). For example, the five Laurentine Great Lakes and the seven African Great Lakes contribute 8.8% and 15.7% to global V E , while their corresponding total area is 9.1% and 6.2% (of the global total). In the region above 40°N, lakes contribute 62% of global A o , but generate 46% of global V E . The majority (83%) of the V E in this region occurs in the period from June to November (Fig. 1b)-i.e. the summer months in the Northern Hemispherewhich have the least lake ice coverage and the highest evaporation rates. The differences between the A s and A o distributions also confirm that the impact of lake ice on A s is more predominate over the high-latitude smaller lakes (Fig. 1c). The average annual A o for global lakes from 1985 to 2018 is 1.47 × 10 6 km 2 , which is about 63% of A s (2.34 × 10 6 km 2 ), indicating that a large portion of the global lake area is covered by ice. With regard to lake V E , it first increases and then decreases at the same time as E lake increases (Fig. 1d). For lakes with E lake < 1500 mm year −1 , the A o and E lake jointly contribute to the exponentially increasing V E . For lakes with E lake > 1500 mm year −1 , the A o tends to be limited by a drier climate, resulting in a reduction in V E despite the increase of E lake (Supplementary Fig. 1). This also suggests that using the evaporation rate alone cannot accurately represent the impacts of climate change on evaporative water loss due to the spatial mismatch between the distributions of E lake and A o ( Fig. 1d and Supplementary Fig. 1). Instead, V E is the direct metric of such loss, and thus a better index of climate change for purposes of water resources management.
While the total lake surface area only accounts for 1.57% of the global land area, lake evaporation plays an important role in global and regional terrestrial evapotranspiration, as manifested through the V E ⁄ V ET ratio. The land evapotranspiration volume was quantified by multiplying the Moderate Resolution Imaging Spectroradiometer (MODIS) Global Terrestrial Evapotranspiration product (i.e., MOD16A2) to the land surface area, and then the total evapotranspiration volume (V ET ) was derived by adding the lake evaporation volume to the land evapotranspiration volume. In total, lake evaporation contributes 2.37% to the global terrestrial V ET . This V E ⁄ V ET ratio exhibits a large range for different river basins (Fig. 2). Basins with large lake systems or in arid regions tend to have higher percentage values. For example, the Great Lakes Basin in North America has an V E ⁄V ET value of 27%, and the Tigris-Euphrates River Basin has a value of 13%. Conversely, due to the large soil evaporation and vegetation evapotranspiration, humid basins tend to have low percentage values (e.g., the Amazon River Basin has a V E ⁄V ET ratio of 0.5%).
Long-term trends of lake evaporation volume. For most of the nine thermal regions 21 , the lake evaporation volume has increased during the past 34 years due to both an increasing evaporation rate and an increasing lake area ( Fig. 3 and Supplementary Table 1). The E lake values for all of the nine regions show significant increasing trends (p-value < 0.05; Mann-Kendall non-parametric test). Specifically, E lake in the Northern Frigid (NF) region has risen the most-3.7% decade −1 . On average, the global E lake is increasing at a rate of 1.5% decade −1 from 1985 to 2018, primarily due to increasing air temperature and vapor pressure deficit (Supplementary Fig. 2). For A o , most regions have also shown an increasing trend. The interannual variability of A o generally follows the pattern of the regional climatological and hydrological variations 22 . For example, the rapid decrease of A o in the Southern Warm (SW) region from 2000 to 2009 (−18% decade −1 , −1030 km 2 year −1 ) is attributed to the severe Australia Millennium drought 13 . Meanwhile, due to decreasing lake ice associated with warming, the NF region has shown a steady increase of A o (3.8% decade −1 , 760 km 2 year −1 ). The long-term trend patterns of V E over different regions are generally consistent with their A o patterns, but are also modified by their E lake trends. The largest increasing trend (4.5% decade −1 , 0.6 km 3 year −1 ) is again in the NF region, due to both the increasing E lake and A o . Globally, lake evaporation volume (V E ) has been increasing at a rate of 2.1 ± 1.6% decade −1 (i.e., 31.2 ± 24 km 3 decade −1 ; see Supplementary Fig. 3 for uncertainty).
Compared to natural lakes, the evaporative water losses and their associated trends from reservoirs are more pronounced. According to the GLEV, the 6715 artificial reservoirs from HydroLAKES contribute 16% (235 km 3 year −1 ) to the global evaporation volume-even though they only account for 5% of the storage capacity, and 10% of the surface area, of all lakes (natural and artificial) combined. This quantity of reservoir evaporative loss is equivalent to 20% of the global annual consumptive water use (1185 km 3 ) 23 . From 1985 to 2018, evaporative water loss from reservoirs has been increasing at a rate of 5.4% decade −1 , which largely outpaces the global trend for all 1.42 million lakes (i.e., 2.1% decade −1 ).
Attributions of trends and variability of lake evaporation volume. Three factors, E lake , f d,ice , and A s , have contributed to the long-term trend of lake evaporation volume (see line plot in   The percentage contribution of each of the three factors (E lake , f d,ice , and A s ) to the interannual changes of the lake evaporation volume has shown a clear spatial pattern ( Fig. 4; see Methods). Based on the primary drivers of the interannual variability of V E ,

Ice duration
E v a p r a t e L a k e a r e a Fig. 4 Attributions of long-term trends and interannual variability of lake evaporation volume. The map shows the relative contributions from evaporation rate (E lake ), lake surface area (A s ), and lake ice duration (f d,ice ) to the interannual variability of evaporation volume (V E ). The inset line graph shows the original V E time series and the three V E time series plots calculated from the detrended A s , f d,ice , and E lake , respectively. The gray areas in the map indicate no data. The "Original" line represents the V E time series from the global lake evaporation volume (GLEV) dataset. The "A detrended " line represents the V E time series obtained by detrending the surface area time series for each lake. Similarly, the "Ice detrended " and the "E detrended " lines show the V E time series obtained after detrending the ice duration and the evaporation rate time series (respectively) for each lake.
we grouped the 1.42 million global lakes into two categories. For the first category, the V E variability of a lake is dominated by nonprecipitation forcings. For instance, an increase of global air temperature can elevate E lake 11 and reduce f d,ice 12 . There are 1.09 million lakes (77%) that belong to this category, with a combined total open water area of 1.13 × 10 6 km 2 . The climatic effect is especially notable in the high-latitude (e.g., Canada and northern Eurasia) and high-altitude (e.g., Tibetan Plateau) regions. Given the amplification effect of climate change in these regions 24 , we expect an accelerated increase of V E in the future. It is also worth noting that for humid regions (e.g., the southeastern U.S. and southeastern China), the interannual variability of V E is mainly affected by E lake changes (because lake surface areas in these regions are mostly stable 15 ). For the second category, the interannual variability of V E is mainly controlled by regional hydrologic conditions (e.g., surface runoff) and/or reservoir operations-both of which are the direct drivers for A s changes 22 . There are 0.33 million lakes that belong to this category, and they are scattered across a large portion of the global land area. For these lakes, the annual changes in A s prevail over the variabilities of E lake and/or f d,ice . For instance, arid and semi-arid regions (e.g., the western U.S., southern Africa, and Australia) are typically affected by multiyear wet-dry cycles, and thus V E is highly correlated with A s . For relatively humid regions (e.g., India), interannual variability from both precipitation and water use can substantially affect A s and then V E . It is worth noting that 63% of all reservoirs (4250 out of the 6715 reservoirs in HydroLAKES) belong to this category. This underlines the importance of incorporating area dynamics for accurately estimating evaporative water loss to improve water management.

Discussion
Our findings have significant environmental, societal, and economic implications, as the global evaporative loss will be accelerated in the future under global warming. As sentinels of climate change impact, lake systems respond quickly in terms of lake ice phenology and evaporation. Given the large increase of temperature in the high-latitude and high-altitude regions 24 , the loss of lake ice can result in greater heat uptake (as the liquid water surface has a much lower albedo than ice) and larger open surface areas to evaporate. Together, the consequent greater evaporation flux to the atmosphere can alter both air humidity and local/ regional hydrological processes, and thus needs to be more accurately described in Earth System Models. In addition, the substantial reservoir evaporation volume and its trend (5.4% decade −1 ) under a warming climate can impose a hefty strain on water resources as demand for agricultural, industrial, and domestic water continues to increase in the context of population growth.
To the best of our knowledge, the GLEV is the very first dataset to provide long-term monthly time series evaportion values for 1.42 million individual natural lakes and reservoirs worldwide. With area values from Landsat high-resolution satellite observations and evaporation rates from a validated physically-based model for each lake, the dataset is globally consistent and locally practical. This freely available dataset can be beneficial to the wider science community and decision-makers. For example, this dataset can complement accurate water availability estimation, which especially needs to be prioritized during droughts 25 . Information about the transfer of water vapor from lakes into the atmosphere can improve simulations of moisture transport and recycling 26 . While several pioneering global studies have focused on surface water extent 14 , lake and river ice 12,27 , lake surface temperature 10 , and surface water storage 4 , our analysis contributes to the growing body of knowledge about global water bodies-and shows how lake systems are responding to the ongoing climate change.

Methods
Lake masks. Shapefiles for the global lakes (natural and artificial) were obtained from the HydroLAKES dataset 28 , which contains 1,427,688 water bodies that are 0.1 km 2 or larger. In this study, we included all of these water bodies except for the Caspian Sea. Among these lakes, HydroLAKES identified 6715 reservoirs based on the Global Reservoir and Dam database (GRanD) 29 . To include all possible historical water coverage scenarios, we buffered the HydroLAKES dataset based on lake area (Supplementary Table 2).
Meteorological data. To account for the uncertainties associated with the meteorological data, three representative datasets were used (Table 1): TerraClimate 30 , ERA5 31 , and the Global Land Data Assimilation System (GLDAS) 32 . These three widely used datasets were developed independently by different institutes/agencies based on a wide range of different input sources, and thus can represent the meteorological forcing conditions as well as their uncertainties. Monthly data values of the four governing variables for our evaporation rate algorithm-surface downward shortwave radiation, surface air temperature, surface humidity, and wind speed-were collected from these three datasets. Specifically, because surface downward shortwave (solar) radiation is one of the most important energy terms for the lake energy balance 8,33 , we have independently validated its values using ground-based measurements from the Global Energy Balance Archive (GEBA) 34 . The Evaporation volume. The monthly time series of evaporation volume from Jan 1985 to Dec 2018 was calculated for each lake. For a given lake, its monthly evaporation volume was estimated using Eq. 1.
where V E is the evaporation volume (m 3 d −1 ); E lake is the evaporation rate for each month (mm d −1 ); A s is the monthly surface area (km 2 ), and f d,ice is the monthly fraction of ice duration (which is defined as the time percentage of a month when a lake is fully covered by ice). In particular, the monthly E lake time series was calculated based on a newly developed algorithm 8 using monthly meteorological data, the monthly A s time series was estimated using a Landsat-based global surface water dataset (GSWD) 14 , and the monthly f d,ice time series was modeled using reanalysis air temperature and freeze/thaw lags. The calculation of each of these variables is detailed in the following sections.
where E lake is the lake evaporation rate (mm d −1 ); Δ is the slope of the saturation vapor pressure curve (kPa°C −1 ); R n is the net radiation (MJ m −2 d −1 ); δU is the heat storage change of the water body (MJ m −2 d −1 ); a and b are the wind function coefficients (and are equal to 2.33 and 1.65, respectively) 35 ; u 2 is the screen height (2 m) wind speed (m s −1 ); L f is the average fetch length of the water body (m); e s is the saturated vapor pressure at air temperature (kPa); e a is the air vapor pressure (kPa); λ is the latent heat of vaporization (MJ kg −1 ); γ is the psychrometric constant (kPa°C −1 ); ρ w is the density of water (kg m −3 ); c w is the specific heat of water (MJ kg −1°C−1 ); h is the average water depth (m); T w is the water column temperature at the current time step (°C); T w0 is the water column temperature at the previous time step (°C); and Δt is the time step (set as 30 days in this study). By incorporating the "generally applicable" wind function from McJannet et al. 35  h) for the E lake calculation. For each lake, the forcing data were averaged over the lake surface area from the three meteorological forcing datasets (TerraClimate, ERA5, and GLDAS). The monthly lake fetch values 8 were calculated using (1) National Centers for Environmental Prediction (NCEP) wind direction climatology data 36 , (2) the shapefiles from the HydroLAKES dataset 28 , and (3) the lake area dynamics (as explained in the following section). For each lake, the actual epilimnion thicknesswhich determines the δU-was derived using the smaller value between the calculated potential epilimnion thickness based on surface area 37 and the average lake depth from the HydroLAKES dataset 28 .
This newly developed algorithm has been intensively validated in the U.S. region in our previous work 8 . Additional validations were implemented on a global scale by adding four international lakes that have reliable evaporation observations ( Supplementary Fig. 5). These results show that the incorporation of heat storage simulation can significantly improve the accuracy-especially for deep lakes (which have larger heat storage capacities than shallow lakes). Furthermore, we have systematically evaluated the energy balance and aerodynamic terms using observations at Lake Taihu 38,39 (Supplementary Fig. 6) and Lake Mead 40 ( Supplementary Fig. 7). Our evaluations for both shallow and deep lakes indicate that our algorithm is robust, and that the good agreement of evaporation rate simulation is not caused by the cancellation of errors. Also, our results are consistent with those of other modeling/observational studies focused on several extremely large lakes (Supplementary Table 3). Generally, the evaporation estimation for extremely large lakes (>10,000 km 2 ) is hindered by (1) scaling limitations associated with observational methods (e.g., eddy covariance), (2) ignoring heat storage changes, and (3) the inaccuracy of meteorological data from lake surfaces. Thus, we only compared the long-term average evaporation values for these large lakes. In addition, comparisons of lake evaporation rate with actual and potential evapotranspiration over land ( Supplementary Fig. 8) show a high degree of spatial consistency. The patterns of the evaporation rate zonal mean (i.e., latitudinal distribution) are also similar to the results from Wang et al. 11 .
Lake surface area dynamics. The monthly water area time series for the 1.42 million lakes from 1985 to 2018 was reconstructed based on a combination of the dynamic Landsat-based global surface water dataset (GSWD) 14 and static HydroLAKES shapefiles 28 . The GSWD is a global "water body" product 41 that was derived from Landsat imagery via an expert system that considers numerous factors (e.g., cloud shadow, terrain shadow, lava). However, remotely sensed "water body" does not represent the "lake" water extent. Thus, we further used Hydro-LAKES shapefiles as the outer boundaries of lakes to ensure that extracted water pixels from GSWD are from lakes. This type of approach has been commonly adopted for lake area time series estimations [42][43][44] . GSWD contains multiple products with different temporal resolutions: monthly, annual, monthly climatology, and static. One caveat of the GSWD monthly product is that its global water maps contain large areas of "no data" pixels 14 , which result from multiple factors, including limited Landsat coverage (especially before 1999), cloud, cloud shadow, terrain shadow, and the Scan Line Corrector failure of Landsat-7. Therefore, direct extraction of the area time series from these monthly maps can lead to severe underestimations.
Previously, we developed a robust image enhancement algorithm to reduce the impacts of the "no data" pixels on water area extraction when using the GSWD monthly product 15 . However, this algorithm requires extensive computational resources and is not practical for generating area time series for 1.42 million lakes. Therefore, in this study, we updated the algorithm to (1) consistently reduce the impacts of image contamination (e.g., cloud and shadow) on the extracted monthly water area; (2) improve the computational efficiency so as to be practical at the million-lake processing level; and (3) better characterize the seasonality in datalimited years. The updated algorithm was developed using the GSWD annual product with seasonality reconstructed using long-term monthly mean area values. Based on the original monthly product, GSWD also provides the annual composite images, which classify each pixel for each given year into one of four categories: (1) year-round water, (2) seasonal water, (3) not water, and (4) no data. These annual water classification maps have significantly eliminated the number of "no data" pixels-but meanwhile, also conceal the monthly dynamics. Building upon these annual maps, we reconstructed the monthly area dynamics for each year (Fig. 5) based on the long-term monthly mean area values from the original monthly global maps (Eqs. 4 and 5).
where A R (y, m) is the reconstructed area for year y and month m (km 2 ); A y, ya is the year-around area for year y; A y, ss is the seasonal area for year y; and W m (ranging from 0 to 1) is the weighting factor for month m (ranging from 1 to 12). A m is the average area for month m (which is calculated based on the original monthly global maps from 1985 to 2018; km 2 ); min i¼1 12 A i is the minimum monthly average area (km 2 ); and max i¼1 12 A i is the maximum monthly average area (km 2 ). Specifically, the monthly average is calculated by stacking all of the historical clear water pixels together to get the water occurrence image for the given month, and then using it as a weighted image to calculate the total area. When a specific year had no A y, ya and/or no A y, ss due to limited Landsat coverage during the early years (especially before 1999, Supplementary Fig. 9), these values were linearly interpolated using valid values from adjacent years. Although these linearly interpolated values may not represent the true water areas with perfect accuracy, they were derived from the best available data from Landsat. Furthermore, we can expect that the missing values will have a limited impact on the 34-year trend at a large scale. This is because the missing coverage of Landsat in the early years is mainly located in the high-latitude regions ( Supplementary Fig. 9), where the area changes are primarily affected by ice coverage (simulated by air temperature that is described in the following section). The validation of this simple but computationally practical (for 1.42 million lakes) method was implemented by comparison to area values extracted from the original cloud-free monthly maps from the Global Reservoir Surface Area Dataset (GRSAD) 15 . The overall coefficient of determination (R 2 ) for the 6715 reservoirs involved is 0.97 (Fig. 6). This method performs better for relatively larger lakes. For example, the R 2 -value for the lakes between 1 km 2 to 100 km 2 is 0.99, while it is 0.17 for the lakes <0.01 km 2 . Given that the majority of the lakes in the HydroLAKES dataset (96%, Supplementary Table 2) have area values >0.1 km 2 , the proposed method is thus believed to be capable of providing reliable reconstructed monthly area values. In addition, we calculated the error statistics for each of the 6715 reservoirs (Fig. 6b, c). The median relative bias (rBias) of 1.6% and median relative Root Mean Square Error (rRMSE) of 9.2% further indicate the high quality of the reconstructed monthly area values.
The reconstructed monthly area values were further compared with in-situ observed storage/elevation time series values ( Supplementary Fig. 10) and satellite altimetry data wherever available ( Supplementary Fig. 11). Compared to GRSAD, the new results perform better during years with fewer Landsat observations (e.g., 1992 to 1998 for Mossoul Lake, Iraq). Overall, the reconstructed areas yield comparable data quality as those of the GRSAD monthly area values, and are capable of representing the area dynamics for lakes with a wide range of sizes.
Lake ice duration. The ice coverage of a lake depends on its climatic, geographic, morphological, and bathymetric characteristics 17,18 . In particular, air temperature plays a dominant role in lake ice phenology 45 . However, a simple 0°C isothermal approach cannot accurately represent the freeze and thaw events of lakes. This is because the energy stored in lake water before winter delays the formation of ice, and the ice thickness build-up during the winter slows down the rate of ice melting 18 . Specifically, the freeze lag (i.e., time lag between a 0°C frost day and lake freeze-up) is primarily affected by the internal heat of the lake, which is correlated to the lake's depth. With regard to thaw lag (i.e., the time lag between a 0°C warm day and lake ice break-up day), it is mainly determined by the ice thickness which depends on the average winter temperature.
Based on the long-term lake ice observation records from the Global Lake and River Ice Phenology Database [GLRIPD] 46 , we established (1) the relationship between the freeze lag and the average lake depth; and (2) the relationship between the thaw lag and the average winter (December, January, and February) temperature (Fig. 7). The air temperature data were calculated from the TerraClimate monthly dataset, and then were linearly interpolated to a daily time step to estimate the 0°C isotherm dates. Due to the large variability in actual daily air temperature data, reconstructed daily temperatures based on monthly time series can provide more reliable 0°C isotherm dates 47 . The average lake depth data were collected from the HydroLAKES dataset 28 . To derive more representative relationships for estimating freeze lag and thaw lag, we (1) calculated the freeze/ thaw lag days for each observed freeze/thaw date at each lake; (2) averaged the freeze and thaw lag days for each lake; and (3) fitted the equations using these data for all of the lakes. An exponential relationship was found to best fit the freeze lag data, and a linear relationship was found to best fit the thaw lag data (Fig. 7).
First, these two fitted equations (Fig. 7b, c) were evaluated using simulated annual ice duration for lakes that have observations for both ice-on and ice-off dates (Fig. 7d). For the 76 such lakes, the R 2 -value is 0.93 and the standard deviation of the absolute bias is 9.2 days year −1 . Considering the low E lake during the freezing and thawing periods, we expect that less uncertainty will be propagated to the V E . Second, the monthly simulation performance of these two fitted equations (Fig. 7b, c) was evaluated using long-term (≥ 20 years) in-situ ice phenology data for 14 North American lakes ( Supplementary Fig. 12). The average bias for these lakes is 7 days year −1 (with a range from −19 to 33 days year −1 ), indicating satisfactory data quality. It is worth noting that our quantification of ice duration simplified the lake ice phenology by assuming binary values (i.e., ice or no ice) for each day. For large lakes, knowing the true areal fraction of ice coverage can be important. However, by aggregating the daily binary values to monthly floating percentage values, the impacts of such simplification can be reduced. For example, validation for the North American Great Lakes (Supplementary Fig. 13) shows a high level of agreement between simulated annual ice fraction (averaged f d,ice ) and observed annual ice fraction (derived from daily measurements of ice coverage).
We did not use remotely sensed ice cover (e.g., MODIS) to represent the ice phenology for two reasons: (1) The tradeoff between spatial and temporal resolution significantly limits the wide application of satellite ice observations, especially for small lakes; (2) Cloud cover-and reflectance similarity between    Fig. 14). To ensure the data homogeneity, reanalysis air temperature is commonly used to smooth the ice cover time series from satellite data. However, such a method then relies on the data quality of reanalysis air temperature, leading to a similar performance with our modeling approach 48 . Nonetheless, satellite sensors (e.g., Landsat and MODIS) are still highly promising for constructing continuous lake ice phenology, especially by fusing multiple sensors and machine learning algorithms.
Trend and variability attribution. Trend attribution of the increasing lake evaporation volume was implemented by detrending each of these three factors (i.e., E lake , f d,ice , and A s ) 49,50 . For instance, to calculate the global V E time series with the area trend removed, we (1) detrended the A s time series for each lake while keeping the E lake and f d,ice time series unchanged, (2) multiplied these three terms together to get the new V E time series for each lake, and (3) aggregated the results to obtain the global V E time series. The contributions of each factor (E lake , f d,ice , and A s ) were then calculated as the long-term difference between the original V E time series and the detrended ones. The relative contributions were subsequently calculated as the percentage values of the individual actual contributions over the sum of the actual contributions.
For each lake, the interannual variability of V E is affected by the variability of three components: E lake , f d,ice , and A s . To evaluate the relative importance of these three contributing factors, we adopted the following approach: First, we calculated the R 2 -values between the annual V E and each of these three factors (i.e., E lake , f d,ice , and A s ). Then, we aggregated the R 2 results (using the average lake areas as weighting factors) within each equal-area grid for V E -E lake , V E -f d,ice , and V E -A s . Next, the three R 2 -values were normalized using a sum-of-unity to show the global distribution of the relative importance of these three factors (Fig. 4). We then grouped the lakes into two categories based on the major drivers of the V E variability: (1) non-precipitation forcings; and (2) terrestrial hydrologic conditions and reservoir operations. A lake was assigned to the first category when the R 2value for V E -E lake or V E -f d,ice was larger than that for V E -A s . Conversely, when the R 2 -values for V E -E lake and V E -f d,ice were both smaller than that for V E -A s , the lake was assigned to the second category.
Sources of uncertainty and algorithm caveats. Uncertainty in the V E estimate is propagated from that in the E lake , f d,ice , and A s . For E lake , the input forcing generally carries large levels of uncertainty 51 . This uncertainty was quantified using three independent reanalysis datasets-TerraClimate, ERA5, and GLDAS-and is described as the standard deviation of the relative bias (SDRB) for the evaporation rate (E lake ) of each lake (by regarding average values as the truth values). This calculation leads to an uncertainty value of 7.22%. With respect to ice duration (f d,ice ), its uncertainty is mainly inherited from the air temperature uncertainty (Fig. 7a) and the regression uncertainty (Fig. 7d). The air temperature uncertainty of the reanalysis datasets originates from multiple sources, including the data assimilation algorithm and various sources related to assimilated data (i.e., satellites and ground stations). Here, using the same method as described above, we found that the air temperature uncertainty (from the three reanalysis datasets) can lead to an uncertainty of 1.45% for f d,ice . For the regression component, the associated uncertainty can be calculated using the SDRB presented in Fig. 7d, resulting in a value of 2.52%. Similarly, the surface area (A s ) has an uncertainty of 6.1% based on the information presented in Fig. 6. In total, by assuming a normal distribution for these four types of uncertainty and running a million cycles of a Monte Carlo simulation, the total uncertainty was estimated to be 9.93%, yielding an uncertainty range for global V E of about 150 km 3 year −1 .
There are a few caveats that are worth noting. A full energy balance under ice conditions is not explicitly simulated when calculating E lake . The interaction between radiation, ice, and water makes lake heat storage complicated, which then extends to the evaporation process 19 . Due to the small E lake during the winter months, we expect a limited impact on the overall V E and its long-term trend. However, this warrants the need for collecting more E lake observational data during months when the lakes are partially covered by ice for model evaluation. In addition, sublimation from lake ice can also contribute to lake water loss during the winter season (Supplementary Note 1). However, due to the insulation effect of the overlying snow cover 52 and the relatively small sublimation rate from ice/ snow 53,54 , we would expect only a limited effect for this process on the overall lake water loss (Supplementary Fig. 15). With regard to lake surface area, the limited Landsat global coverage during its early years affects the data quality of the GSWD. This, in turn, affects our area estimations. However, due to the usage of the reconstruction algorithm ( Eqs. 3 and 4), such impacts-especially on long-term trends-can be reduced. Observed ice duration (days year -1 ) Fig. 7 Simulation of freeze lag and thaw lag days. a A conceptual scheme to simulate the ice duration. b The fitted relationship between freeze lag days and lake average depth. c The fitted relationship between thaw lag days and average winter temperature. d Comparison between simulated and observed annual ice duration, with a root-mean-square error (RMSE) of 9.2 days year −1 .