Tracking lake drainage events and drained lake basin vegetation dynamics across the Arctic

Widespread lake drainage can lead to large-scale drying in Arctic lake-rich areas, affecting hydrology, ecosystems, and permafrost carbon dynamics. To date, the spatio-temporal distribution, driving factors, and post-drainage dynamics of lake drainage events across the Arctic remain unclear. Using satellite remote sensing and surface water products, we identify more than 35,000 lake drainage events (~0.6% of all lakes) in the northern permafrost zone between 1984 and 2020, approximately half of which involve relatively understudied non-thermokarst lakes. Smaller lakes, thermokarst lakes, and lakes in discontinuous permafrost areas are more susceptible to drainage than larger lakes, non-thermokarst lakes, and lakes in continuous permafrost areas, respectively. Over time, discontinuous permafrost areas contribute more drained lakes annually than continuous permafrost areas. Following drainage, vegetation rapidly colonizes drained lake basins (DLBs), with thermokarst DLBs showing significantly higher vegetation growth rates and greenness levels than non-thermokarst DLBs. Under warming, DLBs are likely to become more prevalent and to serve as greening hotspots, playing an important role in shaping Arctic ecosystems.

The PDF file includes:

Fig. S3.
Distribution of lake drainage events in the Canadian region. The numbered regions marked on the map are: 1. Koudjouak Plain; 2. Hudson Bay Lowlands; 3. Cambridge Bay Lowlands; 4. Banks Island; 5. Canadian Western Arctic coastal/Taiga plains; 6. Peace River Bank.

Fig. S4.
Ecoregion delineation in the circum-Arctic permafrost zone. Please refer to Table S1 for the ecoregion corresponding to each number.

Fig. S6.
Proportion of small, medium, and large drained lakes across categories of permafrost extent, thermokarst lake likelihood, and ground ice content. C: continuous, D: discontinuous, S: sporadic, I: isolated. Refer to Fig. S5 for the spatial distribution of these classifications.

Table S1.
Basic information about the ecoregions. The locations of the ecoregions are shown in Fig. S4.

Table S2.
Ecologically based statistics on lake drainage occurrences and post-drainage vegetation dynamics. NDVI values are taken from the 10th year after drainage. "Ratio" denotes the ratio of NDVI in drained lake basins (DLBs) to the NDVI of the surrounding area.
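A minimal sketch of the "Ratio" column, assuming a hypothetical per-lake input that maps calendar year to a pair of (DLB NDVI, surrounding NDVI), and assuming that the "10th year after drainage" is the drainage year plus 10. The function name and input layout are illustrative only, not the authors' code.

```python
from typing import Mapping, Tuple


def ndvi_ratio_year10(
    series: Mapping[int, Tuple[float, float]], drainage_year: int
) -> float:
    """Ratio of DLB NDVI to surrounding NDVI in the 10th year after drainage.

    `series` is a hypothetical layout: calendar year -> (DLB NDVI, surrounding NDVI).
    """
    # Assumption: "10th year after drainage" means drainage_year + 10.
    target_year = drainage_year + 10
    dlb_ndvi, surrounding_ndvi = series[target_year]
    if surrounding_ndvi == 0:
        raise ValueError("surrounding NDVI is zero; ratio is undefined")
    return dlb_ndvi / surrounding_ndvi
```

A ratio above 1 means the basin is greener than its surroundings; below 1, less green. For example, a lake drained in 2015 with NDVI 0.5 in the basin and 0.625 around it in 2025 gives a ratio of 0.8.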

Fig. S7.
Diagnostic assessment of the lake drainage prediction model. (a) Receiver operating characteristic (ROC) curve and (b) precision-recall (PR) curve used to assess the accuracy of the machine learning binary classification model that predicts lake drainage events. The blue dashed lines represent a random classifier. Recall: the proportion of actual positives that are correctly identified; Precision: the proportion of positive predictions that are correct; AUC: area under the ROC curve; AP: average precision. A high AUC indicates that the model discriminates well between positive (drained lakes) and negative (undrained lakes) samples, while a high AP indicates high predictive quality for positive samples.
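The two summary scores in this figure can be sketched from first principles. This is an illustrative implementation, not the authors' code: ROC AUC is computed as the probability that a randomly chosen drained lake receives a higher score than a randomly chosen undrained lake, and AP as the mean precision at the rank of each positive, which matches the step-wise definition when all scores are distinct. For a random classifier, ROC AUC is about 0.5 and AP is about the positive-class prevalence, which is why the two dashed baselines differ.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean precision at the rank of each positive.

    Equals sum_n (R_n - R_{n-1}) * P_n when all scores are distinct.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this cut-off
    return total / n_pos
```

With labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.4, 0.1]`, three of the four positive/negative pairs are ranked correctly (AUC 0.75), and the precisions at the two positives are 1 and 2/3 (AP 5/6). The pairwise loop is O(P·N); for tens of thousands of lakes, a rank-based formula or scikit-learn's `roc_auc_score` and `average_precision_score` would be the practical choice.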

Fig. S8.
Drained lakes in lowland areas. (a) Distribution map of drained lakes, with areas below 150 m elevation shown in orange. (b) Area distribution of drained lakes in lowland areas. (c) Lake-wise density of drained lakes of different sizes. Lowlands cover 29.6% of the northern permafrost region but contain 57.1% of drained lakes. In lowland areas, the lake-wise density of drained lakes exceeds the regional average for all size classes.
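As a worked check on the two percentages above, assuming both are shares of the same regional totals (area and drained-lake count), lowlands hold roughly twice as many drained lakes per unit area as the northern permafrost region as a whole. This simple area-based enrichment ratio is an illustration only; it is not the lake-wise density metric plotted in panel (c).

```latex
\[
\frac{\text{share of drained lakes in lowlands}}{\text{share of region area in lowlands}}
= \frac{57.1\%}{29.6\%} \approx 1.93
\]
```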

Fig. S9.
Differences in vegetation greenness between DLBs and surrounding areas. Tasseled Cap Greenness (TCG) measured in the tenth year after lake drainage for different classifications of (a) thermokarst lake likelihood, (b) lake size, (c) drainage ratio, (d) permafrost extent, (e) region, (f) floodplain status, (g) soil carbon content, (h) soil nitrogen content, and (i) Yedoma region. Boxplot statistics: horizontal lines, median; dots, mean; boxes, interquartile range; whiskers, 1.5 times the interquartile range. Sample sizes are indicated below each plot. (j) Time series of the relative greenness of very likely, likely, and unlikely thermokarst DLBs, expressed as TCG differences from the surrounding vegetation. Solid lines show medians; shaded areas show the interquartile range.
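A minimal sketch of the statistics behind one box, assuming the input is a list of per-lake greenness differences (TCG of the DLB minus TCG of the surroundings). Two conventions here are assumptions rather than details from the caption: quartiles use linear interpolation (the "inclusive" method of Python's `statistics.quantiles`), and whiskers follow the common Tukey convention of stopping at the most extreme data point within 1.5 × IQR of the box. Plotting libraries may differ slightly in both respects.

```python
import statistics
from typing import Dict, Sequence


def boxplot_stats(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for one box, e.g. TCG(DLB) - TCG(surroundings) per lake."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Assumed Tukey convention: whiskers end at the most extreme values inside the fences.
    return {
        "median": statistics.median(data),
        "mean": statistics.fmean(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(v for v in data if v >= low_fence),
        "whisker_high": max(v for v in data if v <= high_fence),
    }
```

For the values 1 to 8 plus an outlier of 100, the box spans 3 to 7, the median is 5, and the upper whisker stops at 8 while the mean is pulled up by the outlier, which is why the figure shows median and mean separately.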

Fig. S10.
Diagnostic assessment of the prediction model for post-drainage NDVI changes in DLBs. (a) Scatter plot of predicted versus actual values and (b) residual plot for the machine learning regression model that predicts post-drainage NDVI in DLBs. Point density is color-coded, and the red dashed line represents perfect agreement. RMSE: root mean square error; MAE: mean absolute error; MB: mean bias.
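The three error metrics can be written out directly. This sketch assumes residuals are defined as predicted minus actual, so a positive mean bias indicates over-prediction of NDVI; the caption does not state the sign convention.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE and mean bias (MB) for paired observed/predicted NDVI values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    # Assumed sign convention: residual = predicted - actual.
    residuals = [p - a for a, p in zip(actual, predicted)]
    n = len(residuals)
    return {
        "RMSE": math.sqrt(sum(r * r for r in residuals) / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MB": sum(residuals) / n,
    }
```

MB can be near zero even when errors are large, because positive and negative residuals cancel; RMSE and MAE do not cancel, and RMSE weights large errors more heavily than MAE.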

Fig. S11.
Example of gradual drainage of a giant lake. (a-e) Landsat color-infrared images (NIR-R-G) acquired at 111°39′ E, 52°40′ N between 2013 and 2017. (f) Year of lake drainage detected by the LandTrendr algorithm on a pixel-by-pixel basis. The lake, Ozero Maloye Yeravnoye, lies in the discontinuous permafrost zone of southern Russia and covers about 6,000 ha. This shallow lake, with a mean depth of only 1.8 m, drained between 2014 and 2017. In the winter of 2013, the lake area received abnormally heavy snowfall, the highest in nearly 20 years. An abnormally thick snowpack would not only hinder refreezing of the active layer at the lake bottom, promoting talik development and potential drainage channel formation, but would also increase snowmelt, raising the likelihood of bank overtopping.
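To illustrate what a per-pixel "year of drainage" map means, the sketch below uses a deliberately simplified rule: for each pixel, the drainage year is the year with the largest year-over-year drop in a water signal, provided the drop reaches a threshold. This is a stand-in for explanation only and is not LandTrendr, which fits piecewise-linear segments to spectral time series. The input layout (year mapped to the fraction of observations classified as water), the `(row, col)` pixel key, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Pixel = Tuple[int, int]  # hypothetical (row, col) pixel index


def drainage_year(water_fraction: Dict[int, float], min_drop: float = 0.5) -> Optional[int]:
    """Year of the largest year-over-year decline in a pixel's water signal,
    or None if no decline reaches `min_drop` (hypothetical threshold)."""
    years = sorted(water_fraction)
    best_year: Optional[int] = None
    best_drop = 0.0
    for prev, curr in zip(years, years[1:]):
        drop = water_fraction[prev] - water_fraction[curr]
        if drop > best_drop:
            best_year, best_drop = curr, drop
    return best_year if best_drop >= min_drop else None


def drainage_year_map(
    pixels: Dict[Pixel, Dict[int, float]], min_drop: float = 0.5
) -> Dict[Pixel, Optional[int]]:
    """Apply the per-pixel rule to every pixel, analogous to panel (f)."""
    return {px: drainage_year(series, min_drop) for px, series in pixels.items()}
```

In a gradually draining lake, pixels near the shore would register earlier drainage years than pixels near the center, producing the spatial pattern of years seen in a per-pixel map; pixels that remain water return `None`.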

Fig. S12.
Grid-based statistics of drained lakes. (a) Distribution map of drained lakes on a 0.1° × 0.1° grid. (b) Pie chart of grid cells grouped by the number of drained lakes they contain. A total of 35,337 drained lakes are distributed across 20,132 grid cells; most cells contain 1-2 drained lakes, and more than 99% of cells contain fewer than 10.
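The grid statistics can be reproduced in outline by binning lake locations into 0.1° cells and counting. This sketch assumes each drained lake is represented by a single (longitude, latitude) point, such as its centroid; the function names are hypothetical. Points lying exactly on a cell boundary may fall into either neighbor because of floating-point rounding.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

CELL_DEG = 0.1  # grid spacing in degrees, as in panel (a)


def cell_index(lon: float, lat: float, cell_deg: float = CELL_DEG) -> Tuple[int, int]:
    """Integer (column, row) index of the grid cell containing a point."""
    return math.floor(lon / cell_deg), math.floor(lat / cell_deg)


def lakes_per_cell(centroids: Iterable[Tuple[float, float]]) -> Counter:
    """Number of drained lakes in each occupied cell (one point per lake assumed)."""
    return Counter(cell_index(lon, lat) for lon, lat in centroids)


def cells_by_lake_count(per_cell: Counter) -> Counter:
    """How many cells contain 1, 2, 3, ... drained lakes (the pie-chart classes)."""
    return Counter(per_cell.values())


def share_of_cells_below(per_cell: Counter, threshold: int) -> float:
    """Fraction of occupied cells holding fewer than `threshold` drained lakes."""
    return sum(1 for n in per_cell.values() if n < threshold) / len(per_cell)
```

Applied to the full inventory, `sum(per_cell.values())` would give the number of lakes (35,337), `len(per_cell)` the number of occupied cells (20,132), and `share_of_cells_below(per_cell, 10)` the fraction of cells with fewer than 10 drained lakes.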

Fig. S13.
Bland-Altman plots assessing the consistency of ERA5-Land reanalysis and Daymet gridded climate data. Evaluation of (a) mean annual air temperature, (b) mean summer air temperature, (c) annual precipitation, and (d) summer precipitation. The analysis is based on 16,104 drained lakes detected in North America and compares climate data for the year of drainage. Because the Daymet dataset does not provide mean air temperature, it was approximated as the average of the maximum and minimum air temperatures. Bland-Altman plots display the mean difference and the variability between the two datasets: the x-axis shows the average of the two datasets, and the y-axis shows their difference. The solid red line marks the mean difference, and the dashed red lines mark the 95% limits of agreement. Point density is color-coded, with yellow indicating high concentrations of sample points. Overall, ERA5-Land and Daymet agree well for both temperature and precipitation, with relatively low variability.
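A minimal sketch of the quantities drawn in each panel, using the standard Bland-Altman definitions: the x-axis value is the mean of each pair, the y-axis value is their difference, the solid line is the mean difference (bias), and the 95% limits of agreement are the bias ± 1.96 times the sample standard deviation of the differences. The difference order (ERA5-Land minus Daymet) is an assumption; the caption does not specify it. The Daymet mean-temperature approximation described above is included as a helper.

```python
import statistics
from typing import Dict, List, Sequence


def daymet_mean_temp(tmax: float, tmin: float) -> float:
    """Mean air temperature approximated as the average of maximum and minimum."""
    return (tmax + tmin) / 2


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Dict[str, object]:
    """Bland-Altman summary for paired values (e.g. a = ERA5-Land, b = Daymet per lake)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired values")
    means: List[float] = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis
    diffs: List[float] = [x - y for x, y in zip(a, b)]  # y-axis, assumed order a - b
    bias = statistics.fmean(diffs)  # solid line: mean difference
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "loa_low": bias - 1.96 * sd,  # dashed lines: 95% limits of agreement
        "loa_high": bias + 1.96 * sd,
    }
```

The 1.96 factor assumes the differences are approximately normally distributed; narrow limits relative to the range of values on the x-axis indicate close agreement between the two datasets.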

Table S3.
Candidate explanatory variables for the binary prediction model of lake drainage events.

Table S4.
Candidate explanatory variables for the regression prediction model of NDVI in DLBs.

Table S5.
Control parameters of the LandTrendr algorithm.

Table S6.
Optimal hyperparameters of the CatBoost binary classification model used to predict lake drainage events.

Table S7.
Optimal hyperparameters of the CatBoost regression model used to predict NDVI in DLBs.