Introduction

In North America, mountains comprise <25% of the land area yet account for 60% of the continent’s snowpack13. Scientists and water managers estimate regional snow water equivalent (SWE) throughout the winter and especially at the end of winter to predict intra-annual water resources, including streamflow and groundwater recharge14. There are no direct satellite observations of SWE, and indirect observations are either sparse in space and time15 or used minimally in numerical simulations and reanalyses16,17. In the Western United States (WUS), the estimation of this quantity in practice relies heavily, or even exclusively, on in situ data from the more than 900 (804 excluding Alaska) automated snow pillows from the snow telemetry (SNOTEL) network and >100 additional state-based snow course networks8. These site-based measurements are both rich in information content regarding snow depth and density and have historically been informative as empirical predictors of catchment-wide SWE16, especially the end-of-winter peak of SWE that sets the initial conditions of spring snowmelt18.

In situ, calibrated, and validated measurements of climate-sensitive observables are generally more informative about Earth system changes with longer data records19,20,21. However, this implies stationarity in the spatial and temporal distributions over which those measurements sample22. SNOTEL stations, most of which were installed in the 1980s, were intended to monitor the seasonal patterns of snow quantities and their interannual variability within a 20th-Century mountainous hydroclimate, not to monitor the change in that same hydroclimate. Moving the network is logistically impractical10, so the anthropogenic warming-induced snow change in the 21st century will fundamentally reduce the sensitivity of the SNOTEL network’s ability to predict drought at a management-relevant scale12. Punctuated sensitivity degradation is likely to occur during periods of maximized aridity characterized by dry and warm spells; it is exactly these times when water security across the region will be most strained23,24.

In this work, we estimate the magnitude of WUS SNOTEL and California Cooperative Snow Survey (CCSS) snow pillow network sensitivity degradation in a climate-changed world and explore new approaches to interpreting the information available in observational networks. We do so by analyzing high-resolution WUS climate projections to determine when and where novel snow patterns may emerge in the 21st century and then how in situ and remote sensing systems are sensitive to such patterns. We use the Western United States Dynamically Downscaled Dataset (WUS-D3)25, a nine-member ensemble of global climate models (GCMs) from the 6th phase of the Coupled Model Intercomparison Project (CMIP6)26, which is dynamically downscaled with the Weather Research and Forecasting (WRF) model27. This projection ensemble allows for high-resolution evaluation of snowpack outputs that evolve in time according to model physics. Figure 1 shows the WUS-D3 solution domain with locations of current WUS snow pillow sites. Insets show historical (1980–2014) and Shared Socioeconomic Pathway (SSP) 3-7.0 (2015–2100) ensemble-mean annual maximum SWE, annual cumulative precipitation, and positive degree days for an example subregion with complex terrain in the Rocky Mountains. SSP 3-7.0 is a high but plausible emissions trajectory covering 2015–210028. This figure shows that snowpack has already declined and the temperature has already increased in the WUS, as seen by both downscaled models and reanalysis29,30,31,32. Snowpack declines and temperature increases are expected to continue, albeit unevenly, across the WUS.

Fig. 1: Snow pillow locations and downscaled climate projections.
a The locations of WUS snow pillow stations (black crosses) and HUC2 watersheds (dark blue lines). b Upper Colorado River Basin 1981–2014 average peak SWE. c Same as (b) but for winter season cumulative precipitation. d Same as (b) but for winter season positive degree days. e Annual- and GCM-average time series of peak SWE (solid lines) for the entire WUS domain as defined in (a) (black), snowy regions (red, >100 mm average peak SWE from 1980 to 2000 to conservatively include ephemeral and seasonal snowpacks) and at the locations of SNOTEL sites (blue) with the WUS-SR product (dotted lines)17. The standard deviation of the 9 GCMs is shaded for each respective area. f Same as (e) but for winter cumulative precipitation. g Same as (e) but for annual positive degree days.

Results

Snow and snow pillow sensitivity to climate change

The ability of the snow pillow networks to represent unobserved snowpack will be increasingly impacted by changing snow, precipitation, and temperature. Figure 2 shows how those impacts are expected to manifest.

Fig. 2: Snow pattern shift in future climates.
a The year at which half of snow pillow peak SWE values are expected to be <10% of their historical average. b The same metric as (a) but plotted as a function of elevation and latitude. The distribution of snow pillow stations (gray bars) and land (brown dashed line) with respect to elevation opposite the elevational distribution of peak SWE for 1980–1990 (blue), 2050–2060 (green), and 2090–2100 (pink), for the c Pacific Northwest, d California, e Great Basin, f Missouri, and g Upper Colorado basins. h–l SWE pattern repeatability for basins in (c–g) for each individual GCM (blue lines, mean in black).

Figure 2a indicates the expected year at which the 918 snow pillow stations in the WUS (excluding Alaska) will exhibit substantial degradation in peak SWE sensitivity. The metric of degradation is when the peak SWE at that station is <10% of its 1980–2015 average for more than 5 of the previous 10 years, which is based on an existing percentile-based low-to-no-snow definition2 but is better suited to locations with low interannual variability within the baseline period. Figure 2b shows that the lower elevations (below 2000 masl) will become increasingly snow-free by mid-century, while the higher elevations will become snow-free later in the century. Figure 2c–g shows the rising snowline elevation in several basins across the West in three decades: 1980–1990, 2050–2060, and 2090–2100.
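The degradation metric described above can be illustrated with a minimal sketch. The function name, its arguments, and the choice to evaluate the 10 years strictly preceding each candidate year are assumptions made for illustration; they are not taken from the analysis code behind Fig. 2.

```python
import numpy as np

def degradation_year(years, peak_swe, baseline=(1980, 2015), frac=0.10,
                     window=10, min_low_years=6):
    """Return the first year for which peak SWE fell below `frac` of the
    baseline mean in at least `min_low_years` of the preceding `window`
    years, or None if that never happens.

    Assumption: "more than 5 of the previous 10 years" is read as at least
    6 low years among the 10 years before the candidate year.
    """
    years = np.asarray(years)
    peak_swe = np.asarray(peak_swe, dtype=float)
    in_baseline = (years >= baseline[0]) & (years <= baseline[1])
    threshold = frac * peak_swe[in_baseline].mean()
    low = peak_swe < threshold  # True where a year counts as low-to-no snow
    for i in range(window, len(years)):
        if low[i - window:i].sum() >= min_low_years:
            return int(years[i])
    return None

# Hypothetical station: 500 mm peak SWE until 2049, then 10 mm thereafter.
years = np.arange(1980, 2101)
swe = np.where(years < 2050, 500.0, 10.0)
first_degraded = degradation_year(years, swe)
```

In this hypothetical series the threshold is 50 mm, so the criterion is first met once six low years (2050 to 2055) fall inside the preceding 10-year window.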

Finally, Fig. 2h–l highlights the long-term decline in snow pattern repeatability (shown in black and quantified with average R2 between current-year normalized SWE and the previous 20 years of the same, see the “Methods” section for details) from 2000 to 2100, punctuated by years with far less pattern repeatability than the historical average. Across the ensemble of simulations, future years are predicted to contain different within-basin snow patterns relative to the recent past; in other words, the interannual variability in spatial SWE patterns is predicted to increase. Furthermore, Fig. 2h–l shows that the high level of peak SWE interannual pattern repeatability33 found in the historical record34,35 is expected to decline, but at different rates in each region.

Climate-resilient peak snow estimation

We conduct experiments to determine the forcing data and model structures that will maintain SWE estimation skill in future climates. These experiments entail establishing a range of input data sets and statistical models for distributed SWE estimation and applying them to the WUS-D3. We evaluate the relative impact of snow pillow observations, ancillary datasets, and comprehensive snow survey observations (e.g., ref. 36) on peak SWE estimation skill when used as inputs to linear regression, random forest, and U-Net convolutional neural network models.

We simulate three levels of forcing data from observations that can be used to estimate peak SWE across a basin: (1) snow pillow peak SWE observations alone; (2) snow pillow peak SWE observations augmented with gridded meteorology; and (3) the data in (2) along with intensive, gridded observational estimates of SWE (such as remotely sensed, high-resolution lidar data36). These inputs are simulated in historical and future years by extracting the relevant fields from the WUS-D3 dataset corresponding to those observational levels, creating observational drivers, and adding synthetic error to those drivers (see the “Methods” section).

We estimate pixel-wise peak SWE with three data modeling procedures in order of increasing complexity: (1) linear regression, (2) random forests37, and (3) U-Nets38. Traditionally, operational SWE and runoff forecasting in snow-dominated watersheds has relied on the first of these procedures: empirical development of linear relationships between snow measured at local (i.e., in the same watershed) snow pillow stations and distributed SWE39. We, therefore, use linear regression to replicate the assumptions of status quo approaches. The implementation of random forests then allows for explicit non-linear relationships between input data and distributed peak SWE. Lastly, U-Nets preserve spatial correlations within the outputs in addition to capturing nonlinearity.

In reality, we expect neither observational information nor data models to be static as snow volumes and patterns change. Therefore, we trained all models on the previous 20 years of data (see the “Methods” section) to reflect current water management and drought definition practices9,40. Figures 3 and 4 show the coefficient of determination (R2) between the WUS-D3 peak SWE value and the modeled estimate of that value for each experimental configuration (i.e., forcing data and model choice).
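The moving 20-year training window can be sketched for the simplest of the three data models. This is an illustrative outline only: the array layout, the function name `rolling_r2`, and the pooling of all pixels into a single least-squares fit are assumptions, and the study's per-basin configuration may differ.

```python
import numpy as np

def rolling_r2(X, y, window=20):
    """Fit a linear regression on the previous `window` years and score it
    on the current year.

    X: predictors, shape (n_years, n_pixels, n_features)  [assumed layout]
    y: target peak SWE, shape (n_years, n_pixels)
    Returns a dict mapping year index -> R2 for that held-out year.
    """
    n_years, n_pixels, n_features = X.shape
    scores = {}
    for t in range(window, n_years):
        # Training set: every pixel from the 20 years before year t.
        X_train = X[t - window:t].reshape(-1, n_features)
        y_train = y[t - window:t].ravel()
        A_train = np.column_stack([X_train, np.ones(len(X_train))])
        coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
        # Evaluation: year t only, which the fit has never seen.
        A_test = np.column_stack([X[t], np.ones(n_pixels)])
        pred = A_test @ coef
        ss_res = np.sum((y[t] - pred) ** 2)
        ss_tot = np.sum((y[t] - y[t].mean()) ** 2)
        scores[t] = 1.0 - ss_res / ss_tot
    return scores

# Synthetic check: an exactly linear target should be recovered with R2 ~ 1.
rng = np.random.default_rng(0)
X_demo = rng.random((25, 50, 2))
y_demo = 2.0 * X_demo[..., 0] - X_demo[..., 1] + 3.0
demo_scores = rolling_r2(X_demo, y_demo)
```

Because each year is scored by a model that saw only earlier years, a drop in R2 over time indicates that the relationships learned from the recent past no longer describe the current year.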

Fig. 3: Basin-wide model performance across configurations.
Time-averaged (1980–2100) basin-wide coefficient of determination (R2) between WUS-D3 peak SWE and the peak SWE predicted from each experimental configuration for each snowy HUC6 basin in the Western United States. Row indicates estimation model as linear regression (a–c), random forest (d–f), or U-Net (g–i); columns indicate input data as snow pillow (a, d, g), +gridded (b, e, h), or +intensive (c, f, i). Color indicates R2 value.

Fig. 4: Distributed peak snowpack performance for all models and forcing data combinations.
a Average pixel-wise coefficient of determination (R2) between predicted and WUS-D3 peak SWE for each input data level and estimation model combination tested: linear regression (solid), random forest (dashed), and U-Net (dotted); forcing data from snow pillow locations (orange), +gridded (pink), +intensive (blue); dots indicate medians, lines indicate interquartile range. b–d Quantities in (a) as they change over time. e–g Distribution of R2 values for the linear regression (x-axis) and U-Net (y-axis) in time, from 2001 (blue) to 2096 (yellow), for each data forcing category.

Figure 3 shows that more observational information and/or higher data model complexity yields higher R2 values. Consequently, the inclusion of more observations (e.g., gridded meteorology and intensive monitoring) and higher complexity models (e.g., random forest or U-Net) enables more accurate estimates of the spatial patterns of peak SWE throughout the 21st century. This figure also shows that while it will be challenging to estimate peak SWE in some catchments with low and/or intermittent snow cover (e.g., the Lower Colorado River Basin and Nevada), estimates of peak SWE can greatly benefit from additional observations and higher-complexity data models. These additional observations and complex models are able to inject information about new spatial relationships between measured and non-measured locations and the response of old spatial relationships to a new climate.

Figure 4 further explores the importance of observations and model complexity for peak SWE prediction across the WUS, highlighting that all predictions decrease in skill into the future as year-to-year variability in snow increases. However, the skill of the U-Net is far greater than the skill of linear regression when forced with snow pillow SWE alone or with gridded meteorology (Fig. 4b–d). The accordion plots in Fig. 4a show high skill in predictions that use a U-Net or at least gridded meteorology. The U-Net and gridded meteorology both explicitly include two-dimensional spatial patterns, while the snow pillow locations, random forest, and linear regression do not. With gridded meteorology, the added value from higher complexity models is smaller than with snow pillow data alone but still present. Intensive observations make complex modeling unnecessary or even counterproductive. Figure 4b–d shows time sequences of R2 for each data-model combination, indicating that the U-Net consistently achieves higher skill. Even in future water years, the U-Net model retains R2 generally above 0.75.

Figure 4e–g shows performance increases from implementing U-Nets in place of linear regression. The lower performance in the end-of-century year (yellow) compared to the historical year (blue) is consistent across forcing data categories, but the decline is smaller for U-Nets (y-axis) than for linear regression (x-axis). The salient features of the random forest and U-Net are their ability to model nonlinear relationships and, for the U-Net, its explicit encoding of spatial correlations, whereas linear regression and random-forest models are point-specific. These findings suggest that encoding spatial correlations will become increasingly important for peak SWE estimation. While historical performance has less inter-model and inter-basin variance, the snow pillow and gridded cases show consistent improvements from U-Net implementation. Because each model is trained on the previous 20 years of data, the decline in performance toward the end of the century in all data and model cases indicates that snow distributions are changing faster than the spatial models can adapt.

Discussion

The ability of observational networks such as SNOTEL to inform estimates of distributed SWE fields in future climates is likely to decrease. As the climate warms, the snow line elevation will rise while some colder regions will receive increased snowfall, resulting in novel spatial distributions of snow. There are substantial water resource management implications to this reduced capability41: retaining or releasing water in ways that do not reflect the true volume of snowpack can lead to misallocation. To that end, intraseasonal management choices rely on awareness of peak SWE and its spatial distribution42,43,44. A warming climate will also shift the within-season temporal distribution of SWE; the methods evaluated here could be applied at any time during the snow season in order to optimally inform management.

We find that the challenges that existing WUS SWE estimation capabilities will face in the 21st century can be partially offset by expanding the observations used to estimate SWE, even if they only loosely constrain SWE, or by adopting data-driven, spatially aware statistical models. In this framework, each year’s predictions depend on patterns learned from the previous 20 years. Declining predictive skill near end-of-century therefore reflects an acceleration of change. Even in a year with a new extreme climate, any observation in the current year can lend insight to the potentially novel pattern. However, sparse observations (e.g., snow pillows) will require nimble, 2D methods (e.g., U-Net), while dense observations (e.g., gridded meteorology) can make do with more straightforward relationships between observation and output.

We provide baseline quantification of the performance gains possible from each effort. Implementing U-Nets in the snow pillow-only data scenario improves skill more than adding gridded meteorological data to a linear regression model but less than adding intensive SWE observations. However, the U-Net models implemented here approach the performance of intensive airborne SWE measurements even using just snow pillow station measurements; adding gridded meteorological data slightly improves mean performance and decreases inter-basin variance. Implementing U-Nets in the intensive case decreases performance, likely due to overfitting.

It is important to recognize the ecological and management challenges that future extreme low water years (e.g., shown in Fig. 4e–g) will create. The importance of accurate estimation for management will be acutely greater than it has been in the past. The data-driven models and multiple modalities of datasets for SWE prediction can be quickly developed, tested, and deployed using mature software workflows. In the middle of winter in ahistorical years, rapid SWE prediction system implementation may be warranted or required. Although not explored here, we also hypothesize that similar methods could be used for intra-seasonal temporal (rather than spatial) predictions, further increasing the utility of intensive measurement campaigns.

The snow-targeting remote sensing of the future will also contribute to the solution of this problem. Achieving climate change-resilient snowpack estimation should incorporate better and more widespread observations45,46 and update data-driven statistical modeling47. Importantly, most remote sensing relies on in situ measurements for calibration and validation. Water management is inherently local, and as such catchment-specific methods combining remote sensing, observations, and physics-based modeling (e.g., ref. 48) will continue to provide valuable insights. The methods developed here, however, leverage only existing measurements and, therefore, can be broadly and immediately applied.

Lastly, the finding that more complex models are needed to maintain SWE prediction skills does not necessarily equate to a requirement to use an artificial intelligence (AI) method, such as U-Net. Conventional approaches, including observation-based reanalysis estimates, are no less relevant to peak SWE estimation than AI. Indeed, they are driven by the same observational information that would drive AI models and contain parameterized representations of causally related atmospheric and surface processes. At the same time, the community of AI practitioners is both large and nimble. In other words, AI models can be rapidly developed to emulate atmospheric and surface processes while also managing multiple observational modalities49. While the results of this work are not necessarily prescriptive for requiring the use of AI for WUS peak SWE estimation in the future, they are prescriptive for implementing nimble, multi-modal predictors of distributed SWE. AI algorithms, with their current capabilities, represent one such solution that can provide timely SWE information for water resource managers, scientists, and the large populations that rely on snowpack.

Methods

Downscaled regional climate simulations

In this work, we use WUS-D325, a nine-member ensemble of CMIP6 GCMs, dynamically downscaled with WRF27. Each raw GCM was accessed at its native resolution, and its temperature, humidity, horizontal winds, and geopotential fields were bias-corrected50. Supplementary Table 1 lists each GCM name and variant used in this ensemble. These bias-corrected fields were then used to drive WRF simulations on a 45-km outer domain grid. One-way nesting was then used to downscale the 45-km results to a 9-km grid length inner domain over the Western United States. The SWE field is obtained from directly coupling Noah-MP to WRF following methods in refs. 25,29. For this work, we use 9-km data, as the domain of that product covers the entire WUS land region. While this grid cell size can obscure orographic influences on snowpack, the snow product in the historical simulations of WRF with the bias correction described above showed skill in reproducing observed diagnostic fields29, including largely reproducing the SWE values found in historical regional snow reanalyses15, as shown in Fig. 1. The skill of WUS-D3 SWE is further demonstrated in the snow pillow site-specific climatology shown in Supplementary Fig. 1. We justify this bias correction because the biases in the corrected fields are endemic to CMIP6 GCMs and arise from unrealistic circulation biases in the northeastern Pacific Ocean. These errors, relative to the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5)51, are carried forward in future projections and must be corrected to capture realistic WUS hydroclimate.

We create a historical downscaled dataset covering 1980–2014 by downscaling the historical experiment of the suite of CMIP6 models analyzed26 and we create a dataset that extends from 2015 to 2100 with CMIP6 models that are forced with the Shared Socioeconomic Pathway (SSP) 3-7.0 emissions scenario28. Here, the SSP3-7.0 simulations provide one scenario through which to explore the nature of future change in snowpack-related observations.

For this work, we use the ‘snow’ (snow water equivalent in mm), ‘prec’ (total precipitation in mm d−1), and ‘t2’ (2-m air temperature in K) outputs of WUS-D3. The peak SWE value at a given pixel is simply the annual maximum at that pixel, which approximately represents the total snow water storage accumulated in a particular pixel. April 1 has been a historically useful proxy for peak SWE and has been used in water supply forecasting, but we use peak SWE here because the regional and temporal drift between true peak SWE and April 1 is increasing and will continue to increase52.

The precipitation field was used by summing the precipitation of any phase that occurred at that pixel for that water year prior to the peak SWE at that pixel. Similarly, positive degree-day (PDD) values at a given pixel were used as a driver. We calculated this term by summing the positive-only difference between the 2-m air temperature at that pixel and 0 °C for the same time period. The calendaring convention for this analysis is based on the WUS water year, which begins on October 1 and is named for the calendar year in which it ends (e.g., the water year 1981 runs from October 1, 1980 to September 30, 1981 inclusive).
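As an illustration of these definitions, the following sketch computes peak SWE, cumulative precipitation, and PDD for a single pixel and water year. The function names, the use of daily values, and the choice to include the peak day itself in the sums are assumptions for this example rather than details of the WUS-D3 processing.

```python
import numpy as np

def water_year(year, month):
    """Water year label: October to December belong to the following year."""
    return year + 1 if month >= 10 else year

def drivers_to_peak(swe_mm, prec_mm_per_day, t2_kelvin):
    """Peak SWE and its accumulated drivers for one pixel and one water year.

    All inputs are daily series starting on October 1 (assumed time step).
    Sums run from October 1 through the day of peak SWE, inclusive
    (assumed reading of "prior to the peak").
    """
    swe = np.asarray(swe_mm, dtype=float)
    prec = np.asarray(prec_mm_per_day, dtype=float)
    t_celsius = np.asarray(t2_kelvin, dtype=float) - 273.15
    i_peak = int(np.argmax(swe))               # first day of the annual maximum
    peak_swe = swe[i_peak]
    cum_prec = prec[:i_peak + 1].sum()         # precipitation of any phase
    pdd = np.clip(t_celsius[:i_peak + 1], 0.0, None).sum()  # positive part only
    return peak_swe, cum_prec, pdd

# Four hypothetical days: SWE peaks on the third day.
peak, cum_prec, pdd = drivers_to_peak(
    swe_mm=[0.0, 10.0, 30.0, 20.0],
    prec_mm_per_day=[5.0, 5.0, 5.0, 5.0],
    t2_kelvin=[274.15, 272.15, 275.15, 280.15],
)
```

Days after the peak, such as the warm fourth day in this example, do not contribute to either accumulated driver.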

Currently available snow data products

The data combinations evaluated in this work were constructed to reflect actual snow-related data products. We accessed daily SNOTEL records from the Natural Resources Conservation Service (NRCS) SNOTEL data repository8 and other snow pillows from CCSS. For comparisons between these data and the downscaled historical simulations, we truncated the snow pillow record to the time period that overlapped between the two datasets: 1980–2014. We accessed the post-assimilation WUS-SR17 SWE records from the National Snow and Ice Data Center (NSIDC) and selected pixel-wise maximum SWE for each water year in the time domain 1985–2014 for comparison with modeled products in Fig. 1. We also used SNOTEL and WUS-SR data to evaluate the ability of the 9-km downscaled data to represent the performance of the SNOTEL network. Other work has conducted similar comparisons with positive outcomes, including for statistically downscaled, coarser-grid products (e.g., ref. 12). Supplementary Fig. 1 shows daily SWE climatologies for each of the 9-km downscaled, bias-corrected WUS-D3 simulations at pixels containing at least one SNOTEL station from 1985 to 2014, compared to the same time frame with the SNOTEL record and the same pixels in WUS-SR.

Peak SWE in many watersheds of the WUS is constrained by multiple snow pillow stations, weather analyses, and satellite data15,17,53. The most comprehensive observations of catchment-scale SWE currently are sub-orbital: aerial lidar measurements such as the Airborne Snow Observatories36 (ASO), or stereo-photogrammetric reconstructions of snow height at the end of winter54 yield spatially-resolved, cloud-contamination-free SWE products because those measurements are a powerful constraint on SWE. The uncertainty in peak SWE from these measurements is driven by uncertainty in snow density55, the difficulty of measuring snow under canopies56, and off-peak observation timing. Therefore, in situ datasets, particularly from snow pillows, form the backbone of peak SWE estimation in most regions, as reflected in the snow pillow data case; sub-orbital observations are modeled in the intensive data case.

Spatially and temporally resolved gridded precipitation fields, derived from a wide range of observational datasets including ground-based gauges and radars, satellite-based radars and radiometers, and numerical weather simulation57, represent another key dataset with predictive information on peak SWE in the mountains, as reflected by the gridded data case. Surface energy budgets strongly influence SWE, and measures of temperature, such as degree-days, provide practical information on snowpack processes that would negatively influence peak SWE, including snowmelt58. There are also numerous satellite datasets that are relevant to, but only loosely constrain, peak SWE in high-altitude complex terrain. For example, the areal fraction of snow is readily measured from multi-spectral radiometry on satellite-based platforms such as the MODerate resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS), but those products do not observe snow depth and density.

Spatial pattern repeatability

The snow pattern repeatability34 for a given year is defined as the average R2 between that year’s nondimensionalized peak SWE field S and the field S from each of the 20 previous years. Here \(S=\frac{s-\mu}{\sigma}\), where s is the SWE value in a given grid box, μ is the mean for that time step, and σ is the standard deviation for that time step35. The 20-year moving window gives each analysis year a wide range of interannual variability in SWE while not being directly impacted by a nonstationary climate. The metric S has been applied to higher-resolution snow data in ref. 34 to demonstrate a high degree of pattern repeatability between years. This method of calculating pattern repeatability treats pixels as independent points without a 2-D structure because points in a basin are flattened for correlation analysis.
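The repeatability calculation can be sketched as follows. The array layout and function names are illustrative assumptions, and R2 is taken here to be the squared Pearson correlation between flattened fields, which is one plausible reading of the definition above.

```python
import numpy as np

def standardize(field):
    """Flatten a 2-D peak SWE field and nondimensionalize it: S = (s - mu) / sigma."""
    s = np.asarray(field, dtype=float).ravel()
    return (s - s.mean()) / s.std()

def pattern_repeatability(peak_swe, window=20):
    """Average R2 between each year's S field and those of the previous years.

    peak_swe: array of shape (n_years, ny, nx) for one basin, with no missing
    values (assumed). Years without a full `window` of history are NaN.
    """
    S = np.array([standardize(f) for f in peak_swe])
    n_years = S.shape[0]
    out = np.full(n_years, np.nan)
    for t in range(window, n_years):
        r = [np.corrcoef(S[t], S[k])[0, 1] for k in range(t - window, t)]
        out[t] = np.mean(np.square(r))  # R2 assumed to be squared Pearson r
    return out

# A fixed spatial pattern that only changes in magnitude is perfectly repeatable.
rng = np.random.default_rng(1)
base = rng.random((4, 5)) + 0.1
scaled = np.linspace(1.0, 2.0, 25)[:, None, None] * base
repeatability = pattern_repeatability(scaled)
```

Since the fields are flattened before correlation, the metric is insensitive to where pixels sit relative to one another, as noted above. A year with a spatially uniform field (for example, no snow anywhere) has zero standard deviation and would need separate handling.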

Perfect data experiments

The findings presented here, including those in Figs. 3 and 4, are derived from perfect data experiments (sometimes referred to as observing system simulation experiments)59. In this framework, a number of different data models attempt to predict the underlying peak SWE value produced by one of the WUS-D3 simulations at a given time at a given location using predictors based on the atmospheric and surface state from that same WUS-D3 simulation. These data models range in algorithmic complexity and are only given incomplete information about the peak SWE of the WUS-D3 simulation that they are trying to predict. Consequently, synthetic, but realistic errors are added to the gridded precipitation, temperature, fSCA, and SWE predictors. Additionally, a range of different variables (see the section “Data groups for peak SWE prediction”) and algorithms (see the section “Models for peak SWE prediction”) were used to predict peak SWE. Finally, the construction of the perfect data models was designed to ensure that the data models only used a subset of the predictor variables in each WUS-D3 simulation that reasonably corresponds to the data that will be available to make a prediction for peak SWE (e.g., the prediction of any data model at a given time would not be aware of future predictor or predictand values). These observing system simulations are idealized but reveal the challenges, opportunities, and relationships between predictors and a predictand associated with using different observations and data processing algorithms and are used to assess the climate resiliency of peak SWE estimation approaches.

Synthetic gridded data

There are several synthetic, gridded data fields that are included in these perfect data experiments, including the following. The data field for the fractional snow-covered area (fSCA) is estimated from the distributed SWE field reported in WUS-D3 in order to correspond to what an optical spectroradiometric satellite observes (e.g., Landsat or MODIS Normalized Difference Snow Index60,61). By imaging the same location approximately daily, the satellite instrument datasets skillfully measure snow fraction in and around peak SWE in spite of cloud contamination. For a given day, if a pixel has SWE > 40 mm, then it is assigned an fSCA value of 1, otherwise 062. To account for cloud contamination, the fSCA values are corrupted by randomly masking out 1/3 of the days between the start of the water year (October 1) and the date of peak SWE. For days that are considered cloudy, the value of fSCA is determined by considering the most recent non-cloudy day. The daily fSCA values are then averaged for the entire snow season to yield a synthetic data field of fSCA corresponding to peak SWE.
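A minimal sketch of the fSCA construction described above is given below. The function name, the use of one cloud mask shared by all pixels on a given day, a single accumulation period for the whole grid, and the handling of cloudy days before the first clear day are assumptions for illustration.

```python
import numpy as np

def synthetic_fsca(swe_daily, threshold_mm=40.0, cloud_frac=1.0 / 3.0, rng=None):
    """Season-averaged synthetic fSCA from daily SWE.

    swe_daily: array (n_days, ny, nx) spanning October 1 through the date of
    peak SWE (assumed to be a common period for the whole grid).
    """
    rng = np.random.default_rng() if rng is None else rng
    snow = (np.asarray(swe_daily) > threshold_mm).astype(float)  # binary fSCA
    n_days = snow.shape[0]

    # Randomly flag the requested fraction of days as cloudy (whole scene).
    cloudy = np.zeros(n_days, dtype=bool)
    n_cloudy = int(round(cloud_frac * n_days))
    cloudy[rng.choice(n_days, size=n_cloudy, replace=False)] = True

    # For each day, find the index of the most recent clear day.
    clear_index = np.where(~cloudy, np.arange(n_days), -1)
    last_clear = np.maximum.accumulate(clear_index)
    # Assumption: cloudy days before any clear view borrow the first clear day.
    last_clear[last_clear < 0] = int(np.argmax(~cloudy))

    observed = snow[last_clear]   # cloudy days replaced by the latest clear view
    return observed.mean(axis=0)  # average over the season

# Hypothetical grid: snow-free for 15 days, then 100 mm SWE for 15 days.
swe_demo = np.concatenate([np.zeros((15, 2, 2)), np.full((15, 2, 2), 100.0)])
fsca_clear = synthetic_fsca(swe_demo, cloud_frac=0.0)
fsca_cloudy = synthetic_fsca(swe_demo, rng=np.random.default_rng(0))
```

With no cloud masking, the example returns 0.5 everywhere; with masking, the value shifts depending on which days near the onset of snow cover happen to be obscured.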

The PDD and cumulative precipitation fields are constructed from modeled values of those same quantities, with error and bias added as calculated between the WUS-D3 historical outputs and products from Parameter-elevation Regressions on Independent Slopes Model (PRISM)63 followed by a spatial gradient smoothing step (see Supplementary Fig. 2).

Lastly, the distributed April 1 SWE in the intensive data modeling case was constructed to simulate a potential SWE observation either by a future SWE satellite or a modeled product from a sub-orbital aerial system (e.g., ASO). Due to operational constraints that limit the windows when aerial data are collected, the exact timing of sub-orbital observations relative to the peak SWE timing is not guaranteed and cannot be retroactively adjusted to align with the true date of peak SWE. To account for this variance, we randomly assigned a date of synthetic sub-orbital observation by sampling from a normal distribution centered on April 1 of the water year with a standard deviation of seven days.
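The date sampling can be sketched in a few lines. The function name and the rounding of the sampled offset to whole days are assumptions made for this example.

```python
import datetime as dt
import random

def sample_observation_date(water_year, sd_days=7.0, rng=None):
    """Draw a synthetic sub-orbital observation date around April 1.

    The offset is drawn from a normal distribution with mean 0 and standard
    deviation `sd_days`, then rounded to whole days (assumption).
    """
    rng = rng if rng is not None else random.Random()
    offset_days = round(rng.gauss(0.0, sd_days))
    return dt.date(water_year, 4, 1) + dt.timedelta(days=offset_days)

# Reproducible draw for a hypothetical water year.
obs_date = sample_observation_date(2050, rng=random.Random(42))
```

The SWE field on the sampled date, rather than on the true date of peak SWE, would then serve as the intensive observation.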

The synthetic future gridded temperature and precipitation datasets were developed using a statistical mapping process between the WUS-D3 outputs and historical PRISM data for the same HUC6 basin. For the 1980–2020 time period, we aggregated PRISM data for precipitation and temperature over the WUS domain and re-gridded the 4-km product to the 9-km WUS-D3 grid. For the cumulative precipitation and PDD fields, we then created a pixel-wise bias and error comparison between the PRISM data and the simulated data. We then produce synthetic gridded fields for the analysis presented in Figs. 3 and 4 by adding the same bias and error statistics we diagnosed in the historical WUS-D3 simulations to the 2015–2100 WUS-D3 simulation. The synthetic error is produced by randomly sampling from a distribution that exhibits the same bias and error statistics as the WUS-D3 historical output relative to PRISM. Because each pixel’s future error was randomly sampled independently from all other pixels, we then applied a spatial smoothing algorithm64, as implemented in Python’s skimage module65. This smoothing algorithm captures the spatial correlation structure of WUS-D3 errors relative to PRISM. Supplementary Fig. 2a and c show the synthetic PRISM data for temperature and PDD, respectively, before the spatial smoothing. Supplementary Fig. 2b and d show the same quantities after spatial smoothing. The smoothing algorithm does not change the overall distribution of the absolute quantities across the region, but rather creates a more realistic synthetic PRISM dataset with spatial distributions that represent what an actual PRISM dataset may look like in a future year. Due to the spatial domain of the PRISM data set, we truncated basins that cross the US–Canada border to only include pixels within the US, as it was impossible to estimate the bias and error needed to generate synthetic PRISM PDD and cumulative precipitation fields in Canada.

Data groups for peak SWE prediction

These synthetic observational datasets were then used to train a series of different distributed SWE prediction data models. We established several dataset scenarios whereby different combinations of observational datasets are used as predictors of peak SWE, noting that these synthetic variables could very plausibly be used as predictors of peak SWE because they are all currently operationally available. The five predictor dataset groups are as follows:

  1. Peak SWE at the pixel containing a snow pillow site, latitude, longitude

  2. Data in (1) with added elevation, slope, and aspect

  3. Data in (2) with added synthetic fractional snow-covered area (fSCA) at each pixel

  4. Data in (3) with added synthetic PRISM cumulative precipitation, cumulative snowfall, PDD, and mean seasonal temperature

  5. Data in (4) with added synthetic sub-orbital lidar-based SWE at April 1 ± 10 days
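Because each group extends the previous one, the five predictor sets can be built cumulatively. The sketch below illustrates that nesting; the feature names are hypothetical labels chosen for the example and are not identifiers from the study.

```python
# Predictors added by each group; names are hypothetical labels.
GROUP_ADDITIONS = {
    1: ["snow_pillow_peak_swe", "latitude", "longitude"],
    2: ["elevation", "slope", "aspect"],
    3: ["fsca"],
    4: ["prism_cum_precip", "prism_cum_snowfall", "prism_pdd",
        "prism_mean_seasonal_temp"],
    5: ["lidar_swe_near_april_1"],
}


def predictors_for_group(group):
    """Return the full predictor list for a group (groups 1..group combined)."""
    if group not in GROUP_ADDITIONS:
        raise ValueError(f"unknown dataset group: {group}")
    predictors = []
    for g in range(1, group + 1):
        predictors.extend(GROUP_ADDITIONS[g])
    return predictors


# Number of predictors available to a model in each group.
group_sizes = {g: len(predictors_for_group(g)) for g in GROUP_ADDITIONS}
```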

These dataset groups were further grouped into three categories. Groups 1 and 2 (“snow pillow”) represent a version of current practices in which topography and snow pillow data are the main drivers of distributed SWE estimation. Groups 3 and 4 (“gridded”) represent the addition of relevant, approximately real-time remote sensing (3) and/or reanalysis (4) data that only loosely constrain SWE estimates. These datasets, while not currently used in most basins to support SWE estimation, are realistically available for the entire WUS without any local expenditures or targeted measurement campaigns. Furthermore, a recent study66 showed that these datasets (scenarios 3 and 4) are drivers that tightly constrain the physical processes governing snow accumulation and melt in the Upper Colorado River Basin. We accumulated these variables from October 1 (i.e., the start of the water year) through April 1 (of the same water year). Lastly, dataset group 5 (“intensive”) represents a heavily observed basin with significant targeted measurements. These measurements tend to be in highly researched basins with funding available to pay for extensive campaigns67. For example, the Tuolumne River Basin in California and the East River Watershed in Colorado had near-April 1 ASO measurements in 2023 and thus have highly constrained SWE estimates47.
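The October 1 through April 1 accumulation window can be illustrated with a short sketch. It assumes daily records, treats April 1 as inclusive, and takes PDD as the sum of daily mean temperatures above a 0 °C base; all three choices are assumptions of this example rather than details confirmed here.

```python
from datetime import date


def water_year_window(water_year):
    """Return (October 1, April 1) bounding the accumulation period."""
    # Water year N starts on October 1 of calendar year N - 1.
    return date(water_year - 1, 10, 1), date(water_year, 4, 1)


def accumulate(daily_records, water_year, base_temp_c=0.0):
    """Accumulate precipitation and positive degree days over the window.

    `daily_records` is an iterable of (date, precip_mm, mean_temp_c) tuples.
    Both window endpoints are treated as inclusive (an assumption).
    """
    start, end = water_year_window(water_year)
    cum_precip_mm = 0.0
    pdd = 0.0
    for day, precip_mm, mean_temp_c in daily_records:
        if start <= day <= end:
            cum_precip_mm += precip_mm
            # Only temperatures above the base contribute to PDD.
            pdd += max(mean_temp_c - base_temp_c, 0.0)
    return cum_precip_mm, pdd


# Example with a few hypothetical daily records for water year 2023.
records = [
    (date(2022, 9, 30), 10.0, 5.0),   # before the window, ignored
    (date(2022, 10, 1), 2.0, 3.0),
    (date(2023, 1, 15), 5.0, -4.0),   # below freezing, adds no PDD
    (date(2023, 4, 1), 1.0, 1.5),
    (date(2023, 4, 2), 9.0, 9.0),     # after the window, ignored
]
cum_precip_mm, pdd = accumulate(records, 2023)
```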

Models for peak SWE prediction

We established the following space of data models to explore the impact of increasing model complexity for predicting spatially distributed SWE:

  1. Linear regression

  2. Random forest

  3. U-Net

Linear regression methods represent the simplest data model and seek to predict snowpack as a linear function of a set of input variables that are connected to and predictive of the processes that control peak SWE in a given area. Random forest methods37 add a layer of complexity to model predictions by enabling a non-linear mapping between inputs and peak SWE. They consist of an ensemble of regression trees, which seek to minimize the model loss by recursively partitioning the input into smaller subspaces. Each regression tree picks a random subset of data points and a random subset of input features for training. The final model output is obtained by taking an average (or ensemble) prediction of all the regression trees. We implemented random forests using Python’s scikit-learn module68 and used the hyperparameter values recommended by the developers of this method (https://CRAN.R-project.org/package=randomForest). We used 500 trees, considered p/3 features when looking for the best node split (where p is the number of input features), and specified a node size of 5. Random forests have been shown to work well with these hyperparameter values for predicting snowpack66.
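As a rough guide, the stated hyperparameters could be expressed as keyword arguments for scikit-learn's `RandomForestRegressor`. The sketch below only builds the argument dictionary, so it runs without scikit-learn installed. Two choices are assumptions of this example: rounding p/3 down with a minimum of 1, and treating "node size" as the minimum number of samples in a terminal node (`min_samples_leaf`).

```python
def random_forest_hyperparameters(n_features):
    """Map the stated settings onto RandomForestRegressor keyword arguments."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    return {
        "n_estimators": 500,                       # number of trees
        # p/3 features per split; floor with a minimum of 1 is an assumption.
        "max_features": max(1, n_features // 3),
        # "Node size" read as minimum samples per leaf (assumption).
        "min_samples_leaf": 5,
    }


# Example: settings for a hypothetical model with 12 input features.
params = random_forest_hyperparameters(12)
```

With scikit-learn available, the dictionary can be passed directly, for example `RandomForestRegressor(**params)`.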

Finally, we also considered U-Nets38, which add another layer of complexity to model predictions by imposing spatial constraints on the non-linear mapping between snowpack and input. Spatial constraints imply that pixels that are close to each other will be more correlated than pixels that are further away from each other. U-Nets achieve this by virtue of being fully convolutional neural networks. They also use an encoder-decoder architecture to ensure that only the most relevant information is considered while learning a functional mapping. U-Nets also implement skip connections between the encoding and decoding paths to ensure that there is no information loss during the encoding-decoding process. This results in the model learning a precise mapping between its inputs and its outputs. We implemented U-Nets using the TensorFlow module in Python69. We used an architecture similar to the original architecture with modifications as shown in Supplementary Fig. 4. Since the input for a given year was a single 2-D image (with multiple channels corresponding to the number of input features), we used a bounding box around each HUC6 basin. The encoder (left side) consisted of a repeated application of two 3 × 3 convolutions (each padded with zeros and followed by a rectified linear unit) and a 2 × 2 max pooling operation to contract the input. Each downsampling step involved doubling the number of feature channels. We did three contractive iterations, which meant the bounding box for each basin needed to be expanded such that its dimensions were a multiple of 8. Additional contractions were not considered due to computational expense. The decoder (right side) consisted of a 3 × 3 transposed convolution that upsampled the inputs by a factor of 2 and halved the number of feature channels. This was concatenated (via a skip connection) with the corresponding hidden layer from the encoder path and then subjected to two 3 × 3 convolutions (as described for the encoder path). The above steps were repeated until the original height and width of the input were recovered. Following this approach, the output was obtained by subjecting the decoder output to a 1 × 1 convolution. The first encoder layer consisted of 64 channels, which resulted in a U-Net model with ~8.6 million parameters. We then clipped the output to the actual basin boundaries.
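Two pieces of arithmetic in this description can be checked with a short sketch: expanding a bounding box to a multiple of 8, and counting trainable parameters. The count assumes three pooling steps followed by a 512-channel bottleneck, bias terms on every convolution, and no normalization layers. The number of input channels (10) is hypothetical, and the architecture in Supplementary Fig. 4 may differ in detail.

```python
def pad_to_multiple(size, multiple=8):
    """Smallest integer >= size that is divisible by `multiple`."""
    return -(-size // multiple) * multiple


def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a (transposed) convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def unet_parameter_count(n_inputs, base_channels=64, depth=3, n_outputs=1):
    """Count trainable parameters of the U-Net layout sketched above."""
    total = 0
    c_in, c = n_inputs, base_channels
    # Encoder: two 3x3 convolutions per level, channels double each level.
    for _ in range(depth):
        total += conv_params(c_in, c, 3) + conv_params(c, c, 3)
        c_in, c = c, c * 2
    # Bottleneck after the last pooling step (assumed, as in the original U-Net).
    total += conv_params(c_in, c, 3) + conv_params(c, c, 3)
    # Decoder: 3x3 transposed convolution halves the channels, the skip
    # connection doubles them again, then two 3x3 convolutions follow.
    for _ in range(depth):
        half = c // 2
        total += conv_params(c, half, 3)
        total += conv_params(2 * half, half, 3) + conv_params(half, half, 3)
        c = half
    # Final 1x1 convolution to the output channel(s).
    total += conv_params(c, n_outputs, 1)
    return total


# Example: a hypothetical 37 x 52 pixel bounding box and 10 input channels.
padded_shape = (pad_to_multiple(37), pad_to_multiple(52))
n_params = unet_parameter_count(n_inputs=10)
```

Under these assumptions the count comes to roughly 8.56 million parameters, in line with the ~8.6 million reported; the first convolution is the only layer whose size depends on the number of input channels.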

Other machine learning models have been successfully implemented for distributed SWE prediction70,71; the choice of U-Nets for this work is motivated by the portability and 2-D nature of U-Nets, in addition to their relatively straightforward implementation. All the models were trained on CPU nodes of the Perlmutter supercomputer provided by the National Energy Research Scientific Computing Center (NERSC). For the entire study area, the linear regression models took about 20 min to train on a single core. The random forest models took 90 min to train using 128 cores. The U-Net models took 12 h to train using 2048 cores.
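To put the three training costs on a common scale, the wall-clock times and core counts above can be converted to core-hours. This is a rough comparison that assumes all allocated cores were busy for the full wall-clock time.

```python
def core_hours(wall_minutes, cores):
    """Wall-clock time in minutes times core count, expressed in core-hours."""
    return wall_minutes / 60.0 * cores


# Wall-clock times and core counts as stated in the text.
training_cost = {
    "linear regression": core_hours(20, 1),
    "random forest": core_hours(90, 128),
    "U-Net": core_hours(12 * 60, 2048),
}

# Relative cost of the U-Net compared with the random forest.
unet_vs_forest = training_cost["U-Net"] / training_cost["random forest"]
```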