Summer warming explains widespread but not uniform greening in the Arctic tundra biome

Arctic warming can influence tundra ecosystem function with consequences for climate feedbacks, wildlife and human communities. Yet ecological change across the Arctic tundra biome remains poorly quantified due to field measurement limitations and reliance on coarse-resolution satellite data. Here, we assess decadal changes in Arctic tundra greenness using time series from the 30 m resolution Landsat satellites. From 1985 to 2016 tundra greenness increased (greening) at ~37.3% of sampling sites and decreased (browning) at ~4.7% of sampling sites. Greening occurred most often at warm sampling sites with increased summer air temperature, soil temperature, and soil moisture, while browning occurred most often at cold sampling sites that cooled and dried. Tundra greenness was positively correlated with graminoid, shrub, and ecosystem productivity measured at field sites. Our results support the hypothesis that summer warming stimulated plant productivity across much, but not all, of the Arctic tundra biome during recent decades.


Supplementary Figures
Supplementary Figure 1 | Differences in NDVI among Landsat sensor before and after cross-sensor calibration. Landsat 7 NDVI compared with raw and cross-calibrated (a) Landsat 5 NDVI and (b) Landsat 8 NDVI. Note that raw Landsat 5 NDVI was consistently lower than Landsat 7 NDVI, which was consistently lower than Landsat 8 NDVI (left columns). This can introduce an artificial positive trend in composite NDVI time series. This issue was obviated by further cross-sensor calibration using Random Forest machine learning algorithms (right columns). Each data point is an estimate of 15-day median NDVI computed from observations acquired during the years of overlap between pairs of sensors at each sampling site. Each sampling site contributes a single data point with the 15-day period selected at random from available periods during summers with at least five observations. The diagonal orange lines show 1:1 relationships. Estimates of NDVImax increase asymptotically with scene availability when derived from raw (uncorrected) Landsat observations; however, estimates of NDVImax exhibit minimal change with scene availability when Landsat observations are corrected using site-specific information on land surface phenology. Intra-box lines denote median percent error among site x years that went into the analysis, while boxes encompass 50% of observations, and whiskers extend 1.5 times the interquartile range. Trend metrics include (a) the relative change in mean Arctic NDVImax (%) and (b) the percentage of sites with a positive ("greening") or negative ("browning") trend in NDVImax (α = 0.10). Solid lines depict median estimates from 10 3 Monte Carlo simulations while error bands depict 95% confidence intervals (CI). Changes in the width of the 95% CIs are shown in panels (c) and (d). Each simulation not only used random subsets of sites, but also NDVImax time series generated with randomly permuted surface reflectance, crosssensor calibration models, phenological-correction parameters.

Supplementary Figure 6 | Correlations between annual Landsat NDVImax [unitless] and the summer warmth index [SWI; °C] across the Arctic during recent decades. Mean
Spearman's correlation (rs) between annual NDVImax and SWI among sites within each 50 x 50 km grid cell. Each grid cell shows the mean Spearman's correlation (rs) between annual NDVImax and SWI among sites. Specifically, annual NDVImax was correlated with either current-year and two-year average SWI, and then reassessed after linearly detrending both NDVImax and SWI time series. Note that annual NDVImax was derived by averaging the annual time series from sites within each grid cell (50 x 50 km resolution). Each grid cell depicts the median correlation coefficient derived from 10 3 Monte Carlo simulations that randomly permuted both NDVImax and SWI timeseries. Figure 7 | Location of field sites that provide metrics of plant productivity that were compared with Landsat NDVImax. Field data sets include graminoid productivity (a), shrub ring-width (b), and ecosystem gross primary productivity estimated from measurements made by eddy covariance flux towers (c). The size of each shrub ring-width plotting symbol is proportional to the Spearman correlation (rs) between Landsat NDVImax and the ring-width index chronology at that location.

Supplementary Figure 8 | Graminoid productivity and Landsat NDVImax on Bylot Island.
Time series (a) and scatter plot (b) of annual median graminoid aboveground net primary productivity [ANPP] and Landsat NDVImax [unitless] from 1990 to 2017 at long-term monitoring sites on Bylot Island in northern Canada. Lines (a) and points (b) depict medians and error bands (a) and bars (b) depict 95% confidence intervals derived from 10 3 Monte Carlo simulations. There were typically 12 quadrats harvested per year, but 11 quadrats in 1991, 2013, 2014, and 2016. Quadrats were harvested at four subsites over this period. Annual median Landsat NDVImax was computed using data from these four subsites. Error bars represent 95% confidence intervals (CI) derived from 10 3 Monte Carlo simulations. The Spearman correlation (rs) also includes a 95% CI. Supplementary Table 12  Trend characteristics include the total relative change in mean Arctic NDVImax and the percentage of sites with a positive ("greening"), negative ("browning"), or no trend in NDVImax (α = 0.10). Each trend metric represents the median estimate from 10 3 Monte Carlo simulations and is accompanied by a 95% confidence interval (CI). Note that each median trend metric is quite stable across a 400-fold range in sample size and that the width of each 95% CI asymptotically decreases.

Supplementary Table 11 | Shrub sampling site locations, chronology inter-series correlations (rbar), sample sizes, and Spearman's correlations (rs) between Landsat
NDVImax and each shrub ring-width index chronology. Sample size includes the number of years of overlap between NDVI and shrub ring-width measurements (N Years) and the number of shrubs (N shrubs) that went into each chronology during these years. The number of shrubs in each chronology varied through time and thus the mean, min, and max sample sizes are provided. The rbar and rs include 95% confidence intervals derived from 10 3 Monte Carlo simulations.

Landsat data sets, processing, and analyses
To characterize annual tundra greenness, we developed annual estimates of maximum summer normalized difference vegetation index (NDVImax) from 1985 to 2016 for sampling sites in Arctic tundra 33 using 30 m resolution measurements of surface reflectance from the Landsat satellites (Landsat Collection 1) 34,35 . Estimates of annual Landsat NDVImax are sensitive to multiple sources of uncertainty, including sensor calibration (± 3 to 7%) 36,37 , systematic differences in NDVI among sensors 38,39 , and the availability of summer measurements. We developed new approaches to cross-calibrate NDVI among sensors and model annual NDVImax using summer measurements in conjunction site-specific information on seasonal land surface phenology. Moreover, we ascertained how uncertainty in estimates of annual NDVImax time series affected subsequent aspects of the analysis using Monte Carlo simulations (n = 10 3 ). The following sections provide details regarding the Landsat data sets and processing. Landsat sampling We extracted Landsat surface reflectance measurements for 50,000 terrestrial sampling sites spread randomly across the Arctic tundra. We included both Polar Arctic and Oro Arctic tundra 33 and partitioned terrestrial from aquatic areas using the Joint Research Center Global Surface Water dataset 42 . We buffered each sampling site by 50 m radius, yielding an approximate 3x3 pixel window around each site. For each pixel within a buffer we then extracted all Landsat 5, 7, and 8 surface reflectance measurements that were acquired June through August (Julian Days 152 -243) between 1984 and 2016. This yielded 507 million multi-band measurements of land surface reflectance from these sampling sites. All together, we sampled 0.005% of the domain at an average density of one sampling site per 155 km 2 of land area. On average, sites had a nearest neighbor that was 7.0 km away (minimum = 47 m and maximum = 314 km). The adequacy of this sample size is justified below in the Landsat NDVImax trend analysis section.
Landsat quality control We took multiple steps to ensure that only high-quality clear-sky measurements were included in our analysis. First, we exclude observations (i.e., a pixel at a point in time) from scenes with high cloud cover (> 80%), spatial uncertainty (> 30 m), or solar zenith angle (> 60°). Second, we masked out observations that were identified as cloud, cloud shadows, water, or snow by the C Function of Mask (CFmask) algorithm 43,44 . Third, we minimized potential errors associated with radiometric saturation, atmospheric correction, or residual water by excluding observations with unrealistically high (> 1) or very low (< 0.005) surface reflectance. Fourth, we excluded observations that fell within the data gaps caused by failure of the Landsat 7 scan line corrector.
Overall, we filtered out 72% of observations due to these issues.
Cross-calibrating NDVI among Landsat sensors There are systematic differences in NDVI among Landsat 5, 7, and 8 (Supplementary Figure 1) and these differences must be addressed before assessing temporal trends in NDVI 38,39,45 . Failure to address these differences can introduced artificial positive trends into NDVI timeseries that are based on measurements from multiple Landsat sensors. Linear models have been developed to cross-calibrate Landsat 5 and 7 NDVI in boreal North America 38, 45 and Landsat 7 and 8 NDVI for the conterminous USA 39 , but we are unaware of existing models for crosscalibrating Landsat 5, 7, and 8 in Arctic tundra. Following prior studies, we initially tried crosscalibrating NDVI among sensors using linear regression, but observed a nonlinear relationship between Landsat 7 and 8. We therefore developed a machine learning approach to cross-calibrate NDVI from Landsat 5 and 8 with Landsat 7, which is a useful benchmark since it temporally overlaps with the other sensors. We cross-calibrated the sensors by first identifying the years when both Landsat 7 and Landsat 5/8 collected imagery at a sampling site. Second, we pooled NDVI measurements across those years and then computed 15-day moving median NDVI over the course of the growing season for each sensor and sampling site. Third, we excluded 15-day periods with fewer than 5 measurements from both sets of sensors and then randomly selected one remaining 15-day period from each sampling site. We then used 2/3 rds of these data to train Random Forest models 46 that predicted Landsat 7 NDVI based on Landsat 5/8 NDVI. The models also account for potential seasonal and regional differences between sensors by including as covariates the midpoint of each 15-day period (day of year) and the spatial coordinates of each sampling site. We fit the random forest models using the ranger package 47 in R 48 . As part of the Monte Carlo uncertainty analysis, we fit a separate random forest model to each of the 10 3 simulations.
In addition to the out-of-bag error assessment performed internally by the Random Forest, we also cross-validated each model using the remaining 1/3 rd of the data that was withheld from training. This holdout cross-validation involved predicting NDVI using the trained Random Forest model and then linearly regressing observed versus predicted NDVI. The models for calibrating Landsat 5 and 8 had high predictive capacity (r 2 ≈ 0.97) and both low root mean squared error and bias ( Supplementary Figure 1). The performance of these models was similar to or exceeded that of cross-calibration models developed for other regions 39,45,49 . We therefore applied these models to cross-calibrate NDVI measurements at the full set of sampling sites.

Supplementary Tables
Modeling maximum summer NDVI using Landsat We sought to infer changes in tundra greenness using estimates of maximum summer NDVI (NDVImax) derived from the Landsat satellites. It is challenging to reliably estimate annual NDVImax using Landsat since these estimates are sensitive to the number of cloud-and snow-free observations ("useable observations") acquired each summer. The annual number of useable summer observations increased from 1984 to 2016 at sites in the Arctic (Supplementary Figure  2). There were typically few useable summer observations at each site during the 1980s and 1990s, though observations became increasingly available during the 2000s following the launches of Landsat 7 and 8. Estimates of annual NDVImax typically increase asymptotically with the number of useable summer observations since this increases the likelihood that observations will have been acquired during peak summer greenness (Supplementary Figure 3). In other words, NDVImax is systematically underestimated when few observations are available. Consequently, the increase in useable summer observations introduces a spurious positive trend into NDVImax time series and this must be addressed before assessing long-term trends in NDVImax. The irregular timing of image acquisitions and overall low image availability make it challenging to compute not only NDVImax, but also time-averaged or integrated NDVI 50 . We therefore developed an approach to more reliably estimate NDVImax when few summer observations were available.
We estimated annual Landsat NDVImax at sampling sites by combining annual summer observations with information on land surface phenology. Land surface phenology can be characterized based on seasonal changes in NDVI. The NDVI typically increases in spring as snow melts and plants begin leafing-out, reaches a maximum in the middle of summer following full plant community canopy development, and then declines in fall as leaves senesce and are shed 51 . For each site we quantified land surface phenology from spring through fall by predicting daily NDVI using flexible cubic splines fit to all quality-controlled Landsat observations. To account for potential shifts in land surface phenology during recent decades 52, 53 , we fit an individual cubic spline to observations from each 17-year period between 1985 and 2016 (Supplementary Figure 4). The Landsat record is limited in much of the Arctic prior to the 2000s, thus using a 17-year window allowed us to pool measurements across this era of sparse observations when estimating annual NDVImax. As part of quality control, we iteratively removed observations with NDVI that differed by >100% from the corresponding daily prediction. We also excluded sites from 17-year periods if there were fewer than 30 useable observations. We fit the cubic splines using the smooth.spline function in R 48 and characterized uncertainty in model fit by randomly varying the smoothing parameter (spar =0.68 -0.72) and available observations as part of the Monte Carlo analysis. We interpret each cubic spline as representing the typical land surface phenology of the site 54 during the corresponding 17-year period.
Landsat observations acquired during summer may not exactly coincide with the timing of peak summer greenness (NDVImax); however, it is possible to estimate annual NDVImax by combining these summer observations with site-specific information on land surface phenology (example in Supplementary Figure 4b). The phenological curves enabled us to determine the typical difference in NDVI between peak summer greenness and the day that each summer observation was acquired (ΔNDVIDOY, vertical blue lines in Supplementary Figure 4b). The ΔNDVIDOY reflects the additional increase in NDVI that we would expect had a Landsat observation occurred during peak summer greenness instead of slightly earlier or later in the growing season. We then estimated annual NDVImax using each summer observation: where NDVIDOY is an observation of NDVI from a specific day during the summer. If multiple Landsat observation were available from a single summer, then we estimated overall NDVImax for that summer by computing the median of NDVImax predicted from each summer observation. While we focused on estimating inter-annual variability in NDVImax, our approach is akin to one presented by Melass and colleagues 54, 55 who examined inter-annual variability in the start and end of the growing season in deciduous forest of eastern North America. We evaluated how estimates of annual NDVImax changed with the availability of growing season scenes using both phenologically-corrected and raw (uncorrected) Landsat observations. Here we define the 'growing season' for each site using the phenological curves to identify the seasonal period when daily NDVI was typically within 75% of NDVImax. We first selected site x years with at least 11 Landsat scenes acquired during a growing season. We then calculated observed NDVImax for each site x year; however, to guard against observations with spuriously high NDVI 51 we excluded the 10% of observations with the highest NDVI before computing NDVImax. Next, we repeatedly subsampled between one and ten Landsat observations from these site x years and for each subsample computed how phenologically-corrected and raw estimates of NDVImax differed from observed NDVImax. This allowed us to quantify how the percent error between observed and estimated NDVImax changed with scene availability both with and without our phenological correction (Supplementary Figure 3). This assessment showed that raw estimates of NDVImax increase asymptotically until there are at least seven Landsat scenes acquired during a summer, after which estimates of NDVImax change little with increasing scene availability. On the other hand, our phenologically-corrected estimates of NDVImax change little with increasing scene availability, though the uncertainty of the estimates decreases with increasing scene availability. Relative to raw data, the phenological correction slightly increased the spread of estimates when more than five scenes were available from a growing season, which occurred in about 9% of all site x years. These comparisons highlight that (1) estimates of annual NDVImax are sensitive to the number of available scenes and that (2) our phenological correction can provide less biased estimates of annual NDVImax when few Landsat scenes are available from a growing season.

Landsat NDVImax trend analysis
We assessed NDVImax trends during recent decades using Landsat observations from across the Arctic. Specifically, we evaluated NDVImax trends at individual sampling sites and after averaging NDVImax time series among sites within tundra bioclimatic zones and across the whole Arctic. Furthermore, we focused on NDVImax trends during two nominal periods (1985 to 2016 and 2000 to 2016) that were chosen based on (1) the availability of Landsat imagery in the Arctic and (2) interest in assessing both long-term and near-term trends. We excluded sampling sites that were barren (mean NDVImax < 0.10, n = 4,112 sites) or had short measurement records (< 10 years, n = 582 sites). We then evaluated each NDVImax time series for the presence of a monotonic trend using a rank-based Mann-Kendall trend test 56,57 and determined the slope of each time series using a non-parametric Theil-Sen slope estimator 58 . The Theil-Sen slope estimator and Mann-Kendall trend test were sequentially implemented by the zyp.yuepilon function from the zyp package 59 in R 48 . Temporal autocorrelation can inflate Mann-Kendall trend test statistics and increase the likelihood of detecting a trend when none is present 60 and thus the zyp.yuepilon function first evaluates whether a time series exhibits temporal autocorrelation and if temporal autocorrelation is identified then the time series is pre-whiten before implementing the Mann-Kendall trend test 60 . The Theil-Sen slope estimator and Mann-Kendall trend test are less sensitive to extreme values than simple linear regression and have been used in prior studies to assess NDVI trends at high-latitudes 61,62 . The Landsat NDVImax trends are summarized for the Arctic and each bioclimatic zone in Supplementary Table 2 and  Supplementary Table 3.
To evaluate the adequacy of our sample size, we examined how estimates of two trend metrics varied as a function of sample size. Focusing on 2000 to 2016, we computed the change in mean Arctic NDVImax and the percentage of sites with positive, negative, or no trend (α = 0.10) using sample sizes ranging from 10 2 to 4x10 4 sites. Specifically, we used samples sizes from 10 2 to 10 3 sites at intervals of 10 2 sites and then from 10 3 to 4x10 4 sites at intervals of 10 3 sites (n = 49 bins total). For each Monte Carlo simulation (n = 10 3 ), we computed these trend metrics using random subsets of sites for each of the 49 sample size bins. We then computed the median and 95% confidence interval (CI) of each trend metric for every sample size bin. This analysis revealed minimal differences in trend estimates across a 400-fold range in sample size (Supplementary Table 4, Supplementary Figure 5). For instance, we estimated that the median increases in mean Arctic NDVImax was 3.51%, 3.38%, or 3.35% whether based on 10 2 , 10 4 , or 4x10 4 sample sites. Moreover, we estimated that the median percentage of sites with a positive NDVImax trend (α = 0.10; greening) was 21.00%, 21.30%, and 21.30% based on 10 2 , 10 4 , or 4x10 4 sampling sites, while the median percentage of sites with a negative NDVImax trend (browning) was 6.00% across all sizes. The width of the 95% CIs associated with these trend metrics asymptotically shrank lead to about a 0.5% change between 10 4 and 4x10 4 sampling sites (Supplementary Figure 5c,d). This analysis illustrates the sample size is adequate for drawing robust inference about recent changes in tundra greenness across the Arctic.

Air temperature data sets, processing, and analyses
We acquired and pre-processed five global gridded temperature data sets (Supplementary Table  5) and then, for the Arctic region, derived the summer warmth index (SWI) as a metric of cumulative summer heat load 63 . Three data sets provided estimates of monthly mean air temperature (Tavg; °C) and two data sets provided estimates of monthly Tavg anomaly relative to a climatological baseline  for NASA GISS and 1981-2010 for UY HadCRUT4 with UAH). Each data set was publicly available online and provided data for at least the period from 1979 to 2016. We clipped data sets to the Arctic domain 33 and projected each to Lambert Azimuthal Equal Area on a 50-km grid using bilinear interpolation. For the two monthly Tavg anomaly data sets, we estimated absolute monthly Tavg by adding a monthly climatological baseline derived from the ensemble average of the three absolute Tavg data sets. We then derived and applied a common mask that only kept grid cells with non-missing data from every data set. Next, we computed the annual SWI as the sum of monthly Tavg exceeding 0 °C. The SWI is commonly used as an indicator of cumulative heat load in the Arctic 49,63,64 and is analogous to growing-degree days, but computed using monthly rather than daily temperature data.
Each temperature dataset was constructed using different collections of climate station observations and analytical techniques 1, 2, 4, 5, 6 , thus temporal trends in SWI and correlations between SWI and NDVImax are influenced by the specific temperature data set used in the analysis. To account for uncertainty in trends and correlation stemming from the climate data sets, we generated 10 3 synthetic domain-wide rasters of SWI for each year from 1984 to 2016. For each grid cell of every synthetic raster, we assigned a value for SWI that was randomly selected from the corresponding grid cell of one of the five temperature data sets. Consequently, each synthetic raster was built using a randomized assortment of grid cell values from the five temperature data sets. We then used this collection of synthetic SWI rasters to assess temporal trends in SWI as well as correlation between SWI and NDVImax (as described below).

Trends in summer air temperatures
We assessed changes in Arctic summer temperatures using the synthetic SWI raster data sets (described above) and non-parametric trend tests in a Monte Carlo uncertainty framework. Specifically, for each of the 10 3 synthetic SWI data sets, we evaluated SWI trends from 1985 to 2016 and 2000 to 2016 using non-parametric Mann-Kendall and Theil-Sen tests as implemented by the zyp.yuepilon function from the zyp package 59 in R 48 . We assessed SWI trends for each grid cell and Landsat sampling site, as well as after averaging SWI among grid cells in each bioclimatic zone and across the Arctic domain. We report the median change across all simulations as our best estimate of each trend and a 95% confidence interval computed from the 2.5 th and 97.5 th percentiles of these simulations. Changes mean SWI for the Arctic and each bioclimatic zone are given in Supplementary Table 6.

Temporal correspondence between Landsat NDVImax and summer temperatures
We assessed the temporal correspondence between annual Landsat NDVImax and summer air temperatures (SWI) from 1985 to 2016 and 2000 to 2016 at multiple spatial scales. We evaluated the direction and strength of correspondence between NDVImax and SWI using rank-based Spearman's correlations (rs) in a Monte Carlo uncertainty framework. Specifically, we computed NDVImax -SWI correlations for individual sampling sites and after averaging mean-centered NDVImax and SWI time series among sites within tundra bioclimatic zones and across the Arctic. Tundra greenness (NDVImax) could depend on summer temperatures over multiple years so we correlated NDVImax with current and 2-year average SWI. The strength of NDVImax -SWI correlations could also be influenced by underlying trends in both time series (e.g., warming and greening) thus we derived correlations using both original and linearly detrended time series. Moreover, uncertainty in estimates of NDVImax and SWI could influence their temporal covariation. We therefore derived 10 3 simulations of every correlation, with each simulation based on randomly permutated estimates of NDVImax and SWI. We present the median rs of all simulations as our best estimate for each NDVImax -SWI correlation and report a 95% confidence interval derived from the 2.5 th and 97.5 th percentile of all rs simulations. The NDVImax -SWI correlations for each zone are summarized in Supplementary Table 7 while spatial patterns of these correlations are summarized in Supplementary Figure 6.

Comparisons among Landsat NDVImax and plant productivity measurements
We assessed the utility of Landsat NDVImax as an indicator of tundra plant productivity using field measurements from across the Arctic. We compared Landsat NDVImax against measurements of graminoid aboveground net primary productivity (ANPP; dry matter m -2 yr -1 ) and shrub ring-width indices (RWI; unitless), as well as estimates of ecosystem gross primary productivity (GPP; g C m -2 yr -1 ) derived from flux towers. We describe the data sets and specific comparisons in greater detail below, but in each case, we assessed the direction and strength of association between satellite and field measurements using rank-based Spearman's correlations (rs) evaluated in a Monte Carlo framework that incorporated uncertainty in NDVImax and field measurements. Together, these field datasets span six countries (Canada, Finland, Greenland, Russia, Sweden, and USA) and several important plant functional types in the tundra biome (Supplementary Figure 7).

Landsat NDVImax vs. graminoid productivity
We assessed the temporal correspondence between annual Landsat NDVImax and graminoid ANPP from 1990 to 2017 on Bylot Island in northern Canada (Supplementary Figure 7a) 65 . Graminoid ANPP has been measured since 1990 in a moss-covered wetland fen that is dominated by grasses and sedges (e.g., Dupontia fisheri, Carex aquatilis, Eriophorum Scheuchzeri). This long-term monitoring is part of a project focused on Arctic food chains and provides, to our knowledge, the longest annual record of plant productivity in the tundra biome 65,66,67 . The long record and spatially-extensive sampling during peak summer make these field data particularly valuable for evaluating remote sensing indicators of plant productivity 68 .
Graminoid ANPP was quantified each year by clip harvesting live graminoid aboveground biomass (AGB) from quadrats (20 x 20 cm) that were randomly positioned across several subsites in the wetland. The harvests occurred when graminoid AGB reached a maximum in mid-August (Julian day = 226 ± 2 days; mean ± SD) and thus provide an estimate of ANPP given annual turn-over of graminoid AGB 66 . There were typically 12 quadrats harvested per year (but 11 quadrats in 1991, 2013, 2014, and 2016), with six quadrats harvested at each of two subsites (n = 332 quadrats total across years). One subsite was continually measured over the 28year period; however, the second subsite was measured once in 1990, moved to a nearby location for measurements in 1991 and 1992, and then moved again in 1993 after which the location remained the same.
We examined the temporal correspondence between annual median landscape Landsat NDVImax and graminoid ANPP from 1990 to 2017 using Spearman's correlations (rs) in a Monte Carlo uncertainty framework (n = 10 3 simulations). This involved extracting and qualityscreening all Landsat summer observations from a 100 m radius area around each subsite and then averaging spectral measurements from each Landsat scene. For each simulation, we first estimated annual median NDVImax across subsites after (1) computing NDVI with randomly permuting red and NIR reflectance, (2) cross-calibrating NDVI among sensors using a randomly selected Random Forest model and (3) fitting phenological curves with randomly permuted parameters. Second, we estimated annual median ANPP using data from a random subset (90%) of quadrats each year. Lastly for each simulation, we computed the correlation between annual median NDVImax and ANPP using data from a random subset (90%) of years from 1990 to 2017. The annual median ANPP time series was temporally autocorrelated over a two-year period (rlag1 = 0.64 and rlag2 = 0.66, P < 0.05) and thus we chose to compare annual median ANPP with NDVImax from not only the concurrent year, but also averaged over the current and two prior years, as well as just the two prior years. We present the median rs as our best estimate of each correlation and a 95% confidence interval derived from the 2.5 th and 97.5 th percentiles of the 10 3 Monte Carlo simulations.
Landsat NDVImax vs. shrub growth We assessed the temporal correspondence between annual Landsat NDVImax and 22 shrub ringwidth index (RWI) chronologies from sites located in six Arctic countries (Supplementary Figure  7b). These shrub RWI chronologies are a proxy for interannual variability in shrub productivity and in some cases may co-vary with broader plant community productivity 69 . We used new and archived annual ring-width measurements from independent projects 16,17,21,22,23,70,71 , including measurements previously collated as part of the ShrubHub shrub ring database 15 . The data set included annual ring-width measurements for alder (Alnus spp.), willow (Salix spp.), and birch (Betula spp.), which are genera of tall deciduous shrubs that are widespread across much of the circumpolar Oro Arctic and Low Arctic 72 .
Sample collection and measurement protocols differed among projects, but each project ultimately generated annual shrub ring-width measurements. Sampling occurred between c. 2005 and 2017, and typically involved harvesting the largest shrubs found in a study area, though one project harvested shrubs along transects 71 . Several projects recorded the location of every stem or transect sampled in a study area 16,17 , whereas most recorded one or several general sampling locations. Shrubs were harvested near the root collar and then one or more discs was cut from the bottom of each stem. Each disc was sanded, some discs were stained, and then the width of each growth ring was measured along one or multiple radii using either a stereo-microscope and sliding stage or a digital camera and imaging software. Each ring-width series was then crossdated to assure that growth rings were ascribed the proper calendar year. Additional details regarding sample collection and measurement protocols can be found in the references cited above. Overall, we drew on 54,374 cross-dated measurements of annual ring-widths from 1,348 shrubs (17 to 297 shrubs per study area).
We constructed median shrub ring-width index (RWI) chronologies for each shrub genera at every sampling site (n = 22 chronologies). First, we minimized the effects of juvenile growth by excluding the initial five years of measurements 16 and then averaged ring-width measurements made along multiple radii of an individual shrub. Second, we removed potential age-related biological growth trends and standardized the magnitude of growth among individual shrubs. This was accomplished by fitting a flexible cubic spline to each time series and then dividing observed ring-width in year t by the ring-width predicted for year t by the spline 73 . We fit each spline using the ffcsaps function from the dendrochronological program library in R (dplR) 74 . To account for uncertainty in this process, we randomly varied spline flexibility and moving-window length as each spline was fit. Consequently, each spline removed 45-55% of the variance over a 15-25 year moving window, thus preserving high-frequency interannual fluctuations in growth while removing low-frequency (i.e., multi-decadal) growth trends. Lastly, for each simulation we generated a genus-specific shrub RWI chronology at each sampling site by computing annual median RWI using data from a random subsample (90%) of shrubs.
We also constructed annual Landsat NDVImax time series to compare against the shrub RWI chronologies. This first involved extracting and quality-screening all Landsat summer observations from a 100 m radius area around each geotagged sampling location in a study area. For each simulation, we estimated annual NDVImax for every sampling location by computing NDVI with randomly permuting red and NIR reflectance, (2) cross-calibrating NDVI among sensors using a randomly selected random forest model and (3) fitting phenological curves with randomly permuted parameters. We then detrended the annual NDVImax (NDVImax-dt) timeseries for each sampling location using flexible splines (as above) and computed annual median NDVImax-dt across sampling locations in each study area. Lastly, for each simulation we computed the Spearman correlations (rs) between annual median NDVImax-dt and each shrub RWI chronology. We present the median rs for each site as our best estimate of the relationship and a 95% confidence interval derived from the 2.5 th and 97.5 th percentiles of the 10 3 Monte Carlo simulations.
Landsat NDVImax vs. ecosystem productivity We assessed the spatial correspondence between median annual Landsat NDVImax and ecosystem GPP across 11 flux towers located in Arctic tundra of Greenland, Russia, and the USA (Supplementary Figure 7, Supplementary Table 12). Four of the flux towers were part of the Arctic Observing Network (AON) 24,25 and seven of the flux towers were part of the FLUXNET Network (FLUXNET2015 CC-BY-4.0 February 2020) 75 . The net land-atmosphere CO2 exchange (i.e., net ecosystem exchange [NEE]) was measured at each site using the eddy covariance technique, which involves coupling measurements of atmospheric CO2 concentrations and meteorological conditions from instruments mounted on towers. Both AON and FLUXNET then estimated GPP and ecosystem respiration (Reco) by partitioning NEE (NEE = GPP -Reco) using modeled relationships between Reco and nighttime temperatures 76 . We acquired annual gap-filled estimates of GPP from FLUXNET and half-hourly gap-filled estimates of GPP from AON that we aggregated to an annual time step (g C m -2 yr -1 ).
We assessed the spatial correspondence between median annual NDVImax and GPP across 11 flux tower sites using Spearman's correlations (rs) in a Monte Carlo uncertainty framework (n = 10 3 simulations). We chose to examine the relationship across rather than within sites because the annual GPP time series at each site was relatively short (mean = 7.7 years, SD = 3.7 years). Each simulation involved randomly permuting NDVImax and GPP data sets before computing the correlation between these metrics.
To propagate uncertainty in annual estimates of GPP into our analysis, we constructed distributions of annual GPP for each site x year and then randomly drew from these distributions during each simulation. The FLUXNET data included uncertainty estimates for annual GPP at each site that were provided as the 5 th , 16 th , 25 th , 50 th , 75 th , 84 th , and 95 th distribution percentiles of a bootstrap analysis that applied varying friction velocity (u*) thresholds to delineate well mixed from poorly mixed atmospheric conditions. We constructed a distribution of annual GPP for each site x year by linearly interpolating between the distribution percentiles provided with the data set. The AON data set relied on a fixed u* threshold (0.1 m s -1 ) and did not include uncertainty estimates, thus we constructed a distribution of annual GPP for each site x year by relying on uncertainty derived from the FLUXNET data. Specifically, for each FLUXNET site x year we computed the ratios of GPP at the 50 th percentile to GPP at every other percentile ('uncertainty fraction) and then computed the median uncertainty fraction for each percentile across all site x years. We then generated synthetic distributions of annual GPP for each AON site x year by multiplying annual GPP by the vector of median uncertainty fractions. Overall, this process yielded distributions of annual GPP for every site x year in both the FLUXNET and AON data sets.
To propagate uncertainty in annual estimates of NDVImax into our analysis, we generated 10 3 time series of annual NDVImax from 1985 to 2017 for each flux tower site. We first extracted and quality-screened all Landsat summer observations from a 100 m radius area around each flux tower site and then averaged spectral measurements from each Landsat scene. Next, we estimated annual NDVImax for each site by (1) computing NDVI with randomly permuting red and NIR reflectance, (2) cross-calibrating NDVI among sensors using a randomly selected random forest model and (3) fitting phenological curves with randomly permuted parameters.
For each simulation, we computed median annual NDVImax and GPP by site using data from a random subset (90%) of years and annual GPP randomly drawn from its corresponding distribution. We then computed the correlation between median annual NDVImax and GPP. We present the median rs as our best estimate of the relationship and a 95% confidence interval derived from the 2.5 th and 97.5 th percentiles of the 10 3 Monte Carlo simulations.

Relationship between Landsat NDVImax and graminoid productivity
We found that inter-annual variability in Landsat NDVImax and graminoid ANPP were positively correlated at a long-term field monitoring site in northern Canada. Furthermore, we found that annual graminoid ANPP was autocorrelated over the preceding two years and that the NDVImax -ANPP relationship was strongest if NDVImax was averaged over the preceding two years. These findings suggest that annual graminoid ANPP partially depends on conditions during previous growing seasons. The lagged relationships could reflect the importance of non-structural carbohydrates and nutrients acquired in previous years, which are temporarily stored in belowground tissues (e.g., rhizomes) and later used for biosynthesis 77 . Arctic graminoids typically have a high ratio of belowground to aboveground biomass (e.g., the ratio for Carex aquatilis reportedly ranges from 3.4 to 22.6) 78 underscoring the importance of below-ground processes in these ecosystems. Although based on observations from one study area, the positive correlation between Landsat NDVImax and graminoid ANPP provides support for interpreting Landsat NDVImax as a proxy for aspects of tundra plant productivity associated with graminoids.

Relationship between Landsat NDVImax and ecosystem productivity
We found that median annual Landsat NDVImax was related to broad spatial patterns of median annual tundra ecosystem GPP. Our remote sensing analysis builds on prior studies that showed positive associations between hand-held measurements of NDVI and short-term, chamber-based estimates of tundra ecosystem GPP 79,80 . The relationship between NDVI and GPP likely arises because NDVI is a proxy for leaf area and chlorophyll content which together influence canopy light absorption and subsequent GPP 80,81 . Overall, these results provide support for interpreting Landsat NDVImax as a proxy for GPP in tundra ecosystems.

Relationship between Landsat NDVImax and shrub growth
Our assessment revealed weak to moderate positive correspondence between interannual variability in Landsat NDVImax and shrub radial growth. Prior efforts have similarly demonstrated modest positive correspondence between AVHRR NDVI and both shrub 16,17,82 and tree 62, 83, 84 radial growth in northern ecosystems. To our knowledge, Landsat NDVI has not previously been compared with interannual variability in shrub radial growth, but the generally positive but modest correspondence is not entirely surprising since both are metrics of carbon exchange. The NDVI is related to plant canopy leaf area and nitrogen content that affect light harvesting and subsequent carbon uptake (GPP) by the plant community 80,81 . On the other hand, shrub radial growth reflects carbon assimilation into plant aboveground woody tissues, which is but one aspect of whole plant NPP that also includes leaf and belowground productivity. Consequently, NDVI and radial growth are imperfect proxies for different metrics of carbon exchange that we might expect to positively covary under the assumption that both reflect interannual variations between better and worse years of vegetation growth 84 .
The degree of covariation between these proxies will likely be affected not only by interannual variability in respiration and allocation, but also by landscape heterogeneity 84 . Shrubs can have a strong effect on NDVI in tundra ecosystems 16,49,85 , but are one function type of varying dominance in overall plant communities 49,86 , which are themselves embedded in a mosaic of different land cover types. Consequently, we hypothesize that the correspondence between NDVI and shrub RWI is probably related to plant community and land cover heterogeneity, and the degree to which shrub RWI reflects fluctuations in above-ground plant productivity across the landscape 87 . This topic deserves future attention. Overall, the modest positive correspondence between Landsat NDVImax and shrub radial growth provides support for interpreting Landsat NDVImax as a proxy for aspects of tundra plant productivity associated with shrubs.