A proxy-year analysis shows reduced soil temperatures with climate warming in boreal forest

Scientists unequivocally agree that winter air temperature (TA) in northern high latitudes will increase sharply with anthropogenic climate change, and that such increases are already pervasive. However, contrasting hypotheses and results exist regarding the magnitude and even direction of changes in winter soil temperature (TS). Here we use field and satellite data to examine the ‘cold soil in a warm world’ hypothesis for the first time in the boreal forest using a proxy year approach. In a proxy warm year with a mean annual temperature similar to that predicted for ~2080, average winter TS was reduced relative to the baseline year by 0.43 to 1.22 °C in open to forested sites. Similarly, average minimum and maximum winter TS declined, and the number of freeze-thaw events increased in the proxy warm year, corresponding to a reduction in the number of snow-covered days relative to the baseline year. Our findings indicate that early soil freezing as a result of delayed snowfall and reduced snow insulation from cold winter air are the main drivers of reduced winter active-layer TS (at ~2-cm depth) under warming conditions in boreal forest, and we also show that these drivers interact strongly with forest stand structure.

. Several snow manipulation studies 2,15,16 have suggested that in a warmer world soils during winter months may be colder as a result of decreased and delayed snow insulation. In contrast, most simulation models 4,5,17 have predicted a rise in T S in warm climates, as a synergistic effect of increased T A and reduced snowfall, though a few models 18,19 do suggest that climate warming could reduce T S under some conditions. T S measurements in high latitude ecosystems have commonly been made at a depth of ≥10 cm, but the soil is most responsive and biologically active above 10 cm soil depth 10,20 . Contradictory model predictions indicate that T S sensitivity to climate change is not well understood 18 , and a lack of data to test alternative models has been noted 10 .
Here we provide a first test of the 'cold soil in a warm world' 15 hypothesis in the boreal forest using a proxy year approach, making use of recent climate variability to compare T S patterns between a proxy baseline year (Y B ) and a warm future year (Y W ) (Fig. 1). We used soil and micrometeorological tower sensor data from study plots distributed in open, partially forested, and forested sites in a mixedwood boreal forest in northwestern Ontario, Canada. Snow cover durations were inferred from diurnal T S patterns and confirmed using synthetic high-resolution imagery (fusing MODIS and Landsat 8 data). Scientific

Results
Field measurements (Fig. 2) and secondary data ( Fig. 1) indicate that Y W successfully represented a warm future year in the northern high latitude ecosystems. Consistent with climate model predictions, differences in winter T A between Y W and Y B were particularly large: average winter T A in open, partially forested, and forested sites were 6.58 °C , 9.17 °C , and 9.46 °C higher (p < 0.05), respectively, in Y W than those in Y B (Fig. 2).
Sensor data indicate that average winter T S were significantly lower in Y W compared to Y B (Fig. 3a). In open, partially forested, forested sites, respectively, average T S in Y W was 0.43 °C , 1.22 °C , and 1.13 °C lower (p < 0.01) in Y W than those in Y B . Average minimum and maximum winter T S also showed similar patterns under all site conditions (Supplementary Results and Fig. S5a). The differences in average spring T S between Y B and Y W were not as consistent (Fig. 3b). In Y W they were 1.54 °C lower (p < 0.01) in partially forested sites, but marginally higher (0.12 °C , p = 0.38) in open sites, and lower (0.34 °C , p = 0.2) in forested sites than those in Y B . In Y W average spring minimum T S were consistently lower and maximum T S showed similar patterns as mean T S compared to Y B (Supplementary Results and Fig. S5b). Overall seasonal patterns in mean and average minimum/maximum T S in different site conditions throughout Y B and Y W are presented in Supplementary Figs S4 and S6, respectively.
Snow cover started earlier and lasted longer in Y B than in Y W . T S -based snow cover estimates show that in Y B snow started on average 18, 22, and 23 days earlier (p < 0.01) in open, partially forested, and forested sites, respectively, than in Y W . Likewise, in Y B snow ended generally on average 3, 16, and 9 days later in open, partially forested, and forested sites, respectively, than in Y W . T S and satellite-based estimates of snow start/end dates agreed well with each other with some year-and site-specific variations (Supplementary Table S3). Satellite-based snow start dates for Y B were 2-14 days later than T S -based estimates, which were 3, 10, and 9 days earlier in open, partially forested, and forested sites, respectively, for Y W . Snow end dates had the least disagreement of only ±1 day for both years across all site conditions (Fig. 2). T S -based snow cover duration (SCD ST ) estimates suggest that SCD ST in Y B were 48, 49, and 51 days longer (p < 0.01) in open, partially forested, and forested sites, respectively, than those in Y W (Fig. 3c). The number of FTEs was substantially higher in Y W compared to Y B (Fig. 3d), increasing by 53% (p = 0.86), 657% (p < 0.01), and 69% (p = 0.07) in open, partially forested, and forested sites, respectively.

Discussion
Y W average winter T S at 1-2 cm depths were, depending on site conditions, 0.43-1.22 °C lower than those in Y B (Fig. 3a). This result clearly supports the 'cold soil in a warm world' hypothesis in the boreal forest context. Snow manipulation studies 2,15,16 and model results 18,19 in other ecosystems have also found evidence in support of this hypothesis. Although studies have predicted a rise in spring T S 5,17 , our data suggest that Y W average minimum T S was 0.45-1 °C lower compared to Y B with a considerable reduction in magnitude with increasing Leaf Area Index (LAI) (open to forested sites). In contrast, spring mean and maximum T S did not exhibit any consistent pattern ( Fig. 3b and Supplementary Fig. S5b). The opposite pattern in spring minimum and maximum T S and the number of days with daily average T S ≤ −5 °C (Fig. 3d), corresponds to a higher frequency of FTEs in Y W than in Y B . This effect was more pronounced in forested sites than in open sites. Climate-warming-induced spring FTEs have been suggested by other studies 5,10 . Cold winter soil and frequent FTEs in warm future years are likely to substantially impact terrestrial plants and microorganisms 8 . Winter soil freezing can adversely affect tree growth and functioning 21 and alter soil carbon dynamics 11,12 . FTEs have also been reported to increase nitrogen mineralization in high-latitude ecosystems 11,12 . Snow cover and its interaction with forest stand structure were the major drivers of T S differences between Y W and Y B in the present study. Early soil freezing events were associated with delayed snowfall in Y W (Fig. 2). Likewise, reduced SCD ST (by 48-51 days) (Fig. 3c), higher relative humidity (RH) (5.55-9.15%; indicative of high latent heat from melting snow) (Fig. 2), and data from nearby weather stations 22 (maximum snow thickness and total precipitation in winter and spring were ~40 cm and 190.5 mm, respectively, in Y W and were ~100 cm and 165.3 mm, respectively, in Y B ) imply that reduced insulation from thinner or less spatially continuous snow cover decreased T S 2,6,15 in Y W compared to Y B . It is also evident from Fig. 2 that T S in Y W was tracking T A more closely than in Y B 2 . Higher forest cover was associated with an increased magnitude of differences in T S and number of FTEs between Y W and Y B (Figs 2 and 3a,b,d). It is likely that in Y W with a shallower snowpack tree stems and other vegetation cover reduce T S by creating small pockets in the snowpack that allow penetration of cold air into the subnivean space, increasing FTE frequency by a 'tree well effect' 23,24 . 'Proxy year' or 'analog year' approaches have been widely used to examine potential effects of climate change on hydrology and agriculture, but only recently applied to ecological processes 25 . This approach allowed us to test the 'cold soil in a warm world' hypothesis for the first time in the boreal forest by realistically simulating composite effects of future climate warming 8 . Although we found reduced T S at shallower depth under warming conditions, the findings are still consistent with projected long-term warming in the deep soil. T S at shallower depths are prone to rapid changes modulated by soil moisture and insulating effects of snow, litter, and vegetation 26 , while deep soils respond to the integrated transfer of thermal energy. Because of soil's high thermal capacity and low heat conductivity, diurnal/seasonal T S changes attenuate with increasing depth and lag considerably behind those of shallower soils. Studies have found the usual soil frost depth is ~15 cm in high-latitude ecosystems 27 . We thus can assume the 'cold soil in a warm world' effect is limited to a similar depth. Since carbon in boreal forest soils is primarily stored in the uppermost soil horizons and organic layer 28 , wintertime reductions in surface T S might create a negative climate feedback by reducing soil heterotrophic respiration 6,12 . By assuming simple linear relationships between T A and T S , most existing models will miss this feedback and likely over-estimate warming effects on soil C loss. Conversely, increases in FTE are predicted to negatively affect boreal forest regeneration and productivity, which could constitute a positive climate feedback. The insights from our study are thus an important input to development of credible predictions of climate-induced T S change at shallow soil depths that are most important to carbon processes in high-latitude ecosystems 14 . The study area is generally flat with an average elevation of 416 m a.s.l. The soil in this area is a moderately deep Brunisol (coarse loamy texture) with organic layer thickness (LFH) 1-25 cm 29 and average pH ~5.3. The growing season for this area varies from 110-120 days 29 . Climate normals for annual temperature and precipitation (measured at Armstrong), and snow depth (measured at Thunder Bay) are -1.1 °C , 738.4 mm, and 9 cm, respectively. Mean annual daytime and nighttime windspeeds, measured at Armstrong at 10 m aboveground over the study period, were 0.7 ms −1 (maximum 1.2 ms −1 ) and 0.4 ms −1 (maximum 1 ms −1 ), respectively 22 .
Instrumentation and measurements. Plots were established in locations with at least 1 ha of identical disturbance (either harvest or fire) of similar age-class and were at least 1 km away from each other and from any water body. We used fire maps (obtained from the Ontario Ministry of Natural Resources) and forest management plans (obtained from Resolute Forest Products) to collect information about the forest management history, disturbance type, and stand age in aiding the selection of plot locations.
At the center of each plot a micrometeorological tower was set up to measure air temperature (T A ) and relative humidity (RH) every hour at 1.5 m height from the ground (data collected using a LogTag HAXO-8; range (T A /RH): -40 to +85 °C/0 to 100%; minimum accuracy (T A /RH): ±0.5 °C/0.1%). Additionally, we installed nine soil temperature (T S ) sensors (LogTag TRIX-8) in each plot at ~1-2 cm soil depth (following the guidelines of Lundquist and Lott 24 ), which recorded measurements at hourly intervals ( Supplementary Fig. S2). The sensors used are rated by the manufacturer from -40 to +85 °C with a minimum accuracy: ±0.5 °C; in lab calibration trials we found the RMSE to be ±0.11 °C in the range -10 to 35 °C (see Supplementary Texts for accuracy reports on this sensor). Each T S sensor was sealed in thin (0.09 mm) waterproof plastic film and was placed at least 50 cm away from tree trunks. Sensor locations were recorded as bearings from the center of the plot and marked with flagging stakes. Microclimatic data were collected annually in summer, and any compromised sensors were replaced.
Leaf Area Index (LAI) was determined using hemispherical photographs (HPs) taken with a Nikon CoolPix 4500 (4 Megapixels) camera with a Nikon FC-E8 fisheye converter (angle of view 183°) mounted on a tripod. Except in 2013, summer and winter HPs were taken each year in early July and late September/October, respectively, in three equally spaced locations within each plot at 1 m above ground as shown in Fig. S2. Exposure settings and analysis of HPs, using Gap Light Analyser 30 , were done as per the guidelines of Zhang et al. 31 . The average of the three LAI-4 (LAI estimated over zenith angle 0-60°) values was used as the LAI for a plot in a given season/year.
Stand density was measured every year as the number of trees (diameter at breast height ≥5 cm and height >1 m) within each plot and converted to stems/ha. Heights (m) of these trees were measured every year using a Suunto PM-5 Clinometer. Similarly, litter depths (mm) were measured using a ruler in locations adjacent to each soil temperature sensor within each plot. We set up four 1-m 2 subplots within each plot and visually determined the percent cover of ground-layer vegetation every year ( Supplementary Fig. S2).
Proxy year establishment. Proxy baseline (Y B ) and future warm (Y W ) years were determined by comparing local (study area) T A anomalies with the northern hemisphere (NH) anomalies. The HadCRUT4 NH monthly T A anomaly data 32 over 1840-2017 were used for this purpose. A simple polynomial regression curve was fitted to the annual NH T A anomaly data to show a general trend over the 21 st century (Fig. 1). The GHCN-D (v2) (Global Historical Climatology Network -Daily data) daily average T A data 33 for weather stations around the study area (48°-50° N and 88°-90 o W) were analysed via the KNMI Climate Explorer (https://climexp.knmi.nl) platform to calculate local monthly T A anomalies with respect to the 1961-1990 baseline year (since HadCRUT4 anomaly data are also based on 1961-1990).
Results from a one-way ANOVA with robust estimation indicated that December 2013-November 2014 had the lowest annual T A anomaly among the years over the study period (2013-2017) and did not differ significantly (p = 0.11) from NH average annual anomaly. December 2015-November 2016, however, had the highest T A anomaly for all seasons over the study period and the annual T A anomaly was significantly higher (p < 0.01) than the NH average annual anomaly (Fig. 1). These years are representative of the historical baseline years and projected warm future years. So, for this study, we chose To determine snow cover duration (SCD ST ) from T S for each sensor in each plot, hourly sensor data were converted to daily T S ranges (ΔT S = daily maximum T S -daily minimum T S ). If ΔT S remained ≤1 °C over 48 hours and the daily maximum T S was <2 °C , we considered 'snow present' for that day. The resulting daily snow present/absent time series were checked against T S sensor data and snowfall event data from nearby Armstrong airport weather station to ensure prediction quality. A number of studies 24,34,35 have successfully used similar algorithms to determine SCD ST .

T S , T A , RH, SCDST, and freeze-thaw events (FTEs) data analysis.
Hourly sensor data were first converted to daily mean, minimum, and maximum values that were then used to calculated plot-wise monthly mean, minimum, and maximum T S /T A /RH for each sensor. Plot-wise seasonal mean T S , average minimum and maximum T S , and mean T A /RH for each sensor were calculated from monthly data. Seasons in this study were defined as: winter (December-February), spring (March-May), summer (June-August), and fall (September-November). The frequency of freeze-thaw events (FTEs) for each year in all site conditions were calculated as the number of days with daily average T S ≤ −5 °C (there was no more than 1 FTE per day). Instead of using T S < 0 °C , we choose −5 °C because studies have found that at T S ≤ −5 °C soil microbial activities are inhibited substantially in high-latitude ecosystems 11 .
Site-specific differences in T S , T A , RH, SCD ST , and FTE between Y B and Y W were tested using linear mixed effect (LME) models. For comparison of T S we focused both on mean and minimum/maximum values, because in projected future warm years maxima/minima of the extreme climate events can have more serious consequences for plants and microorganisms than changes in projected mean values 1 . In LME models, sensor replications nested within each plot were considered random effects, and proxy year and site conditions (and their interactions) were considered as main effects. Dependent variables (T S , T A , RH, SCD ST , FTE) were log-transformed where necessary to meet the residual normality assumption of LME models. All analyses were preformed using the R language platform 36 .

Snow cover duration from satellite data (SCD S ).
Remote sensing assessments of snow cover duration in spatially heterogeneous sites require high-resolution spatiotemporal satellite data. None of the freely available satellite images meet this resolution requirement; for example, the MODIS (Moderate Resolution Imaging Spectroradiometer) satellite provides daily global data at a low spatial resolution (maximum 250 m) and the Landsat satellites provide high spatial resolution (30 m) global data at a 16-day interval (pixels are often cloud contaminated). Thus, integrating high-temporal-resolution MODIS data with high-spatial-resolution Landsat data to produce synthetic data with high spatiotemporal resolution is necessary to study highly dynamic land surface processes that operate at a small scale.
MODIS-Landsat fusion has been achieved by a number of models and algorithms, including the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) 37 , the Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM) 38 and spatiotemporal image-fusion models 39 . We have chosen the STARFM, originally proposed by Gao et al. 37 and tested in a Canadian boreal forest, to generate daily snow cover maps to supplement T S -based findings. In this algorithm, a first order approximation of the relationship between coarse MODIS (M) data and Landsat (L) reflectance data for a pixel located at (x i , y i ) and acquired on date t k was assumed as: Where ∈ k represents error in observed MODIS and Landsat reflectance resulting from differing bandwidth and solar geometry 37 . STARFM is one of the most extensively tested fusion techniques that requires only one MODIS-Landsat pair input (but performs better with two pair input) and requires less computational power than alternative approaches.
Data preparation for fusing. We used the Normalized Difference Snow Index (NDSI) approach to determine SCD S . It is a widely used satellite-image-based spectral index usually calculated from reflectance in green and shortwave infrared bands 40 . To properly set the NDSI threshold in forested areas, Normalized Difference Vegetation Index (NDVI), calculated from the reflectance in red and near infrared (NIR) bands, was used as an auxiliary input in the snow-mapping algorithm 41 . So, for this study, we used green, red, near infrared (NIR), and shortwave bands to map snow cover duration. Radiometrically, atmospherically, and geometrically corrected MODIS (horizontal tile: 12, vertical tile: 4) (MOD09GA V006) 42  The Landsat 8 shares similar sensor geometry with MODIS and both visit the same place at almost the same time. It can thus be assumed that they have an almost identical viewing and illumination geometry, and can be used in the fusion process without further angular adjustments 44 . MODIS images, however, were re-projected to UTM (Universal Transverse Mercator, Zone 16 N) and pixels were resampled (using the nearest neighborhood method) to 30-m resolution to match with Landsat 8 images. MODIS and Landsat 8 images were also precisely co-registered using the common point comparison method and brought into the same spatial extent.
Only cloud and water-body free high-quality pixels were used as input to STARFM. The MOD09GA surface reflectance 500 m quality assurance band was used to mask pixels with a status bit flag other than '0000' for each Scientific  of the four bands. Similarly, for Landsat 8 images, the level-2 pixel-quality band and radiometric saturation QA bands were used to mask radiometrically saturated, cloud-contaminated, and low-quality pixels. Finally, the pixels with missing values were set to -9999 and images were converted to signed 16-bit binary format. A series of R scripts 36 was used to prepare satellite images for input in STARFM. C codes to implement STARFM were adapted from Gao et al. 37 . Algorithm details and information on data preparation can also be found in Gao et al. 37 and Zhu et al. 38 .
Inputs to STARFM for producing synthetic Landsat 8 images. Two pairs of same-day MODIS-Landsat 8 images within two months either side of the prediction date 45 , along with MODIS image of the prediction date, were used as inputs to STARFM to predict synthetic Landsat 8 images for the dates for which Landsat 8 images were either not available or cloud contaminated (>20%) (Supplementary Fig. S3). Landsat 8 and MODIS equivalent bands were used to produce synthetic Landsat 8 images of the equivalent band. For example, Landsat 8 green (band 3) and MODIS green (band 4) bands were used to produce the synthetic Landsat 8 green-band image.
Accuracy assessment of predicted Landsat 8 images. To assess the accuracy of STARFM predicted images, synthetic Landsat 8 images were produced in green, red, NIR, and SWIR2 bands for three dates (2013-12-07, 2014-02-16, and 2016-03-18) spanning the study period. Predicted synthetic images were compared pixel-to-pixel with actual Landsat 8 images of the corresponding dates, and Spearman Rho (using the complete observation method), root-mean-square-error (RMSE) and mean absolute error (MAE) estimates were calculated to assess the accuracy. STARFM prediction maintained a reasonable accuracy over the study period compared to other studies 37,45 (see Supplementary Table S2).

NDSI-based snow-mapping algorithm.
Using the daily high-resolution synthetic data, NDSI was calculated as: reflectance in green band reflectance in SWIR band reflectance in green band reflectance in SWIR band 2 2 Similarly, NDVI was calculated as:

in NIR band reflectance in red band reflectance in NIR band reflectance in red band
Using reflectance property of clouds in SWIR2 band, NDSI can successfully separate clouds from snow. For mapping snow cover with NDSI, a physically based threshold value > 0.4 is usually used to indicate snow cover 40 . There is, however, evidence 46 that in conifer-dominated forests NDSI < 0.4 can also be snow. Moreover, Hall et al. 41 found that NDSI values < 0.4 can also indicate snow if NDVI value is ~0.1. It is, therefore, important to identify area-specific NDSI threshold values to delineate snow cover.
After extensive visual inspections, using Google Maps, Sentinel-2 images (red, green, and blue bands), and predicted NDVI maps, for this study 1 ≥ NDSI > 0.35 was used for October 2013-May 2014 and 1 ≥ NDSI > 0.3 was used for October 2015-May 2016 to define snow cover. This year-specific NDSI threshold mainly stemmed from snow patchiness as a result of high difference in winter T A and T S between study years 47,48 . To prevent NDSI overestimation, green band reflectance values ≤ 0.1 were masked before NDSI calculation 49 . To prevent snow-cover underestimation, at 0.1 < NDSI < 0.3 it was also considered as snow, only if 0.08 ≤ NDVI ≤ 0.12 41 .
To determine snow-start and snow-end dates, and to compare the plot-wise results with SCD ST , decisions on presence/absence of snow were made based on the information (NDSI and NDVI) extracted from each plot centers with a 15-m buffer around it. The NDSI-based snow cover mapping algorithm works only for pixels with at least 50% snow cover 41 . To ensure consistent comparisons between SCD ST and SCD S , for SCD ST we consider snow present only if ≥50% of the T S sensors in a plot agreed that there was snow on the particular day.

Accuracy assessment of NDSI-based snow cover maps. A confusion matrix was generated to
show the overall agreement in snow cover duration estimated from T S (SCD ST ) and satellite (SCD S ) data (Supplementary Table S3). Considering SCD ST as the reference and SCD S as the predicted, results from the confusion matrix show that 72.8% of the time SCD ST agreed with SCD S . The all-weather overall accuracy of our fused daily snow-cover map is higher than the MODIS daily snow products (MYD10A1 and MOD10A1) (31-45%) 50 . It is interesting to note that the overall accuracy of NDSI-based snow cover maps varied between years with a higher overall accuracy in Y B (80%) than in Y W (67%). This suggests that, in a warmer world, spatial variability in T S and snowpack will likely be higher than what it is now 17 , and we may need high-spatiotemporal-resolution satellite images complemented by field measurements 35 to accurately capture this variability.

Data Availability
The northern hemisphere HadCRUT4 data used in this study are available from https://crudata.uea.ac.uk/cru/ data/temperature/. Landsat 8 (Level 2) and MOD09GA (V006) data can be downloaded from https://earthexplorer.usgs.gov and https://search.earthdata.nasa.gov, respectively. Other data and materials can be requested to the corresponding author.