Non-growing season carbon emissions in a northern peatland are projected to increase under global warming

Peatlands are important ecosystems that store approximately one third of terrestrial organic carbon. Non-growing season carbon fluxes significantly contribute to annual carbon budgets in peatlands, yet their response to climate change is poorly understood. Here, we investigate the governing environmental variables of non-growing season carbon emissions in a northern peatland. We develop a support-vector regression model using a continuous 13-year dataset of eddy covariance flux measurements from the Mer Blue Bog, Canada. We determine that only seven variables were needed to reproduce carbon fluxes, which were most sensitive to net radiation above the canopy, soil temperature, wind speed and soil moisture. We find that changes in soil temperature and photosynthesis drove changes in net carbon flux. Assessing net ecosystem carbon exchange under three representative concentration pathways, we project a 103% increase in peatland carbon loss by 2100 under a high emissions scenario. We suggest that peatland carbon losses constitute a strong positive climate feedback loop. Future changes in non-growing season conditions, particularly irradiance and temperature, will enhance carbon emissions from a northern peatland, according to projections with a data-driven machine learning model.

W hile peatlands are estimated to cover only 3% of the continents 1 , they store approximately 30% of landbased organic carbon (C) (300-450 Pg C) [2][3][4][5] . Northern peatlands are of particular interest because high latitude regions are warming at greater rates than the global average [6][7][8][9][10] . Furthermore, the greatest warming in cold regions, including northern peatlands, is occurring during the nongrowing season (NGS) 11,12 . Many previous studies have focused on peatland C dynamics during the growing season, given the inherent difficulties of measuring NGS carbon dioxide (CO 2 ) fluxes in remote cold regions. However, a growing body of literature shows that net ecosystem CO 2 exchange (NEE) from various landscapes during the NGS is nontrivial and may contribute significantly to annual ecosystem C budgets [12][13][14][15][16][17][18][19][20] . Despite these efforts, there is a lack of predictive understanding of how NGS CO 2 emissions from northern peatlands will change under an evolving and uncertain 21st-century climate.
Only a few studies have projected future NGS-NEE of peatland ecosystems under representative climate concentration pathways (RCPs) for northern regions 12,[21][22][23] . In part, this reflects an incomplete understanding of the key drivers of NGS soil respiration. Previous work, however, shows that soil CO 2 effluxes during the NGS are modulated by the interactions between numerous environmental parameters, including soil temperature and moisture 12,[24][25][26] , snow depth 25,[27][28][29][30] , vegetation cover 31 , availability of labile C substrates, and the soil microbial abundance and community structure 32-34 . Deterministic models are traditionally used for understanding environmental processes. The processes included, and their model representations, are derived from prior knowledge and theoretical considerations. The models in turn form the basis for hypothesis-driven experimentation with, as a possible unintended consequence, biases the observational data due to the underlying hypotheses 35 . As an example of this traditional deterministic approach, terrestrial ecosystem C cycling has been successfully simulated using process-based dynamic vegetation models. As an alternative approach, data-driven (rather than hypothesis-driven) modeling techniques, including machine-and deep-learning methods, are increasingly employed to derive quantitative relationships between environmental variables and ecosystem functions (e.g., soil respiration) based on patterns in the measured data.
Machine learning (ML) is the study of statistical models and algorithms 36 and has been extensively applied to the modeling of carbon fluxes measured at eddy covariance (EC) flux towers from a variety of ecosystem types [37][38][39][40] . The applicability of ML to modeling carbon fluxes was best demonstrated in the recent work by Tramontana et al. 41 who used a large ensemble of ML-based models in predicting CO 2 , energy, and radiative fluxes across various plant-functional types. Similar studies have achieved great success in modeling CO 2 fluxes using ML techniques, as shown in the studies conducted by Cai et al. 36 , Melesse and Hanley 39 , and Xiao et al. 42 who used gradient boosting and random forest, multilayer artificial neural networks, and regression trees respectively to predict carbon fluxes measured at EC flux towers for various applications.
In this study, we train an ML model for NGS-NEE on a 13-year (1998-2010) continuous record of EC flux measurements at the Mer Bleue Bog located in Ottawa, Canada. The most important variables modulating the NGS-NEE at the research site are determined using a variable selection methodology and a moment-independent global sensitivity analysis (GSA) method. In addition, we project the NGS-NEE CO 2 emission rates over the remainder of the 21st century by considering how the key environmental variables will change under a low, moderate, and high radiative forcing scenario (RCP2.6, RCP4.5, RCP8.5, respectively). The findings of this study provide novel insights into NGS CO 2 emissions from northern peatlands as they transition into a warmer world, with implications for future climate policy, evolving northern landscapes, and the associated hydrometeorological, snow, and frozen-ground processes.

Results
Model performance and validation. Model performance, both validation and testing, of predicted NGS-NEE is illustrated in Fig. 1. Model validation and testing performance metrics varied slightly with each new training cycle. Averaged over n = 1000 model training cycles, the cross-validation results showed a mean RMSE = 0.09 and mean r 2 = 0.60, implying that the model explained most of the observed variability of NGS-NEE. Model testing results further implied good model performance across the testing data with RMSE = 0.08 and r 2 = 0.65.
Model dependencies. The effects of each of the seven predictor variables on the modeled NGS-NEE are shown in the partial dependence and individual conditional expectation plots of Fig. 2. In most simulations, the modeled NGS-NEE showed net positive responses to increases in wind speed, soil moisture content, as well as air and soil temperatures. Wind direction exhibited little to no net influence on modeled NGS-NEE and upwelling photosynthetic photon flux density (PPFD) a slightly negative effect. By contrast, net radiation above the canopy exhibited a strong negative effect on the predicted NGS-NEE. This negative control can be attributed to increased photosynthetic CO 2 uptake with increased net radiation.
Sensitivity analysis. The calculated SIs provide further insight into the relative influence of the seven predictor variables on NGS-NEE estimates (Fig. 3). The results implied that net radiation above the canopy has the greatest influence on modeled NGS-NEE with a mean SI of 0.98 (0.98-0.99, 95% confidence interval). Wind direction was the least sensitive variable, with a mean SI of 0.08, that is, only slightly larger than that for the dummy (control) variable, which had a mean SI of 0.03. Soil temperature exerted a pronounced influence on modeled NGS-NEE with a mean SI of 0.72, consistent with soil temperature as a major control on subsurface organic C mineralization and, hence, production of CO 2 . Wind speed and soil moisture content were of moderate to high importance (mean SIs = 0.62 and 0.61, respectively) and a lower influence of upwelling PPFD. Table 1 summarizes the mean values and ranges of all the SIs.
Climate change predictions. Under all three emission scenarios, NGS-NEE CO 2 fluxes were predicted to increase over the remainder of the 21st century (Fig. 4). That is, the Mer Bleue Bog will act as a stronger source of CO 2 during the NGS, even under the lowest radiative forcing scenario (i.e., RCP2.6). As also shown in Fig. 4, the predicted future NGS-NEE trends for the three climate scenarios diverged most significantly after 2050.
Under the high radiative forcing scenario considered (RCP8.5), the mean NGS-NEE CO 2 emission rates at the end of the century would rise to 0.62 μmol m −2 s −1 (0.60-0.63; minimum and maximum values from n = 1000 runs of the model). Compared to the observed mean NGS-NEE of 0.31 μmol m −2 s −1 for the period 1998-2010, NGS-NEE under RCP8.5, therefore, exhibited roughly a 2-fold increase by 2100 (mean increase of 103%). If the stringent emission reduction measures and policy shifts are implemented to limit radiative forcing to 2.6 W m −2 by the year 2100 (RCP2.6), NGS-NEE values at the Mer Bleue Bog site would only increase by 17% (approximately 6.2 times less than under RCP8.5). Under the stabilization climate scenario of RCP4.5, NGS would increase by 48% to 0.45 μmol m −2 s −1 . Table 2 summarizes both mid-century (2050) and end of the century (2100) predicted NGS-NEE CO 2 fluxes under the three RCPs.

Discussion
Our results for the Mer Bleue Bog site support the hypothesis that environmental changes accompanying climate warming have the potential to increase CO 2 emissions from peatlands during the NGS, hence potentially strengthening a positive climate feedback loop. As expected, the predicted NGS-NEE increase is highest for the climate scenario with the largest radiative forcing and vice versa. The predictor variables for NGS-NEE further implies that both subsurface and surface processes play important roles in modulating the net CO 2 fluxes during the NGS. The subsurface controls are likely linked to the decomposition of soil organic matter, which is known to strongly depend on soil temperature and moisture. Aboveground turbulent transport of CO 2 and energy balances, in turn, depend on meteorological conditions and snow cover dynamics 43,44 . In the following discussion special attention is given to the influence of snow on both surface and subsurface processes modulating NGS-NEE of CO 2 .
Subsurface processes. According to the GSA, at the Mer Bleue Bog site soil temperature is the second most sensitive predictor variable of NGS-NEE (Fig. 3), with a positive influence on the net emission of CO 2 (Fig. 2). This positive effect is consistent with the positive temperature dependence of soil microbial respiration, even when soil temperatures approach or drop below zero during the winter 18,45,46 . In the climate projections considered in this study, soil temperature changes were assumed to be controlled by the predicted changes in air temperature, with an imposed 1°C attenuation (see "Climate projections"). The actual relationship between air and soil temperatures during the NGS may be far more complex, however, largely due to the influence of snow coverage on the belowground thermal regime 47,48 . Even at the shallow depth of 20 cm, Mer Bleue Bog peat rarely freezes despite mostly subzero winter temperatures ( Supplementary Fig. S1). This can be explained by the insulating effect of the snowpack, with peak annual snow depths ranging from 30 to 120 cm for the 1998-2010 period. With the projected decreases in snow depth and fractional snow-cover for the Ottawa region 9 , the more exposed soils could experience greater heat loss and, consequently, cooler peat temperatures. More detailed future projections of the changes in soil temperature will have to take into account the possible confounding effect of reduced snow cover. Snow depth and coverage also alter the mechanisms and rates of soil-atmosphere gas exchanges, with variable contributions of diffusive and non-diffusion transport 49 . A lower fraction of snowcovered soil diminishes the gas transport barrier and enhances the release of CO 2 while, at the same time, facilitating the influx of molecular oxygen (O 2 ) which fuels the production of soil CO 2 by aerobic respiration. However, surface heterogeneities, including hummock and hollow peat microforms, may cause differential snow accumulation patterns that influence the overall effect on NGS-NEE of a reduction in snow cover. Such small-scale processes are a source of uncertainty not taken into account in our analysis.
Climate warming may also increase in the frequency of freeze-thaw events, especially at the start and near the end of the NGS. Many recent studies have linked freeze-thaw cycles to variations in CO 2 emissions [50][51][52][53] . Recurring freezing and thawing alter the physical and biological processes that contribute to winter soil respiration [54][55][56] . In addition, freezing causes ice layers to form in the snowpack and in the near-surface peat resulting in ice encasement 57 . The formation of ice layers traps CO 2 produced or stored in the peat and snowpack while thawing of the ice results in the release of the trapped CO 2 to the atmosphere.
Besides soil temperature, the modeled NGS-NEE CO 2 fluxes are sensitive to soil moisture recorded at 20 cm depth (Fig. 3). At this depth, NGS soil moisture in the Mer Bleue Bog remains quite low (typically less than 0.15 m 3 m −3 , Supplementary Fig. S1)   because of the relatively good drainage of the surface peat. Values of~0.04 m 3 m −3 occur when peat at 20 cm occasionally freezes indicating a continued availability of unfrozen water during winter, likely as thin films surrounding organic soil particles 58 . The positive and nonlinear correlation between soil moisture content and NGS-NEE at Mer Bleue ( Fig. 2) agrees with similar positive relationships reported by Hirano 24 , Liptzin et al. 25 , and Schindlbacher et al. 59 .
In our GSA and ML modeling, we rely on the soil temperature and moisture content data recorded at 20 cm depth. This is justified because most decomposition of plant debris (and hence CO 2 production) in peatlands occurs within the upper 10-20 cm 60 . In addition, the temporal records of both soil properties exhibited sufficient variability at 20 cm depth to train the ML model. At the Mer Bleue Bog site, temperature and moisture are higher deeper in the soil profile by the end of the NGS. However, the lack of fresh plant residues and O 2 at these depths limit microbial mineralization 61,62 and, thus, variations in temperature or moisture below 20 cm are unlikely to have a strong impact on the NGS-NEE CO 2 fluxes.
Because of its mid-latitude location (45°24′N), the Mer Bleue Bog site will continue to experience a high frequency of near freeze-thaw cycles over the remainder of the 21st century 63 . The implications of changes in freeze-thaw frequency and timing for the instantaneous and cumulative CO 2 emissions during the NGS, at this site and across peatlands in general, remain to be fully understood. Similarly, the length of time during winter when the soil is continuously frozen probably affects the NGS-NEE, in line with Humphreys et al. 64 who showed smaller winter CO 2 emissions for peatlands further north in the Hudson Bay Lowlands compared to the Mer Bleue Bog. A fully predictive understanding of CO 2 production and effluxes during the NGS will require additional characterization of the temporal variations in physical and biogeochemical properties and processes of the snowpack and underlying peat.
Surface processes. The most sensitive predictor variable for NGS-NEE is net radiation measured above the canopy (Fig. 3). With increasing net radiation, the modeled net CO 2 emissions decrease and NEE eventually switches to net CO 2 uptake. Net radiation in the NGS, particularly in early spring (February-March), strongly depends on the ground snow coverage. Upwelling PPFD also exerts a negative effect on NGS-NEE. With wavelengths between 400 and 700 nm, the PPFD encompasses part of the shortwave radiation spectrum and provides energy to support photosynthetic CO 2 fixation. Similar to net radiation, upwelling PPFD at Mer Bleue Bog is greatest in early spring as days get longer and a large fraction of incoming radiation is reflected from the snow surface.
The strong negative correlation between NGS-NEE and net radiation and, to a lesser extent, upwelling PPFD (Fig. 2) may be the result of patchy snow accumulation and melting, which exposes Sphagnum-covered hummocks to light thus enabling photosynthesis 65 . Sphagnum growth has been shown to depend on various environmental factors related to snow coverage, including the timing of snow onset and retreat, the amount of snow, mid-winter thawing, and drifting snow conditions 66,67 . Nonnegligible primary productivity by bryophytes may thus have important, but largely unexplored, consequences for estimating peatland NGS-NEE of CO 2 .
With projected decreases in snow-water equivalents and snowcover fractions at Mer Bleue Bog over the 21st century 9 , interesting implications arise for advective sensible heat transfer between bare soils and snow patches 68,69 . Rain-on-snow events may similarly have significant impacts on NGS-NEE. With a greater proportion of NGS precipitation falling as rain [70][71][72] , rainon-snow events can result in wetter soils and reduced or removed snow coverage. Under these conditions, photosynthetic activity during the NGS may play an increasingly important role in the annual C budget of the Mer Bleue Bog and, by extension, other temperate and boreal peatlands 73,74 .
The final sensitive predictor variable is wind speed which correlates positively with NGS-NEE (Figs. 2 and 3). A positive relationship is expected as turbulent exchanges of CO 2 , heat, and moisture are driven by forced convection during the NGS when the surface is snow-covered and the atmosphere remains neutral or stable. In addition, wind-induced ventilation of the snowpack can cause pulse emissions of CO 2 stored in the snow and the porous and unsaturated near-surface peat 73,74 . Turbulent exchanges of heat and moisture play important roles in governing the snowpack energy balance 75,76 . Wind can, therefore, be indirectly linked to subsurface CO 2 production processes through its effects on energy fluxes into the ground.  Some cautionary notes. As for ML in general, the development of the NGS-NEE model was only possible because the dataset for the Mer Bleue Bog site was large and diverse enough to enable the model learning process. Applying the same methodology to other locations is thus contingent on access to sufficient data records. In addition, a limitation of the ML approach is that only predictor variables included in the available dataset can be selected. Important variables not explicitly represented in the dataset may therefore remain hidden. Such variables may include, for example, the structure and pH of the peat, groundwater flow rates and pathways, and the timing and the frequency of rain-on-snow events. Furthermore, the inferred sensitivity ranking and effects of the predictor variables, and the projected future changes in NGS-NEE, are site-specific and cannot be automatically extrapolated to other locations. Rather, the Mer Bleue Bog provides a reference site against which differences in the importance of surface and subsurface processes across peatlands can be hypothesized to lead to differences in NGS-NEE CO 2 fluxes and their sensitivity to environmental change. For instance, compared to the Mer Bleue Bog, extensive permafrost at higher latitudes results in unique thermal and hydrologic regimes regulating CO 2 exchanges of peatland ecosystems 64,77,78 . For these peatlands, the future trajectories of NGS-NEE will undoubtedly be impacted by permafrost thaw due to climate warming [79][80][81] .
The results of the sensitivity analysis and ML modeling are also affected by how the NGS is defined. Our results actually point to non-negligible photosynthetic CO 2 fixation during the NGS, in order to explain the role of net radiation on the observed NGS-NEE. We speculate this may primarily reflect the onset of Sphagnum growth in early spring when snow cover is still present. To address the ambiguities related to the definition of the NGS, research should focus on a finer-resolution (say, monthly) analysis of the variations in NGS-NEE from late fall to early spring. The more detailed temporal trends of NGS-NEE should in turn yield more robust estimates of the contributions of the winter and shoulder seasons to the annual NEE of peatlands.
The projected future trajectories of NGS-NEE CO 2 emissions are based on estimating how the seven predictor variables will change under a set of radiative forcing scenarios. Future NGS-NEE trajectories will also be affected by other environmental drivers, including, for instance, future shifts in vegetation and fauna (including invasive species), and changes in land use and water management. Continued global warming may also increase the absorption of CO 2 by peatlands during the growing season while decreasing the length of the NGS. Together, the changing NEE of growing and non-growing seasons will ultimately determine the evolving status of a given peatland as either a net CO 2 sink or source.

Conclusions
The NGS-NEE CO 2 fluxes at Mer Bleue Bog over the period 1998-2010 can be reproduced by taking into account seven environmental variables: near-surface soil temperature and moisture, wind speed and direction, air temperature, net radiation above the canopy, and upwelling (i.e., reflected) PPFD. Of these seven predictor variables, net radiation and wind direction are the most and least influential, respectively. The significant effects of soil temperature and moisture are expected due to their roles in the subsurface production of CO 2 . The effects of net radiation and upwelling PPFD is attributed to photosynthetic CO 2 uptake by Sphagnum mosses during the NGS, while wind speed controls the release and aboveground transport of soil CO 2 . In turn, most of the predictor variables are themselves influenced by variations in the spatial distribution and depth of the snow cover. The mean NGS-NEE CO 2 fluxes at Mer Bleue Bog site are projected to increase during the 2021-2100 period, reaching values by the end of the 21st century that is 17%, 48%, and 103% higher under the RCP2.6 (low), RCP4.5 (medium), and RCP8.5 (high) radiative forcing scenarios, respectively. Thus, in a warmer world, the Mer Bleue Bog site will act as a stronger source of CO 2 during the NGS.
Site description and methods Site description. The Mer Bleue research site is a low shrub domed ombrotrophic bog located within a 2800 hectare wetland complex in Ottawa, Canada (45°24′N, 75°30′W) 64 . Originally part of the Peatland Carbon Simulator Project 82 , the site has been continuously monitored for CO 2 , energy (latent and sensible heat), radiative (long and shortwave radiation), and momentum fluxes since the construction of an EC tower in 1998. Mer Bleue has been part of a number of flux tower networks including the Fluxnet-Canada Research Network, the Canadian Carbon Program, Ameriflux, and FLUXNET.
The Mer Bleue site has a mean annual air temperature of 6.0°C. Seasonal air temperatures between November to April and May to August of −3.7 and 18°C, respectively (climate normal) 64 . Annual precipitation at Mer Bleue is 943.5 mm with 341.7 mm of this precipitation falling between May and August 64 . Soil temperatures below the surface are rarely frozen at Mer Bleue with mean NGS temperatures (1998-2010) of 0.27°C at a 20 cm depth (see Supplementary Fig. S1). Vegetation at the Mer Bleue research site is comprised of a near-continuous cover of Sphagnum capillifolium and Sphagnum magellanicum with an overstory dominated by ericaceous shrubs including Chamaedaphne calyculata, Kalmia angustifolia, Rhododendron groenlandicum, and Vaccinium myrtilloides, along with some sedges (Eriophorum vaginatum) and herbs (Maianthemum trifolium) 83 . The area underwent bog formation starting approximately 7000 years ago with current peat depths ranging between <0.3 m at the margins to >5 m in the center of the bog 84 . The research site area has a typical hummock-hollow (70-30%) microtopography with a mean elevation difference of 0.25 m between hummock-tops and hollow-bottoms 85 with few scattered tree species (Larix laricina, Betula populifolia, and Picea mariana) 64 .
Data acquisition. Fluxes and ancillary measurements of micrometeorological parameters recorded at the Mer Bleue site were obtained from the Fluxnet Canada Research Network via the Oak Ridge National Laboratory's Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC) (https://daac.ornl.gov). The 13-year dataset (1998-2010) was retrieved and filtered to only extract NGS-specific data. For this study, we define NGS as the period starting with the first day of the first 3 consecutive days with ground snow coverage and ending with the first day of the first three consecutive days with bare ground. The corresponding dates along with the associated NGS durations are provided in Supplementary Table S1.
Values of NEE were derived from CO 2 fluxes measured by EC with a three-dimensional sonic anemometer thermometer (model 1012R3 prior to September 1, 2000, and model R3-50 thereafter; Gill Instruments Ltd., Lymington, UK) and closed-path H 2 O/CO 2 gas analyzer (model LI-6252 until September 1, 2000, LI-6262 until January 1, 2004, and LI-7000 thereafter; LI-COR Inc., Lincoln, NE) located 3 m above the bog surface 84 . Highfrequency data (10 Hz prior to 2004, 20 Hz thereafter) were used to average fluxes over a 30-min period. Details on the flux calculation are given by Roulet et al. 84 they involved accounting for air density fluctuations using either the WPL procedure 86 or high-frequency calculation of CO 2 and H 2 O mixing ratios.
Although no spectral corrections were applied, a correction factor of 1.25 was applied to the CO 2 fluxes prior to 2004 to account for the changes in the EC system in 2004, which reduced high-frequency flux attenuation 84 . Thus, this CO 2 flux dataset retains some bias due to high and low-pass filtering effects. A subset of the flux data collected since 2004 was reprocessed using LI-COR's EddyPro v7.0.6 software and demonstrated that this underestimation is approximately 10%. Specifically, fluxes with analytic corrections of high-pass 87 and low-pass 88  At any given time, NEE is the sum of the turbulent CO 2 fluxes and the rate of change in CO 2 storage below the EC tower calculated from the concentrations of CO 2 measured by the EC instrumentation. We define positive NEE as the net release of CO 2 from the ecosystem while negative NEE represents net CO 2 uptake by the ecosystem. Thus, NEE represents the difference between gross primary productivity and heterotrophic plus autotrophic respiration (together referred to as ecosystem respiration). Note, that although the primary productivity of vascular plants during the NGS is usually negligible, this may not be the case for peat-forming mosses (see "Surface processes"). NEE fluxes were removed from the dataset in case of instrument malfunction, statistics outside of acceptable limits, and when the nighttime friction velocity dropped below 0.1 m s −1 .
Variable selection. The initial dataset containing 68 variables (67 predictor variables and 1 response variable) was subjected to a four-step variable selection methodology (Fig. 5). The methodology was designed to reduce the number of variables represented in the dataset, leaving only those that exert the greatest influence on the NGS-NEE of CO 2 . Only these remaining predictor variables were then used in the creation of the ML model.
Correlated variables. As a first step, collinearities between variables were examined by computing Pearson's linear coefficients of correlation (ρ) between pairs of variables. The correlation matrix (M) of ρ for each pair of variables (x, y) was obtained using the following equations: M ¼ ρðx; xÞ ρðx; yÞ ρðy; xÞ ρðy; yÞ ð3Þ where Cov represents the covariance between variables (x) and (y), x ¼ and y ¼ are the mean values of the variables (x) and (y), respectively, and σ is the standard deviation for each variable. Note that in M pairwise correlations between the same variables, i.e., ρ(x, x) and ρ(y, y), equal unity. Pairs of variables with jρj ≥ 0.75 were identified and labeled as correlated variable 1 (CV1) and correlated variable 2 (CV2). The average correlation between CV1 and the remaining 67 variables was then calculated. The process was repeated for CV2. The CV with the largest average correlation with the remaining 67 variables in the dataset was removed. Using this approach, 40 predictor variables were removed. Figure 6 provides a visual representation of variable collinearities prior to the application of the variable selection methodology. For variables in which the linear correlation coefficient could not be calculated due to missing observations or a standard deviation of zero, a Not a Number placeholder was assigned.
Missing sata and near zero variance. In order to reduce biases in the development of the ML model, no gap-filling or data imputation methods were applied. Variables with more than 50% of observational values missing were removed from the dataset. Supplementary Figure S2 presents an overview of the data coverage over the data record used in this study. With this threshold, nine variables were eliminated. The remaining 19 variables were screened and removed if their variance approached zero. Variables with minimal to no variation in their values contribute minimally to the model learning process. They are also likely to be of lesser relevance in predicting future NEE than variables prone to change 92 . Variables were therefore discarded if the fraction of their unique values was low, and the ratio of the mode to the second most common value (defined as the frequency factor) was high 93 . Specifically, in this study, predictors with <10% of their total observed values being unique and a frequency Random forest permutation. A Random Forest Ensemble Model (RFM) was used to rank the importance of each of the variables remaining after the selection procedure. Random forest regression is a supervised learning algorithm that considers an ensemble learning method and bootstrap aggregation. This algorithm uses combinations of trained decision trees to obtain a predicted output. The reader is directed to Breiman 94 for a detailed discussion of random forests. In this study, the RFM consisted of an out-of-bag variable importance estimation algorithm modified for regression analysis according to Breiman 94 . The RFM was trained on the observations for each of the remaining variables ðx j Þ using 500 learner trees. Surrogate splits helped reduce biases due to the presence of missing observations. To further minimize bias, an interaction-curvature test was applied during the training process such that tree splits were made on predictors that minimize p-values from pairwise chisquared ðχ 2 Þ tests of independence between the predictor and the response 95 .
To estimate the importance of each variable, predictions based on the data used for the training of the RFM were compared to observations not used in the creation of the trees (i.e., the out-ofbag samples). The mean-square error (MSE) between predictions using the training data and the out-of-bag samples was determined and referred to as the out-of-bag error ðε oob Þ. The observations of each variable ðx j Þ were randomly permuted and a prediction .. obtained. The model error for each observation (ε mj ) was calculated by comparing y j to the out-of-bag samples. The mean difference Δ m between ε oob and ε mj , as well as the standard deviation (σ j ) of the differences in permuted values overall learners, were calculated. The predictor importance was then expressed as where I j measures the importance of each predictor j: a larger value of I j represents greater importance. Based on the value of I j , soil temperature and moisture (at 20 cm depth below the hummock surface), wind direction and speed, air temperature, net radiation (above the canopy), and the upwelling PPFD were identified as the final predictor variables for use in developing the NGS-NEE ML model. Values of I j are presented in Supplementary Table S3 for the reader reference.

NGS-NEE model development.
Further data preprocessing standardized the values of each of the final predictor variables between 0 and 1. A K-means clustering algorithm was then applied to n = 15,267 observations to collapse and group the dataset into 122 representative centroids. The K-means clustering algorithm is an iterative unsupervised learning algorithm that aims to assign individual observations into K distinct clusters to minimize the intra-cluster sum of squares 96 . The centroids of these clusters contain the statistical information of individual clusters thus allowing for increased training speeds by reducing the number of data points 97 . The number of clusters used in this analysis was determined by applying the K-means clustering algorithm for increasing values of K ðK 2 Z þ ; 1 ≤ K ≤ d ffiffiffi n p eÞ until 99% of the variance in inter-cluster Euclidean distances between data points and individual clusters was explained. The reader is directed to Telgarsky and Vattani 98 for a detailed description of the K-means clustering algorithm.
A wide range of regression algorithms (linear, tree-based, support vector, Gaussian process-based, and ensembles) was trained and validated using a tenfold cross-validation strategy on 85% of the entire dataset (training dataset). The tenfold crossvalidation strategy randomly partitioned the data into ten segments of roughly equal size, and systematically trained data on nine of the ten partitioned folds while using the remaining tenth segment to estimate model performance. The process was repeated until all ten of the segments had been used to evaluate model performance. The 10-fold cross-validation strategy was run 1000 times, with each run consisting of a newly partitioned training (85%) and testing dataset (15%). Model performance was evaluated based on the root-meansquare error (RMSE) and coefficient of determination (r 2 ) averaged over the 1000 runs of the 10-fold cross-validation where ∑ n i¼1 ðx i À y i Þ 2 represents the sum of squared errors between the predicted (x i ) and observed NEE (y i ) and y ¼ represents the mean observed NEE flux over the entire data record.
Model testing using the remaining 15% of the dataset (testing dataset) was evaluated using the mean r 2 and RMSE values over 1000 runs, with each run consisting of a uniquely partitioned testing and training dataset. Comparing the computed RMSE and r 2 values for all trained models, an epsilon insensitive support vector machine regression (ε-SVM) model was selected because it had the lowest RMSE and highest r 2 values. An additional advantage of a ε-SVM model is its relatively simple structure and implementation. The performance of other regression algorithms used in this analysis for comparison is provided in Supplementary  Fig. S3. Support vector machines comprise a set of Kernel-based ML algorithms for classification and regression analyses. SVMs generally operate through the nonlinear mapping of input vectors into a higher dimensional space and solving optimization problems in this space 99 . In the case of classification, an optimization problem is solved to correctly classify the greatest number of observations through maximizing the margin separating the hyperplane, which is controlled by the presence of support vectors 100 . For regression analyses, this concept is extended to include a region surrounding the plane referred to as the ε-insensitive region. The goal of the SVM regression is to solve a convex optimization problem that aims to minimize a defined ε-insensitive loss function while maximizing the flatness of the ε-insensitive region which contains many of the training observations. The reader is directed to Cortes and Vapnik 99 and Awad et al. 100 for a detailed mathematical discussion of SVMs.
With few model parameters, SVMs have been shown to mimic the performance of the best artificial neural networks 101 that have been used extensively in modeling CO 2 fluxes 37,38 . Moreover, the computational complexity of SVM regressions does not depend on the dimensionality of the input space making for an attractive choice for large datasets with many variables 100 . The trained ε-SVM uses a quadratic kernel, with hyperparameters determined based on the interquartile range of the observed NGS-NEE. Training data used in the creation of the ε-SVM model are provided in Supplementary Data 1.
Sensitivity analysis. A moment-independent GSA was conducted based on PAWN 102 . Unlike other density-based sensitivity analyses, PAWN uses empirically computed cumulative density functions (CDFs) rather than probability density functions to calculate sensitivity indices (SI). Given the non-normal distribution of the observed NGS-NEE ( Supplementary Fig. S1), the variance of the NGS-NEE distribution would not adequately capture the model uncertainty 103 . Hence, PAWN offers a more natural choice for the calculation of SIs.
We implemented PAWN using the SAFE MATLAB Toolbox, an open-source toolbox developed by F. Pianosi, F. Sarrazin, and T. Wagener (https://www.safetoolbox.info/). Modifications to this toolbox were made to handle the nonparametric distributions unique to the underlying distribution of the variables used in this study ( Supplementary Fig. S1). SI was calculated as the distance between the conditional and unconditional CDFs. Here, this distance was approximated using the Kolmogorov-Smirnov statistic. Unconditional CDFs were calculated by varying all input variables simultaneously, while conditional CDFs were obtained through varying all input variables except one, which was allowed to vary within a specified range of values (conditioning intervals). Supplementary Figure S4 provides a visual representation of the empirically derived CDFs through a regional sensitivity analysis.
The GSA used n = 8000 samples obtained via Latin Hypercube Sampling in five conditioning intervals, yielding a 2000 bootstrapped 95% confidence interval. A dummy variable in the GSA served as a control against which the SIs of other variables were compared. SI values range from 0 to 1, with zero representing null effects (no sensitivity) and 1 representing the largest sensitivity to modeled NEE. For a more detailed description of PAWN, the reader is referred to Zadeh et al. 104 , Pianosi and Wagener 102,105 , and to Noacco et al. 106 for examples of the use of the SAFE toolbox.
Climate projections. Three future climate scenarios were considered: RCP2.6, RCP4.5, and RCP8.5. These scenarios describe the climate trajectories associated with stringent, moderate, and high degrees of radiative forcing, respectively 107 . Values of the most sensitive model variables (soil temperature and moisture, air temperature, wind speed and direction, upwelling PPFD, and net radiation above canopy; see "Sensitivity analysis") were altered according to their predicted changes under each RCP. These predicted changes were obtained from either previously conducted studies or outputs from global climate models and are described hereafter.
Winter air temperature projections for the Ottawa region under each RCP were obtained from statistically downscaled (10 km spatial resolution) predictions from 24 global climate models participating in the fifth phase of the Coupled Model Intercomparison Project (CMIP5). The obtained data were previously statistically downscaled using the Bias Correction/ Constructed Analogs with Quantile mapping version 2 (BCCAQv2) algorithm 108,109 . Changes in soil temperatures were assumed to be 1°C lower than the projected changes in air temperature based on the work of Zhang et al. 110 . and Wisser et al. 111 . Winter wind speeds and directions were kept constant for all RCPs as predictions by Jeong and Sushama 112 and McInnes et al. 113 showed little to no increase in wind speeds in the Ottawa region, in contrast to the predicted increases in regions at higher latitudes in Canada.
Predicted changes for upwelling PPFD and soil moisture were obtained as outputs from the Second Generation Canadian Earth System Model (CanESM2). Predicted reductions in upwelling shortwave radiation were used as a proxy for reductions in upwelling PPFD, given that photosynthetically active radiation falls partly in the shortwave radiation spectrum. It is recognized that changes in upwelling radiation are difficult to predict due to complex interactions between changing snow-cover fractions 9 , sea ice-snow albedos 9,114 , and aerosol concentrations 115 .
Changes to net radiation were applied linearly between 2021 and 2100. The net change by the year 2100 corresponded directly to the radiative forcing associated with each RCP (+2.6, +4.5, +8.5 W/m 2 for RCP2.6, 4.5, and 8.5, respectively). Mean NGS-NEE CO 2 effluxes were then calculated with the ML model using the predicted temporal trends of the seven predictor variables for the 2021-2100 period. Net changes to the model predictor variables under each RCP by the end of the 21st century are presented in Table 3.

Code availability
Code the GSA, variable selection methodology, development of the model, and future predictions were developed using MATLAB version R2019b and is available on request.
Received: 9 January 2021; Accepted: 13 May 2021; Table 3 Climate projections: predicted changes in input variables by the end of the 21st century, according to the three radiative forcing scenarios, representative concentration pathways (RCPs) 2.6, 4.5, and 8.5.