Skillful statistical models to predict seasonal wind speed and solar radiation in a Yangtze River estuary case study

This paper illustrates the potential for seasonal prediction of wind and solar energy resources through a case study in the Yangtze River estuary. Sea surface temperature and geopotential height-based climate predictors, each with high correlation to ensuing seasonal wind speed and solar radiation at the Baoshan weather observing station, are identified and used to build statistical models to predict seasonal wind speed and solar radiation. Leave-one-out cross-validation is applied to verify the predictive skill of the best-performing candidate model for each season. We find that predictive skill is highest for both wind speed and solar radiation during winter, and lowest during summer. Specifically, we find the most skill when using climate information from the July-September season to predict wind speed or solar radiation during the subsequent November-January season. The ability to predict wind and solar energy availability in the upcoming season can help energy system planners and operators anticipate seasonal surpluses or shortfalls and take precautionary actions.

Renewable energy resources, such as wind and solar power, play a crucial role in reducing the use of fossil fuels and ultimately lowering carbon dioxide emissions, mitigating anthropogenic global warming and enhancing drought resilience [1,2]. The desire to integrate renewable energy sources into energy systems exists in many regions of the world. One challenge that remains, however, is the substantial variability of wind and solar power availability [3-5]. As a consequence, accurately predicting variations in the availability of wind and solar resources is essential for energy system planning and operation.
A number of methods and models for forecasting wind and solar energy resources have been proposed. These methods can be broadly divided into two categories: statistical models and physical models. Statistical models use historical time series data to estimate a statistical relationship between relevant explanatory variables and wind and solar energy availability. Regression and time series models, such as autoregressive (AR), moving average (MA), and autoregressive integrated moving average (ARIMA) models [6-9], are often used. Physical models attempt to simulate the underlying physics associated with wind and solar energy, often within a numerical weather prediction (NWP) framework [10-12]. The efficacy of statistical and physical models depends on the desired prediction time horizon [13-17]. Consequently, a combined physical-statistical model has the potential to incorporate the strengths of each type of model and improve skill in predicting the variability of wind and solar energy resources at different time horizons.
There have been several prior studies regarding the influence of climate teleconnections on wind and solar energy resources. For instance, Ravestein et al. [18] discussed the impact of both climate change and climate variability on the supply of renewable energy sources in Europe. Berg et al. [19] and Mohammadi and Goudarzi [20] investigated the sensitivity of wind speed and solar radiation in California to the El Niño-Southern Oscillation (ENSO). Chen et al. [21] and Sherman et al. [22] summarized the decreasing potential and interannual variability of wind power in China. They showed that information about the Pacific Decadal Oscillation (PDO), the Arctic Oscillation (AO), and ENSO could be exploited to improve energy system management. Guo et al. [23] found that a weakening lower-tropospheric pressure gradient between the land and sea in coastal China has been a primary cause of the observed decreasing trend in near-surface wind speed in that region. Most prior studies focus on characterizing the relationship between teleconnection indices and renewable energy variables and do not evaluate the efficacy of a model that uses climate information to predict wind and solar energy resources.
The goal of this paper is to present a predictive modeling framework for wind and solar energy resources at the seasonal timescale. In the remainder of the paper, we illustrate climate predictor identification, model selection, and prediction performance assessment for the case study region of the Yangtze River estuary. The Yangtze River estuary is the largest economic zone in China and is rich in renewable energy resources [24-26]. A skillful prediction of seasonal wind and solar resources for this region can help facilitate the management and operation of the electricity system and can ultimately aid in the effort to integrate more wind and solar energy sources into the power system.

Data Description
Wind speed and solar radiation data. Daily wind speed data from 1959 to 2017 and solar radiation data from 1958 to 2016 at the Baoshan weather observing station (121.45°E, 31.4°N, Fig. 1) in the Yangtze River estuary were selected for this study. The wind speed was measured at 10 m above the ground. The data were provided by the National Climate Center, China Meteorological Administration. A three-month moving average of wind speed and solar radiation was calculated using the corresponding daily data. A preliminary analysis of the seasonality of solar radiation shows that the weakest and the strongest solar radiation occur in the November-December-January (NDJ) and May-June-July (MJJ) seasons, respectively. Consequently, we define the four seasons as February-March-April (FMA), May-June-July (MJJ), August-September-October (ASO) and November-December-January (NDJ).
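The seasonal aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: it averages daily values directly within each season rather than via a moving average, the function name `seasonal_means` is hypothetical, and assigning January to the NDJ season that began the previous November is an assumed convention. The input data are synthetic.

```python
from collections import defaultdict
from datetime import date, timedelta

# Season definitions taken from the text; NDJ spans the turn of the year.
SEASONS = {"FMA": (2, 3, 4), "MJJ": (5, 6, 7), "ASO": (8, 9, 10), "NDJ": (11, 12, 1)}
MONTH_TO_SEASON = {m: s for s, months in SEASONS.items() for m in months}


def seasonal_means(daily):
    """Average (date, value) pairs into {(season_year, season): mean}.

    Assumption: January belongs to the NDJ season that started in the
    previous calendar year, so its season_year is year - 1.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily:
        season = MONTH_TO_SEASON[day.month]
        season_year = day.year - 1 if day.month == 1 else day.year
        sums[(season_year, season)] += value
        counts[(season_year, season)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Synthetic daily record from 2000-02-01 to 2001-01-31; the "observation"
# is simply the month number so the result is easy to check by hand.
start = date(2000, 2, 1)
days = [start + timedelta(days=i) for i in range(366)]
daily = [(d, float(d.month)) for d in days]
means = seasonal_means(daily)
```

With real station data, `daily` would hold the observed 10 m wind speed or solar radiation for each day.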
A trend analysis of the seasonal wind speed and solar radiation was conducted using a modified Mann-Kendall test [27] that accounts for the reduced effective sample size resulting from serial correlation in the data. The results show that the seasonal wind speed for all four seasons and seasonal solar radiation for the MJJ and NDJ seasons have significant decreasing trends with p-values less than 0.05. Seasonal solar radiation during the FMA and ASO seasons does not exhibit statistically significant monotonic trends.
The modest negative trends in solar radiation may be the result of accelerated industrialization and burning of coal, which leads to increased near-surface aerosols [28]. In turn, these near-surface aerosols can lead to reduced incoming solar radiation at the surface due to a) their scattering and absorbing properties, and b) their promotion of reflectivity by clouds, since aerosols can act as cloud condensation nuclei [29].
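For readers unfamiliar with the test, the sketch below implements the classical Mann-Kendall test with a tie correction and a normal approximation. It is not the modified version used in this study: it assumes serially independent data, whereas the modified test additionally adjusts the variance of S for autocorrelation. The function name and the example series are hypothetical.

```python
import math


def mann_kendall(series):
    """Classical Mann-Kendall trend test (no serial-correlation correction).

    Returns (S, z, p), where p is the two-sided p-value from the normal
    approximation with continuity correction.
    """
    n = len(series)
    # S counts increasing pairs minus decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, reduced for groups of tied values.
    tie_counts = {}
    for value in series:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    var_s = n * (n - 1) * (2 * n + 5)
    for t in tie_counts.values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0

    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical seasonal wind speeds (m/s) with a downward drift.
declining = [3.9, 3.8, 3.6, 3.7, 3.5, 3.3, 3.4, 3.1, 3.0, 2.8]
s_stat, z_stat, p_value = mann_kendall(declining)
```

A negative `s_stat` together with `p_value` below 0.05 would indicate a significant decreasing trend under the independence assumption.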
Explaining the pronounced negative trends in wind speed is more complex because the mechanisms differ by season. The pronounced negative trends have been shown to be widespread throughout China in the historical record [23,30,31], and a discussion of the possible mechanisms is included in Guo et al. [23]. As mentioned above, a trend in the land-sea pressure gradient is one such proposed mechanism.
Climate teleconnection and predictor identification. Several studies have reported strong teleconnections between precipitation in the study area and sea surface temperature (SST) anomalies in the Pacific and north Indian Oceans [32-34]. We can therefore infer that SST anomalies in particular ocean regions may affect cloud coverage in our study area. In addition, geopotential height (a large-scale climate field index related to surface pressure) is related to wind patterns and cloud coverage. Therefore, to identify climate variables with the potential to predict season-ahead wind speed and solar radiation, we consider a) SST fields obtained from the Hadley Center SST dataset on a 1° × 1° grid [35] and b) geopotential height at 850 hPa (GPH850) obtained from the National Center for Environmental Prediction-National Center for Atmospheric Research reanalysis data on a 2.5° × 2.5° grid [36]. We investigate the correlation of 3-month moving-average SST and GPH850 with linearly detrended wind speed and solar radiation for each season. For each season, we evaluate several correlations in which the SST and GPH850 fields lead the seasonal wind speed and solar radiation at the study site by 0 to 6 months. Tables 1 and 2 summarize the regions with high correlation for all four seasons, with the SST regions and GPH850 regions listed separately. The selected regions of SST and GPH850 have significant and persistent correlations (i.e. the correlations persist across both concurrent and lagged evaluations) with seasonal wind speeds and solar radiation at the study site. For example, a significant correlation between FMA wind speed and the SSTs in the region defined by 10°N~20°N and 135°E~235°E is identified not only for the concurrent season (i.e. wind speed of FMA and average SST of FMA), but also for prior season SSTs (i.e. wind speed of FMA and average SST of NDJ, OND, SON, ASO, etc.). Hence, using SST and GPH850 information can provide a lead time for predicting seasonal wind speed and solar radiation. We use 0.3 as the correlation threshold for identifying potential predictor regions. This threshold was chosen as a compromise between having sufficient lead time and obtaining skillful predictions. Figures 2a,b and 3a,b show the Pearson correlation between the observed wind speed/solar radiation of NDJ and SST/GPH850 of JAS, respectively. Two regions with high correlation were identified in each figure. The SST/GPH850 within the identified regions may provide predictive information regarding the following season's wind speed and solar radiation. A one-month lag between the climate information and predictand indicates that the wind speed/solar radiation of NDJ may be predictable using data that are available at the beginning of October. The SST and GPH850 regions that are correlated with wind speed and solar radiation at the study site over several months are summarized in Tables 1 and 2.
Candidate predictors were derived as principal components (PCs) from the corresponding SST and GPH850 regions. We evaluated all components that explained more than 5% of variation but found that only PC1 of the SST/GPH850 fields in the regions of interest was useful. In each case PC1 explains more than 70% of the SST/GPH850 regional variance and has a correlation with the area-averaged SST/GPH850 with absolute value greater than 0.8. In other words, the information contained in PC1 is very similar to the information contained in the area-averaged SST/GPH850. We found, however, that using the leading component (rather than the area-averaged value) led to more skillful models.
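The screening step described above (linearly detrend the predictand, correlate it with each lagged candidate series, and keep candidates that reach the 0.3 threshold) might look like the sketch below. It is an illustration under stated assumptions: each candidate is already a single yearly series (for example a regional PC1 from an earlier season) aligned with the predictand years, the threshold is applied to the absolute correlation, and all names and values are hypothetical.

```python
import math


def detrend(y):
    """Remove the least-squares linear trend (in the time index) from y."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - y_mean - slope * (t - t_mean) for t, v in enumerate(y)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def screen_predictors(target, candidates, threshold=0.3):
    """Keep candidates whose |r| with the detrended target meets the threshold."""
    residual = detrend(target)
    selected = {}
    for name, series in candidates.items():
        r = pearson(series, residual)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Synthetic 8-year example: the predictand is a trend plus a climate signal.
signal = [1.0, -1.0] * 4                  # hypothetical lagged SST PC1
unrelated = [1.0, 1.0, -1.0, -1.0] * 2    # hypothetical index with little skill
wind = [5.0 - 0.05 * t + 0.3 * signal[t] for t in range(8)]
selected = screen_predictors(wind, {"sst_pc1_lag": signal, "other_index": unrelated})
```

In practice the same screening would be repeated for each lead time, so that a region is retained only if its correlation persists across lags.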
We rely on cross-validation to limit the risk that our selected climate predictors are spuriously related to wind speed and solar radiation at our study site. As such, we do not focus on identifying the full causal pathway by which the identified climate predictors relate to our station-based wind speed and solar radiation. Having said that, we offer speculation on plausible physical connections between the predictors identified in Figs. 2 and 3 and wind speed and solar radiation at our study site in the Supplemental Information.
We also consider some well-known large-scale low-frequency climate variables such as SST-based ENSO indices (including Niño 4, Niño 3.4, Niño 3), the Pacific Decadal Oscillation (PDO), the North Atlantic Oscillation (NAO), the Southern Oscillation Index (SOI), and the Arctic Oscillation Index (AO) as candidate predictors of seasonal wind speed and solar radiation. For collinear indices, such as Niño 4, Niño 3.4, Niño 3, the SOI, and the PDO, only the index with the highest correlation to the wind and solar variables is used in the next step. Tables 3 and 4 summarize the selected predictors and their correlation with the study site wind speed and solar radiation time series for different seasons. These predictors contain climate information that may inform predictand variability at multiple timescales from interannual to interdecadal.

Methodology
Linear temporal regression models to predict wind speed and solar radiation. Given the significant correlations of wind speed and solar radiation at the study site with the selected climate predictors as shown in Tables 3 and 4, we consider three linear temporal regression (LTR) models. Each season is studied independently.
Model selection. The adjusted coefficient of determination (Adj. R²) is used to assess the goodness-of-fit of each model. The Akaike Information Criterion (AIC) [37] is also used to compare different models and to select the best combination of climate variable predictors. The AIC is a goodness-of-fit estimator that accounts for model complexity in an effort to avoid overfitting (more precisely, the AIC is two times the negative log-likelihood plus two times the number of estimated parameters in the model). Models with smaller AIC values and larger Adj. R² values are preferred.
Evaluation of prediction performance. Leave-one-out cross-validation (LOOCV) is applied to estimate the predictive efficacy of the selected models. The LOOCV procedure is as follows: 1) set aside one year of data from the observational record to be used as testing data, 2) fit the model on the remaining (n-1) years of data, and 3) predict the left-out year based on the model fit and the left-out year's climate predictor values. This process is repeated for every year. We then select the optimal model among all candidate models by calculating the mean squared error (MSE) under LOOCV and under full-sample estimation, respectively. The MSE is defined in Eq. (2).
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(O_t - P_t\right)^2 \qquad (2)$$
In Eq. (2), O_t and P_t are the observed and predicted wind speed or solar radiation values for year t, and n is the number of years over which the MSE is computed.
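The model comparison and LOOCV steps above can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' implementation. It assumes ordinary least squares with Gaussian errors and counts the error variance as an estimated parameter in the AIC. The two design matrices (trend only, and trend plus one climate predictor) are assumed stand-ins for the candidate models, and all data are synthetic.

```python
import math


def ols_fit(X, y):
    """Least-squares coefficients for y ~ X (X includes an intercept column)."""
    k = len(X[0])
    # Normal equations A b = c with A = X'X and c = X'y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (c[i] - tail) / A[i][i]
    return beta


def predict(X, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]


def mse(obs, pred):
    """Mean squared error as in Eq. (2)."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def fit_stats(X, y):
    """Return (AIC, adjusted R^2) for a Gaussian linear model."""
    n, k = len(y), len(X[0])
    rss = n * mse(y, predict(X, ols_fit(X, y)))
    y_mean = sum(y) / n
    tss = sum((v - y_mean) ** 2 for v in y)
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    aic = -2 * loglik + 2 * (k + 1)  # k coefficients plus the error variance
    adj_r2 = 1 - (rss / (n - k)) / (tss / (n - 1))
    return aic, adj_r2


def loocv_mse(X, y):
    """Leave one year out, refit, predict that year, and average the squared errors."""
    errors = []
    for i in range(len(y)):
        beta = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append((y[i] - predict([X[i]], beta)[0]) ** 2)
    return sum(errors) / len(errors)


# Synthetic 20-year record: trend + climate signal + small deterministic "noise".
years = list(range(20))
climate = [1.0 if t % 2 == 0 else -1.0 for t in years]    # hypothetical index
noise = [0.02 if t % 4 < 2 else -0.02 for t in years]
y = [4.0 - 0.03 * t + 0.2 * c + e for t, c, e in zip(years, climate, noise)]
X_trend = [[1.0, float(t)] for t in years]
X_climate = [[1.0, float(t), c] for t, c in zip(years, climate)]

aic_trend, adj_trend = fit_stats(X_trend, y)
aic_climate, adj_climate = fit_stats(X_climate, y)
cv_trend, cv_climate = loocv_mse(X_trend, y), loocv_mse(X_climate, y)
```

In this synthetic case the climate-informed model has the lower AIC, the higher Adj. R², and the lower LOOCV MSE, which is the pattern the selection procedure looks for.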

Results
Goodness-of-fit and model selection. The AIC and Adj. R² values for the seasonal wind speed and solar radiation candidate models are summarized in Tables 5 and 6. Models that include climate predictors perform better than those that only include temporal trends according to the AIC and Adj. R² values. When multiple noncollinear climate predictors are available, models with multiple climate predictors show the best performance. This suggests that the information carried by the climate predictors improves the explanation of variance in seasonal wind speed and solar radiation at the study site.
Effects of climate on seasonal wind and solar resources. In order to analyze the effects of the climate predictors on the seasonal wind speed and solar radiation variation, the Adj. R² values for the models that only include temporal trend terms or climate predictor terms are shown in Table 7. For all models of wind speed, the temporal trend explains the majority of variance, specifically between 62% and 80% depending on the season. The climate predictors explain 19%, 26%, and 40% of the remaining variance during FMA, ASO, and NDJ, respectively. This indicates that although the negative temporal trend (illustrated in Fig. 4a-d) explains the majority of interannual wind speed variance, the climate predictors also explain a substantial portion of wind speed variation for three of the seasons. For solar radiation, no significant trends were identified in FMA and ASO, while weak decreasing trends were found in MJJ and NDJ. These trends account for 11% and 19% of the variance in the MJJ and NDJ seasons, respectively. Climate predictors explain 38%, 11%, 17%, and 47% of the remaining interannual solar radiation variance during FMA, MJJ, ASO, and NDJ, respectively. The primary takeaway is that the inclusion of climate predictors in the seasonal wind speed and solar radiation models generally leads to substantial increases in predictand variance being explained (Table 7).
Table 7. Summary of the fitting performance of the best models for wind speed and solar radiation using only the linear temporal trend or only climate variables as predictors. The blank cells indicate that the best model for that season and variable did not include a temporal trend term (middle column) or climate predictor terms (right column). The Adj. R² values for the models that only include climate predictor terms are calculated using the data after removing the temporal trend of observed wind speed and solar radiation.
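One way to read the "remaining variance" percentages above is through the identity below. It holds under two assumptions that are made here for illustration: the trend is removed first and the climate predictors are then fit to the detrended series, and ordinary (unadjusted) R² is used instead of Adj. R². The subscripted symbols are introduced here and do not appear in the original tables.

```latex
R^2_{\text{total}} = 1 - \left(1 - R^2_{\text{trend}}\right)\left(1 - R^2_{\text{climate}\mid\text{detrended}}\right)
```

As a purely hypothetical example, a trend explaining 60% of the variance combined with climate predictors explaining 40% of the remainder would give a total of 1 - (0.40)(0.60) = 0.76, i.e. 76% of the original variance.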
Scientific Reports | (2020) 10:8597 | https://doi.org/10.1038/s41598-020-65281-w
Validation of the prediction models. LOOCV is applied to assess the predictive skill of the seasonal wind speed and solar radiation models, and the MSE values for all the models are shown in Tables 5 and 6. The models that include climate predictors generally have smaller MSE values compared to the models that do not include climate predictors. This increased skill under cross-validation provides evidence that the use of climate information increases the predictive capacity of the models. Figures 4a-d and 5a-d show the time series of observed seasonal wind speed and solar radiation, model fits, and the cross-validated predictions for the models with the lowest MSE for each season. The model fits and cross-validation predictions are generally very similar. This indicates a low risk of overfitting. For both wind speed and solar radiation, the FMA and NDJ models have higher cross-validated R² values than the MJJ and ASO models. The NDJ season model has the best cross-validated R² of all solar radiation models, which is in part because the linear trend is most pronounced in that season.
Figure 6a-d shows the comparison between the observed and predicted ASO and NDJ wind speed and FMA and NDJ solar radiation for separate calibration and validation periods. Consistent with the LOOCV results, the models that include climate predictors outperform the models that do not include climate predictors on the basis of MSE over the validation period. It is notable that the climate predictors appear to capture changes in the low-frequency signals that occur between the calibration and validation periods and that are not captured by the linear trend term (see ASO and NDJ wind speeds in Fig. 6). While the climate predictors do improve the model performance during the validation period, the predictions still deviate substantially from the observations in some respects. For example, in general the variance of the predictions is smaller than the variance of the observations (e.g. FMA solar radiation), and some large observed anomalies are not explained by our models (e.g. see years 2008-2009 for ASO wind speed and years 2002-2005 for FMA solar radiation in Fig. 6). It should also be noted that the climate predictor identification phase implicitly utilized information from both the LOOCV left-out years and from the validation period since the predictors were selected based on the full dataset. However, our method yields the same sets of predictors whether we use the full dataset or the training period data to select model predictors.
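The calibration/validation comparison can be illustrated with the two baseline models mentioned in the Figure 6 caption, namely the calibration-period mean and a linear temporal trend. The sketch below uses the wind speed periods from that caption (1959-1998 for calibration, 1999-2017 for validation) but entirely synthetic observations. The helper names are hypothetical and climate-informed models are omitted for brevity.

```python
def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


# Synthetic seasonal wind speed: a steady decline plus a small wiggle.
years = list(range(1959, 2018))
obs = [4.0 - 0.02 * (yr - 1959) + 0.05 * (-1) ** yr for yr in years]

# Split at the start of the validation period.
split = years.index(1999)
cal_years, val_years = years[:split], years[split:]
cal_obs, val_obs = obs[:split], obs[split:]

# Baseline 1: carry the calibration-period mean forward.
cal_mean = sum(cal_obs) / len(cal_obs)
mse_mean = mse(val_obs, [cal_mean] * len(val_obs))

# Baseline 2: extrapolate the calibration-period linear trend.
intercept, slope = fit_line(cal_years, cal_obs)
mse_trend = mse(val_obs, [intercept + slope * yr for yr in val_years])
```

A climate-informed model would be scored the same way: fit on the calibration years only, then compare its validation MSE against these baselines.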

Summary and conclusions
The main objective of this study was to develop a skillful model based on climate information to predict seasonal wind speed and solar radiation in the Yangtze River estuary. First, we identified SST regions, GPH850 regions, and some standard climate indices that have a strong correlation with ensuing seasonal wind speed and solar radiation at the Baoshan observing station. Second, we developed predictive models based on the identified climate predictors. Three linear regression models were considered in this study: a time-varying model, a univariate time-varying climate-informed model, and a multi-variable time-varying climate-informed model. The AIC and Adj. R² were applied to select the best-fitting model for each of the four seasons. Third, we conducted a cross-validation analysis on the selected models to check that the models were not overfitted and to evaluate how much of the interannual wind speed and solar radiation variability could be predicted by the models. The results demonstrate that both the newly derived large-scale SST and GPH850 indices and the pre-existing climate indices can explain a portion of the impact of large-scale climate circulations on the variability of local wind and solar energy availability. The models presented in this paper illustrate the ability to skillfully predict wind speed or solar radiation at the seasonal timescale.
Seasonal prediction of wind speed and solar radiation has the potential to help facilitate integration of wind and solar electricity generation into existing electricity grids by allowing grid operators to better plan for potential surpluses or shortfalls in upcoming seasonal generation. For example, knowing in advance that there is increased risk of low wind power generation for an upcoming season could be useful for several entities. This information could 1) allow turbine owners to schedule maintenance during these seasons so as to limit the costs associated with such maintenance, or 2) allow energy managers to better anticipate the need for back-up fuels, such as natural gas, during the upcoming season.
Efforts to investigate the predictability of wind and solar resources are becoming increasingly salient as many regions worldwide increase their dependence on these renewable resources. As such, further research into the predictability of wind and solar power availability at timescales from sub-daily to decadal is warranted. While the climate variables that best predict seasonal wind speed and solar radiation will vary from region to region, the modeling framework of this study can be expanded to other regions in China as well as other countries.
Figure 6. Comparison between the observed ASO and NDJ wind speed (a,b) and FMA and NDJ solar radiation (c,d) (black solid circles and lines) and model fits and predictions based on the selected models for the calibration (1959-1998 for wind speed, 1958-1997 for solar radiation) and validation (1999-2017 for wind speed, 1998-2016 for solar radiation) periods (green and blue circles and lines). Green circles and lines represent the models incorporating climate predictors, and blue circles and lines represent either the time series mean or models that only include the linear temporal trend term.