Temporal Convolutional Networks for the Advance Prediction of ENSO.

El Niño-Southern Oscillation (ENSO), which is one of the main drivers of Earth's inter-annual climate variability, often causes a wide range of climate anomalies, and the advance prediction of ENSO remains an important and challenging scientific issue. Since a unified and complete ENSO theory has yet to be established, researchers often use related indicators, such as the Niño 3.4 index and the southern oscillation index (SOI), to predict the development trends of ENSO through appropriate numerical simulation models. However, because the ENSO phenomenon is highly complex and dynamic, and because the Niño 3.4 index and SOI mix many low- and high-frequency components, the prediction accuracy of current popular numerical prediction methods is not high. Therefore, this paper proposes the ensemble empirical mode decomposition-temporal convolutional network (EEMD-TCN) hybrid approach, which decomposes the highly variable Niño 3.4 index and SOI into relatively flat subcomponents and then uses the TCN model to predict each subcomponent in advance, finally combining the sub-prediction results to obtain the final ENSO prediction results. Niño 3.4 index and SOI reanalysis data from 1871 to 1973 were used for model training, and the data for 1984-2019 were predicted 1 month, 3 months, 6 months, and 12 months in advance. The results show that the accuracy of the 1-month-lead Niño 3.4 index prediction was the highest, the accuracy of the 12-month-lead SOI prediction was the lowest, and the correlation coefficient between the worst SOI prediction result and the actual value still reached 0.6406. Furthermore, the overall prediction accuracy on the Niño 3.4 index was better than that on the SOI, which may have occurred because the SOI contains too many high-frequency components, making prediction difficult.
The results of comparative experiments with the TCN, LSTM, and EEMD-LSTM methods showed that the EEMD-TCN provides the best overall prediction of both the Niño 3.4 index and SOI in the 1-, 3-, 6-, and 12-month-lead predictions among all the methods considered. This result means that the TCN approach performs well in the advance prediction of ENSO and will be of great guiding significance in studying it.

reveal the underlying complex climate change characteristics by studying the change law of the index. For example, the Niño 3.4 index and ONI track the SST anomalies in the east-central tropical Pacific between 5°S-5°N and 170°W-120°W, namely, the Niño 3.4 region; the wind index measures the movement of air flow in the upper and lower branches of the Pacific Walker circulation; and the OLR index indicates the extent of convection across the tropical Pacific. However, ENSO is a very complex phenomenon, and it is difficult to use a single unified index to characterize it in different parts of the world. In general, the Niño 3.4 index and the ONI are the most commonly used indexes to define El Niño and La Niña events in the ocean, and the SOI is the oldest indicator of the ENSO state in the atmosphere 11; these constitute two important and highly related components of ENSO. In addition, the ONI is the three-month running mean of SST anomalies in the Niño 3.4 region 12; that is, the ONI is the three-month moving average of the Niño 3.4 index. Hence, in this paper, we choose the Niño 3.4 index as an indicator of ENSO events in the ocean and the SOI as a measure of ENSO events in the atmosphere.
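The relationship between the two ocean indexes is easy to make concrete: the ONI is simply the 3-month moving average of the Niño 3.4 index. A minimal NumPy sketch, using hypothetical anomaly values rather than real index data:

```python
import numpy as np

# Hypothetical monthly Nino 3.4 anomaly values (illustrative only, not real data).
nino34 = np.array([0.5, 0.8, 1.1, 1.4, 1.2, 0.9])

# The ONI is the 3-month running mean of the Nino 3.4 index:
# average each window of three consecutive months.
oni = np.convolve(nino34, np.ones(3) / 3, mode="valid")
print(np.round(oni, 4))  # one value per full 3-month window
```

Six monthly values yield four full 3-month windows, so the ONI series is two samples shorter than the underlying index.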
For the numerical simulation models used for ENSO prediction, three general approaches exist: statistics-based methods, ML-based methods, and a hybrid approach, i.e., the statistics-ML method.
• The statistical ENSO prediction methods leverage the collation, induction, and analysis of historical ENSO indexes to realize the analysis and prediction of ENSO phenomena. Typical methods include the Holt-Winters (HW) method and the autoregressive integrated moving average (ARIMA) method. The HW method is a statistical short-term forecasting method 13 that has been used to forecast time series with seasonal patterns and repetitive forms; it uses a technique called "exponential smoothing" that reduces fluctuations in the time-series data, thus providing a clearer view of their fundamentals 14. In 2014, Mike and Ray used the HW method to make 1-step-ahead and 12-step-ahead forecasts of the Niño region 3 SST index from January 1933 to December 2012. The final out-of-sample root mean square errors of the HW model were 0.303 and 1.309, respectively. Hence, they introduced an improved HW model called the dynamic seasonality model (DSM) to address the HW method's unsuitability for periodically stationary time series 15.
The ARIMA method aims to describe the autocorrelations in time-series data. In 2011, Matthieu et al. developed a time-series analysis method using the ARIMA to investigate temporal correlations between the monthly Plasmodium falciparum case numbers at the Cayenne General Hospital between 1996 and 2009 and ENSO as measured by the SOI. The results showed a positive influence of El Niño at a lag of three months on Plasmodium falciparum cases (p < 0.001), and the incorporation of SOI data in the ARIMA model reduced the Akaike information criterion (AIC) 16. As a representative hybrid approach, the EEMD-ConvLSTM method was proposed 32 to predict the North Atlantic oscillation (NAO) index. The experimental results showed that EEMD-ConvLSTM not only had the highest reliability according to the evaluation metrics but could also better capture the variation trends of the NAO index data 33. However, the prediction results of these methods often depend largely on the statistical decomposition model. If the statistical decomposition model can separate the components of the time-series data well and an excellent time-series prediction algorithm is then selected, better prediction results will follow.
However, the ENSO phenomenon is a highly complex and dynamic process involving different aspects of the ocean and the atmosphere over the tropical Pacific 34, and its variation trends over time are non-linear. Statistical methods tend to fit non-linear data sets poorly and are not ideal for complex pattern recognition and knowledge discovery. The ML-based methods, especially those based on deep networks, tend to be complex and computationally time-consuming and are not very predictive for very long ENSO index sequences. In addition, the long time-series Niño 3.4 index and SOI data not only exhibit approximately periodic interannual changes but also contain a large amount of high-frequency random noise due to seasonal changes, which seriously reduces the forecasting ability of numerical simulation models. Hence, it is still difficult to predict ENSO events at lead times of more than one year 5. Therefore, choosing a novel time-series analysis model that can accurately predict the ENSO state at lead times of more than one year will be of great significance.
The temporal convolutional network (TCN), as a variant of the convolutional neural network (CNN), employs causal convolutions and dilations; hence, it is suitable for sequential data with temporality and large receptive fields. In addition, the CNN has been reported to predict the ENSO phenomenon and achieve good results 5. However, the inherent shortcomings of the CNN, including the fixed-size input vector and inconsistent input and output sizes, limit its application in time-series prediction. Furthermore, the TCN has a simple network structure and outperforms canonical recurrent networks, such as the recurrent neural network (RNN) and LSTM networks, in terms of the accuracy and efficiency of time-series data analysis. In addition, ensemble empirical mode decomposition (EEMD) not only can decompose high-frequency time series into adaptive orthogonal components, called intrinsic mode functions (IMFs), but also has the advantage of noise assistance, overcoming the mode-mixing drawback of conventional empirical mode decomposition (EMD) 35. EEMD can thus be used to decompose the high-frequency time-series Niño 3.4 index and SOI data into multiple adaptive orthogonal components to improve the prediction accuracy of the model. Therefore, this paper proposes the EEMD-TCN hybrid approach, which decomposes the highly variable ENSO indexes (the Niño 3.4 index and SOI) into relatively flat subcomponents, then uses the TCN model to predict each subcomponent in advance, and finally combines the sub-prediction results to obtain the final ENSO prediction results.

Data.
To verify the effectiveness of our proposed EEMD-TCN-based ENSO prediction approach, we selected the Niño 3.4 index and SOI reanalysis data as the experimental data sets.

Niño 3.4 index prediction results and discussion.
During the model-training process, we set the maximum number of training sessions to 7000 and compared the trend of the training loss with the training times. For any one of the IMFs, as the number of training sessions increased, the loss value gradually decreased and stabilized after 2000 training sessions. Therefore, we considered the TCN model stable after 2000 training sessions and usable for Niño 3.4 index prediction. In the model prediction process, we calculated the Pearson correlation coefficient (PCC) and the root mean square error (RMSE) between the predicted and actual values 38 to evaluate the predictive performance of the model. The PCC measures the linear correlation between the predicted and actual values, while the RMSE measures their differences. Together, the PCC and RMSE capture both the agreement and the discrepancy between the predicted and actual values and are a frequently used combination for evaluating the predictive performance of a model. The formulas for calculating the PCC and RMSE are as follows:

PCC = \frac{\sum_{i=1}^{m}(p_i-\bar{p})(o_i-\bar{o})}{\sqrt{\sum_{i=1}^{m}(p_i-\bar{p})^2}\sqrt{\sum_{i=1}^{m}(o_i-\bar{o})^2}}

RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(p_i-o_i)^2}

where m is the length of the time series, p_i denotes the prediction results and \bar{p} their mean value, and o_i represents the actual values and \bar{o} their mean value.
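The two metrics can be computed directly from their definitions. A minimal NumPy sketch, with illustrative values only:

```python
import numpy as np

def pcc(p, o):
    """Pearson correlation coefficient between predicted p and observed o."""
    p, o = np.asarray(p, dtype=float), np.asarray(o, dtype=float)
    pd, od = p - p.mean(), o - o.mean()
    return float(np.sum(pd * od) / np.sqrt(np.sum(pd**2) * np.sum(od**2)))

def rmse(p, o):
    """Root mean square error between predicted p and observed o."""
    p, o = np.asarray(p, dtype=float), np.asarray(o, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

pred = [0.1, 0.4, 0.8, 1.2]  # hypothetical predicted index values
obs  = [0.0, 0.5, 0.7, 1.3]  # hypothetical observed index values
print(round(pcc(pred, obs), 4), round(rmse(pred, obs), 4))
```

A PCC near 1 with a small RMSE indicates a forecast that tracks both the shape and the magnitude of the observed series.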
We conducted 1-, 3-, 6-, and 12-month-lead Niño 3.4 index prediction experiments. From the results: (2) with the increase in advance prediction time, the RMSE gradually increased and the PCC gradually decreased overall, but they did not maintain strictly linear variation characteristics; (3) the RMSE values of the 3- and 6-month-lead predictions were significantly higher, while the PCC values were significantly lower, which is in line with the "spring predictability barrier" phenomenon in dynamical forecasting; (4) the curve obtained from the 1-month-lead prediction had the highest degree of coincidence with the actual Niño 3.4 index, with an RMSE of 0.2337 and a corresponding PCC of 0.9658; (5) the curve obtained from the 12-month-lead prediction matched the actual curve the least, with an RMSE of 0.5142 and a corresponding PCC of 0.8297; although the accuracy of the 12-month-lead predictions was relatively low, the predicted curve still maintained the same growth trend and turning points as the actual curve; (6) with the increase in the advance prediction time, the forecasting deviation in some years slightly increased, such as in 1987, 1998, 2003, 2009, and 2016; this may be due to the extreme El Niño and La Niña events in those years, which posed huge challenges to the predictive models; and (7) in terms of the Niño 3.4 index advance prediction alone, the prediction accuracy of the EEMD-TCN method was similar to or slightly better than that of previous research 5, which reflects the effectiveness of the TCN in Niño 3.4 index prediction.
SOI prediction results and discussion.
Based on the Niño 3.4 index prediction experience, we also set the maximum number of training sessions to 7000 in the SOI prediction experiment. Figures 3 and 4 show the final predicted and evaluated results. From Figs. 3 and 4, (1) the fitting degree between the predicted and actual SOI curves of the 1-month-lead prediction was the highest, and the predicted SOI curve had almost the same growth trend and turning points as the actual curve; (2) the RMSE of the 1-month-lead prediction was 0.6180, and its corresponding PCC was 0.8556; (3) with the increase in the advance prediction time, the PCC gradually decreased, but the RMSE curve showed a tortuous trend, which may be due to the influence of the "spring predictability barrier" in dynamical forecasting; (4) the accuracy of the 12-month-lead prediction was the worst, with an RMSE value of 0.9403 and a PCC value of 0.6406; and (5) as a whole, the prediction accuracy of the EEMD-TCN approach on the SOI was worse than that on the Niño 3.4 index, which may be due to the significant difference between the two in the frequency domain (Fig. 5).
The Niño 3.4 index has significant interannual quasi-periodic peaks within 3 to 6 years. These peaks occur because the equatorial Kelvin waves and Rossby waves that determine the El Niño phenomenon in the ocean take approximately 2 years to complete adjustments in the Pacific Basin. The SOI, which is the response of the El Niño phenomenon in the atmosphere, has interannual quasi-periodic peaks similar to those of the Niño 3.4 index. However, because the specific heat capacity of the atmosphere is small, the thermodynamic properties of the sea-level pressure field are affected not only by the underlying ocean but also by high-frequency changes at the seasonal scale 39. Therefore, the high-frequency part of the SOI spectrum is significantly stronger than that of the Niño 3.4 index. That is, the SOI data change more drastically than the Niño 3.4 index data, and this high-frequency random noise severely reduces the model's ability to predict the SOI. Although the EEMD method was used to decompose the high-frequency components into lower-frequency subcomponents, the SOI prediction still cannot reach the level of the Niño 3.4 index prediction. However, the PCC value of the worst forecast still exceeded 0.5, which strongly demonstrates the effectiveness of the EEMD-TCN model in SOI advance prediction.

Evaluation and discussion
To effectively evaluate the performance of our proposed "first EEMD and then TCN prediction" ENSO prediction approach and to fully verify the key role of EEMD decomposition in ENSO prediction, we carried out comparative experiments with the TCN, LSTM, and EEMD-LSTM. All comparative experimental data, as well as the training set and test set assignments, were the same as those of the EEMD-TCN-based ENSO prediction experiment, and the comparative figures show the Niño 3.4 index and SOI results predicted by the TCN, LSTM, and EEMD-LSTM approaches. For the TCN, LSTM, and EEMD-LSTM comparative experiments, the number of iterations was still set to 7000, consistent with the number of EEMD-TCN iterations. In addition, the RMSE and PCC values of the predicted results of each comparative experiment were calculated, as shown in Fig. 12.
From the aforementioned comparison experiments, the following can be concluded.
• As shown in Fig. 6, compared with the "first EEMD and then TCN prediction" method, the Niño 3.4 index prediction result obtained by using the TCN alone was worse.
• As shown in Fig. 7, the SOI prediction results obtained by using the TCN were worse than those obtained by the EEMD-TCN algorithm, regardless of whether a 1-month-lead or a 12-month-lead forecast is considered. This result may be due to the high-frequency components contained in the SOI time series, which lower the prediction accuracy of the pure TCN model, further confirming our hypothesis that the EEMD-TCN technique can effectively improve the prediction accuracy achieved on high-frequency time series.
• Compared with the EEMD-LSTM method, the LSTM's Niño 3.4 index prediction is worse, which can also be verified from the RMSE and PCC values of the prediction results of the two (Fig. 12).
• On the basis of Figs. 9 and 11, for the overall SOI prediction accuracy, EEMD-LSTM is better than the LSTM model. However, as a whole, the SOI prediction accuracies obtained by the LSTM and EEMD-LSTM models are worse than the corresponding Niño 3.4 index prediction accuracies, as is also the case for the EEMD-TCN and TCN models; this is determined by the high-frequency characteristics of the SOI data itself.
• In Fig. 12, the longer the advance forecasting time, the more obvious the advantages of EEMD, no matter whether the TCN model or the LSTM model is used. Therefore, we conclude that decomposing a time series that mixes low- and high-frequency components into subcomponents each containing a single frequency and making separate predictions can effectively improve the accuracy of long-term advance prediction.
• As shown in Fig. 12, for the 1-, 3-, 6-, and 12-month-lead EEMD-TCN-based Niño 3.4 index and SOI predictions, the RMSE values were the smallest and the PCC values were the highest compared with the corresponding results of the TCN, LSTM, and EEMD-LSTM methods. In addition, for the same test and validation data sets and the same number of training iterations, the LSTM model takes approximately 8 times longer than the TCN model on an RTX 2080Ti GPU. In other words, the EEMD-TCN was the best model for ENSO advance prediction in terms of both prediction accuracy and efficiency.

Conclusions and future work
In view of the low accuracy of the current popular ENSO prediction methods, and considering that the Niño 3.4 index and SOI reanalysis data contain many low- and high-frequency components, we proposed a "first EEMD and then TCN prediction" hybrid approach, which decomposes the highly variable Niño 3.4 index and SOI into relatively flat subcomponents and then uses the TCN model to predict each subcomponent in advance, finally combining the sub-prediction results to obtain the final ENSO prediction results. The Niño 3.4 index and SOI reanalysis data from 1871 to 1973 were used for model training, and the data for 1984-2019 were predicted 1 month, 3 months, 6 months, and 12 months in advance. The results show that for both the Niño 3.4 index and SOI reanalysis data, the accuracy of the 1-month-lead prediction was the highest, and that of the 12-month-lead prediction was the lowest. Specifically, for the Niño 3.4 index advance prediction, the curve obtained by the 1-month-lead prediction had the highest degree of coincidence with the actual value, with an RMSE of 0.2337 and a corresponding PCC of 0.9658; the curve obtained by the 12-month-lead prediction matched the actual curve the least, with an RMSE of 0.5142 and a corresponding PCC of 0.8297. For the SOI advance prediction, the RMSE of the 1-month-lead prediction was 0.6180, and its corresponding PCC was 0.8556, while the accuracy of the 12-month-lead prediction was the worst, with an RMSE value of 0.9403 and a PCC value of 0.6406. Furthermore, the overall prediction accuracy on the Niño 3.4 index was better than that on the SOI, which may be because the SOI contains too many high-frequency components, making prediction difficult. The results of comparative experiments with the TCN, LSTM, and EEMD-LSTM methods showed that the EEMD-TCN provided the best overall prediction of both the Niño 3.4 index and SOI in the 1-, 3-, 6-, and 12-month-lead predictions among all the methods considered. In particular, the TCN not only had higher prediction accuracy for the time-series data but also had a simpler network structure and higher operating efficiency than the popular LSTM network.
However, at times when there were strong El Niño or La Niña events, the prediction errors of both the Niño 3.4 index and SOI were relatively large, and the errors became increasingly obvious as the advance prediction time increased. In addition, the proposed EEMD-TCN approach could not overcome the "spring predictability barrier" in dynamical forecasting, and the correlation coefficients between the 3- and 6-month-lead predicted results and the actual values were significantly reduced. Recently, several studies 40-42 used physical-empirical and/or statistical-dynamical models to improve the prediction of climate signals such as ENSO and the Arctic Oscillation. The methods used in these studies consider both physical mechanisms and numerical simulation, providing a new idea for improving the prediction accuracy of our EEMD-TCN model. Therefore, attempts to improve the prediction accuracy at time points with strong El Niño or La Niña events, as well as to overcome the "spring predictability barrier", will be carried out in future research.

TCN.
For the analysis of time-series data, the most commonly used neural network is the RNN 33. The RNN can employ internal memory to process input time series, which differs from the traditional back-propagation (BP) neural network. However, the RNN model is generally not directly used for long-term memory calculation; thus, an improved RNN model known as the LSTM was proposed 43. The LSTM can process sequences with thousands or even millions of time points and has good processing ability even for long time series containing many high- and low-frequency components 44. However, the latest research shows that the TCN, a member of the convolutional neural network (CNN) family 45, performs better than the LSTM in processing very long input sequences 46.
The typical characteristics of the TCN include the following: (1) it can take a sequence of any length and output a sequence of the same length as the input, just like an RNN; and (2) its convolutions are causal, which means that there is no information "leakage" from the future to the past. To reach the first goal, the TCN uses a one-dimensional fully convolutional network (1D FCN) architecture 46; that is, each hidden layer is zero-padded to maintain the same length as the input layer. To achieve the second point, causal convolution, in which an output at time t is convolved only with elements from time t and earlier in the previous layer, is adopted. In short, the TCN is the combination of a 1D FCN and causal convolutions.
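The two properties above can be sketched in a few lines of NumPy: left zero-padding keeps the output the same length as the input, and indexing only times t, t-1, ... prevents any future leakage. This is an illustrative single-filter sketch, not a full TCN layer:

```python
import numpy as np

def causal_conv1d(x, w):
    """1-D causal convolution with same-length output.
    y[t] = sum_i w[i] * x[t - i]; x is left-padded with k-1 zeros,
    so y[t] never depends on any input later than time t."""
    x = np.asarray(x, dtype=float)
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])  # zero-pad the past
    return np.array([sum(w[i] * xp[t - i + k - 1] for i in range(k))
                     for t in range(len(x))])

x = [1.0, 2.0, 3.0, 4.0]
# The filter [0, 1] picks out x[t-1], i.e., a pure one-step delay.
print(causal_conv1d(x, [0.0, 1.0]))  # [0. 1. 2. 3.]
```

Because the padding is applied only on the past side, the output at every position t is a function of inputs up to t, which is exactly the "no leakage" requirement.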
FCN. Unlike the classic CNN, which uses a fully connected layer after the convolutional layers to obtain a fixed-length feature vector, the FCN uses deconvolutional layers after the last convolutional layers 47. That is, all the hidden layers in the network are convolutional layers, which is why it is named a "fully convolutional" network (Fig. 13).
The FCN can accept input images of any size, and its output has the same size as the input thanks to the upsampling (i.e., deconvolution) after the last convolutional layers. Therefore, a prediction can be generated for each input pixel while preserving the spatial information in the original input image 46. If the input images become 1D series data, then the FCN becomes a 1D FCN. Because the input and output of the FCN have the same size, the 1D FCN can produce an output with the same length as its input.
Causal convolutions. For sequence modelling, the main purpose is to predict corresponding outputs at each time step from the inputs observed so far. The ordinary CNN is not suitable for addressing sequence problems because the input image size of a CNN must be fixed 48,49; thus, causal convolution is used. However, it is very challenging to directly apply a simple causal convolution to long time-series problems, because it can only look back at a history whose size is linear in the depth of the network. To eliminate this problem, the dilated convolution, which enables an exponentially large receptive field 50, is employed. A simple causal convolution and a dilated convolution have the same convolutional kernel size and the same number of parameters; the difference is that the dilated convolution has a dilation rate parameter indicating the spacing between kernel elements. For a 1-D sequence input x and a filter f of size k, the dilated convolution operation F on element s of the sequence is defined as follows:

F(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i}

where d denotes the dilation factor, k is the filter size, and s - d \cdot i accounts for the direction of the past. Figure 14 illustrates the architectural elements in a TCN.
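The defining equation can be checked numerically. The sketch below implements F(s) directly, treating out-of-range past values as zero (an assumption corresponding to zero padding):

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """F(s) = sum_{i=0}^{k-1} f(i) * x[s - d*i].
    Past values before the start of the sequence are treated as zero."""
    x = np.asarray(x, dtype=float)
    k = len(f)
    return np.array([sum(f[i] * x[s - d * i] for i in range(k) if s - d * i >= 0)
                     for s in range(len(x))])

x = np.arange(8.0)
# With f = [1, 1] and d = 2, each output is x[s] + x[s-2]:
# the filter taps are two steps apart instead of adjacent.
print(dilated_causal_conv(x, [1.0, 1.0], d=2))
```

With d = 1 the taps are adjacent and the operation reduces to the simple causal convolution; larger d lets the same two-tap filter reach further into the past at no extra parameter cost.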
As shown in Fig. 14, when d = 1, the dilated convolution becomes a simple convolution; if we choose larger filter sizes k and increase the dilation factor d, the receptive field of the TCN can be increased. Therefore, we can use these methods to address long-sequence problems.
Residual connections. As the length of the time series increases, the TCN receptive field must widen, which increases the number of network layers and the number of filters per layer, because the TCN receptive field depends on the network depth n, the filter size k, and the dilation factor d. However, the major issue for very deep networks is exploding and/or vanishing gradients; the TCN model uses a generic residual module instead of a plain convolutional layer to avoid these problems. Figure 15 shows the residual block for a TCN.
In Fig. 15, the residual block has two layers, i.e., a dilated causal convolution and a non-linearity (ReLU), with weight normalization in between. In addition, a spatial dropout is added after each dilated convolution for regularization, and an additional 1×1 convolution is adopted to ensure that the element-wise addition ⊕ receives tensors of the same shape, resolving the difference in input and output widths.
EEMD. EEMD is an improved version of EMD that effectively overcomes the drawbacks of mode mixing in conventional EMD. Its principle is to add normally distributed white noise to the original signal before EMD decomposition, then to use the spectral characteristics of the uniformly distributed white noise to offset the specific spectrum loss of the original signal, and finally to eliminate the modal aliasing inherent in EMD 35. After EEMD, the original high-volatility time series can be divided into adaptive orthogonal components, called IMFs, which not only maintain the characteristics of the original series but also greatly reduce its volatility. Figure 16 shows the original Niño 3.4 index and SOI series, as well as their EEMD-decomposed components. It can be seen that after EEMD, each IMF component of the Niño 3.4 index and SOI series contains only one dominant frequency component, which can effectively improve the prediction accuracy of the model.
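The noise-assisted averaging idea behind EEMD can be sketched as follows. This is only an illustration of the ensemble principle, not a real implementation: a simple moving-average split stands in for the actual EMD sifting step (which is considerably more involved), and all function names here are our own:

```python
import numpy as np

def split_high_low(x, win=12):
    """Stand-in for one EMD level: a centered moving average gives the
    low-frequency part; the residual is the high-frequency part."""
    low = np.convolve(x, np.ones(win) / win, mode="same")
    return x - low, low

def eemd_like(x, n_ensemble=50, noise_std=0.2, seed=0):
    """EEMD principle: add white noise to the signal, decompose each
    noisy copy, and average the components so the added noise cancels."""
    rng = np.random.default_rng(seed)
    highs, lows = [], []
    for _ in range(n_ensemble):
        noisy = x + rng.normal(0.0, noise_std, size=len(x))
        h, l = split_high_low(noisy)
        highs.append(h)
        lows.append(l)
    return np.mean(highs, axis=0), np.mean(lows, axis=0)

t = np.arange(240)  # 20 "years" of monthly samples
signal = np.sin(2 * np.pi * t / 48) + 0.3 * np.sin(2 * np.pi * t / 6)
high, low = eemd_like(signal)
# The averaged components still sum back to (approximately) the signal,
# because the independent noise realizations cancel in the ensemble mean.
print(np.max(np.abs(high + low - signal)))
```

In practice one would use a dedicated EMD/EEMD library rather than this toy split; the point is only that averaging over many noisy decompositions removes the injected noise while suppressing mode mixing.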

EEMD-TCN-based ENSO index series prediction. After the original Niño 3.4 index and SOI series were decomposed into multiple IMF components, each single component could be predicted using the TCN model, and the sub-predictions could be combined to obtain the final prediction result. For the TCN-based single-component prediction, the core problem is to determine the network parameters, including the dilation factor d, the filter size k, and the minimum network depth n 46, based on the data characteristics, so as to obtain accurate time-series prediction results.
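The decompose-predict-recombine pipeline can be sketched as follows. A naive persistence forecast stands in for the per-component TCN sub-models (the real method trains one TCN per IMF); the sketch only illustrates the combination logic:

```python
import numpy as np

def predict_component(comp, lead):
    """Stand-in for a trained TCN sub-model: a persistence forecast,
    i.e., the prediction for time t + lead is simply the value at time t."""
    return np.asarray(comp, dtype=float)[:-lead]

def combined_forecast(components, lead):
    """Forecast each decomposed component separately, then sum the
    sub-predictions to obtain the final index forecast."""
    return np.sum([predict_component(c, lead) for c in components], axis=0)

# Toy decomposition: two "IMF-like" bands that sum to the series.
t = np.arange(24.0)
imf1 = np.sin(2 * np.pi * t / 6)   # fast component
imf2 = 0.1 * t                     # slow trend component
series = imf1 + imf2

pred = combined_forecast([imf1, imf2], lead=3)
# With a linear per-component predictor, summing the sub-predictions
# equals predicting the recombined series directly.
print(np.allclose(pred, series[:-3]))  # True
```

The benefit of the EEMD-TCN design comes from replacing the persistence stand-in with a non-linear TCN per component: each sub-model then faces a far smoother series than the raw index.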
Dilation factor. The dilation factor d generally increases as the depth n of the network increases. Their relationship can be expressed by the following formula 52:

d_i = 2^{i-1}, \quad i = 1, 2, \ldots, n

where i represents the i-th layer and n represents the total number of dilated causal convolutional layers.
Filter size. The filter size, also known as the convolutional kernel size, varies with the dilation factor d, and their relationship can be expressed by the following formula 53:

k_i = k_1 \cdot d_i, \quad i = 1, 2, \ldots, n

where i represents the i-th layer and n represents the total number of dilated causal convolutional layers. In general, the initial size of the filter is k_1 = 2 by default, but it can be set to other values depending on the data.
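These two relations can be tabulated per layer. The sketch assumes dilations that double per layer and filter sizes scaling with the dilation factor from the default k_1 = 2; the exact filter-size rule should be treated as an assumption rather than a fixed prescription:

```python
def layer_params(n, k1=2):
    """Per-layer dilation factors d_i = 2**(i-1) and (assumed) filter
    sizes k_i = k1 * d_i for n dilated causal convolutional layers."""
    dilations = [2 ** (i - 1) for i in range(1, n + 1)]
    filters = [k1 * d for d in dilations]
    return dilations, filters

print(layer_params(4))  # ([1, 2, 4, 8], [2, 4, 8, 16])
```

Doubling the dilation at each layer is what gives the TCN its exponentially growing receptive field for a fixed number of parameters per layer.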
Minimum network depth. In the TCN model, the minimum depth of the TCN directly affects the receptive field, which is determined by the following formula:

receptive\ field = nb\_stacks\_of\_residual\_blocks \times k_1 \times d_n

where nb_stacks_of_residual_blocks is the number of stacks of residual blocks to use, which is set to 1 by default; k_1 is the initial size of the filter; and d_n is the dilation factor of the n-th dilated causal convolutional layer, with n representing the total number of dilated causal convolutional layers. Therefore, it is necessary to comprehensively consider the length of the input sequence and the size of the receptive field to obtain a reasonable minimum network depth. In Fig. 16, the left panel shows the Niño 3.4 index, and the right panel shows the SOI.
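Under the receptive-field formula above, choosing the minimum depth for a given input length reduces to a small search. The helper below is a sketch under the stated defaults (nb_stacks = 1, k_1 = 2, and d_n = 2^{n-1}):

```python
def receptive_field(n, k1=2, nb_stacks=1):
    """Receptive field = nb_stacks * k1 * d_n, with d_n = 2**(n-1)."""
    return nb_stacks * k1 * 2 ** (n - 1)

def min_depth(seq_len, k1=2, nb_stacks=1):
    """Smallest number of layers n whose receptive field covers seq_len."""
    n = 1
    while receptive_field(n, k1, nb_stacks) < seq_len:
        n += 1
    return n

# For a 100-step input with the defaults, 7 layers suffice:
print(receptive_field(7), min_depth(100))  # 128 7
```

Because the receptive field doubles with each added layer, the required depth grows only logarithmically with the input length, which is what makes the TCN practical for the long monthly index series used here.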