Hybrid systems using residual modeling for sea surface temperature forecasting

The sea surface temperature (SST) is an environmental indicator closely related to climate, weather, and atmospheric events worldwide. Its forecasting is essential for supporting the decision of governments and environmental organizations. Literature has shown that single machine learning (ML) models are generally more accurate than traditional statistical models for SST time series modeling. However, the parameters tuning of these ML models is a challenging task, mainly when complex phenomena, such as SST forecasting, are addressed. Issues related to misspecification, overfitting, or underfitting of the ML models can lead to underperforming forecasts. This work proposes using hybrid systems (HS) that combine (ML) models using residual forecasting as an alternative to enhance the performance of SST forecasting. In this context, two types of combinations are evaluated using two ML models: support vector regression (SVR) and long short-term memory (LSTM). The experimental evaluation was performed on three datasets from different regions of the Atlantic Ocean using three well-known measures: mean square error (MSE), mean absolute percentage error (MAPE), and mean absolute error (MAE). The best HS based on SVR improved the MSE value for each analyzed series by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$82.26\%$$\end{document}82.26%, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$98.93\%$$\end{document}98.93%, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$65.03\%$$\end{document}65.03% compared to its respective single model. The HS employing the LSTM improved \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$92.15\%$$\end{document}92.15%, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$98.69\%$$\end{document}98.69%, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$32.41\%$$\end{document}32.41% concerning the single LSTM model. Compared to literature approaches, at least one version of HS attained higher accuracy than statistical and ML models in all study cases. In particular, the nonlinear combination of the ML models obtained the best performance among the proposed HS versions.

• Proposal of a hybrid system methodology to improve the accuracy of ML models in the SST forecasting; • The performance evaluation of two hybrid systems in the 1-day ahead forecasting SST using three wellknown measures: mean square error (MSE), mean absolute percentage error (MAPE) and mean absolute error (MAE); • The development of two versions of each analyzed hybrid system using well-known ML models: SVR and LSTM; • The hybrid systems employing the SVR achieved on average a percentage gain compared to its respective single model of 80.27%, 61.72%, and 60.21% for MSE, MAPE, and MAE, respectively; • The hybrid systems using the LSTM attained an average percentage gain concerning its respective single model of 73.16%, 57.90%, and 56.98% for MSE, MAPE, and MAE, respectively; • The results show that, in general, the developed hybrid systems overcame literature statistical and ML models in the SST forecasting context.
The remainder of the paper is organized as follows. "Related works" section shows the related works of hybrid systems that deal with residual series modeling. This section describes the evaluated hybrid systems in the SST forecasting: Perturbative approach ("The perturbative approach" section) and NoLiC ("The NoLiC method" section). "PIRATA data set" section presents the data set extracted from the PIRATA project website. "Experimental protocol" section shows the experimental protocol used in this work. In "Simulations and experimental results" section, the results and discussions are presented. Finally, "Discussion" section shows the concluding remarks and suggestions for future works.

Related works
Combining models is one of the most common alternatives to enhance the accuracy of forecasting systems 16,17,[28][29][30][31] . In the literature, there are two well-established approaches: ensembles 32 and hybrid systems that perform residuals modeling 16,17 . Both theoretical and empirical results indicate that the latter approach is an interesting strategy to increase the robustness and accuracy of the forecasts 16,17,[28][29][30][31] .
The general architecture of a hybrid system that performs the residuals modeling can be divided into three main steps: time series forecasting, error series forecasting, and the combination of the two first steps. Equation (1) shows a general view of this architecture, where the final output of the hybrid system ŷ t is given by a function f(.) that combines the forecast of the time series P 0 with the forecast of the residuals P 1 to estimate y t . Scientific Reports | (2022) 12:487 | https://doi.org/10.1038/s41598-021-04238-z www.nature.com/scientificreports/ P 0 is the forecast of the time series given by the M 0 model (Eq. 2), and P 1 is the forecast of the residual series (e t−1 , . . . , e t−n ) given by the M 1 model (Eq. 3). The residual series is calculated as the difference between the predicted and the actual values.
where and where m and n are the time lags used as input to the M 0 and M 1 models, respectively. The time lags can be defined using the auto-correlation function (ACF), partial auto-correlation function (PACF), or some searching algorithm 31,33 . Based on the general architecture described by the f function in Eq. (1), two classes of hybrid systems have been studied for real-world time series modeling: a combination of linear statistical methods with nonlinear Machine Learning (ML) models and a combination of ML models. For simplicity, the first class is denominated as a hybrid system and the second as a combination of ML models. The hybrid system class is described in "Hybrid systems-combining linear and nonlinear models" section. Techniques that combine ML models are presented in "Combining nonlinear models" section. This section also describes two recent techniques: the perturbative approach 22 and NoLiC 23 .
Hybrid systems-combining linear and nonlinear models. Linear statistical models have been combined with nonlinear ML models based on the assumption that real-world time series generally present linear and nonlinear patterns 16,17 . Thus, in this hybrid system class, statistical models are used as M 0 , and ML models are employed as M 1 intended to deal with linear and nonlinear patterns separately.
The f function that is responsible for combining P 0 with P 1 can be either linear or nonlinear 16,19,23,34 . The linear combination, which is more commonly used in the literature 16,29,35 , consists of a non-trainable rule, such as the sum. This combination has been successfully used in several applications, for instance: financial indexes 29 , wind speed 35 , groundwater level fluctuations 36 , the prevalence of schistosomiasis in humans 37 , particulate matter 38 , and water quality 39 .
Zhang 16 showed the linear combination: where P 0 is the forecasting of a linear statistical model ( M 0 ), P 1 is the forecasting of an ML model ( M 1 ) applied to the residual of the time series, and ŷ t is the final prediction of the hybrid system performed by the linear combination. In his experiments, M 0 was defined as an ARIMA and M 1 as an MLP neural network. Despite being widely used in the literature, the linear combination of the forecasts P 0 and P 1 can underestimate, or degenerate the accuracy of the initial model ( M 0 ), since there may be no additive relationship between linear and nonlinear forecasts 17,[40][41][42] .
Based on this assumption, Khashei and Bijari 17,41,42 proposed a nonlinear combination of the forecasts to overcome the limitations of the linear combination. In their hybrid systems 17,41,42 , the function f (Eq. 1) is defined as an ML that receives as input P 0 , the residual ( (e t−1 , . . . , e t−n ) ), and the time series ( y t−1 , . . . , y t−m ), as shown in Eq. (5). They employed an ARIMA model for time series modeling as P 0 , and an MLP as P 1 .
In general, the nonlinear combination of the forecasts P 0 and P 1 reaches better results than the linear approach 17,41,43 . However, there is no guarantee that the nonlinear combination is the most appropriate for modeling any temporal phenomena 42 . Therefore, the best combination function of the forecasts of the time series (linear component) and the residual series (nonlinear component) is unknown, being still a research challenge in the hybrid systems research field 17,23,42 . Combining nonlinear models. Nonlinear ML models have been combined based on the assumption that adopting only one single model can be inadequate to real-world time series forecasting. The underperforming of a single ML model can occur due to problems caused by overfitting, underfitting, or misspecification 22,23,28 .
In Ginzburg and Horn 28 , two MLPs are combined linearly following the same idea shown in Eq. (4). Thus, the time series forecast ( P 0 ) is performed by the first MLP ( M 0 ), and its residuals are modeled by the second MLP ( M 1 ), generating the forecast of the residuals ( P 1 ). In this sense, the M 1 model is employed to uncover and model temporal correlations found in the residuals of M 0 , thus correcting the original forecast ( P 0 ). This premise is based on biological systems that commonly deal with complex tasks through subsystems 28 . Later stages of consecutive subsystems (networks) refine the response of earlier ones, improving the performance of the entire biological system 28,44 . This principle was also successfully employed in atmospheric pollution forecasting 31,45 .
The next two subsections show two recent approaches that combine nonlinear models: the perturbative approach 22 and NoLiC 23 .
(1) y t = f (P 0 , P 1 ), y t = f (P 0 , e t−1 , . . . , e t−n , y t−1 , . . . , y t−m ). www.nature.com/scientificreports/ The perturbative approach. The linear combination of ML models proposed in 28 was generalized in 22 , which employed the perturbation theory concept that was previously applied in many areas, such as physics, chemistry, and mathematics 46,47 . The idea is to initiate the forecasting of a time series using a first estimation (forecast P 0 ). Then, p new forecasts ( P 1 + P 2 + · · · + P p ) are added to make a partial forecast ever closer to the real solution P. Mathematically, where P is the desired solution (perfect forecasting), P 0 is the series forecast, and the term of the major contribution to P, and P 1 , P 2 , . . . , P p are the p higher-order terms (residual forecasts). Then, P 1 is the forecast of the residuals of M 0 , P 2 is the forecast of the residuals of M 0 + M 1 , P 3 is the forecast of the residuals of M 0 + M 1 + M 2 , and P p is the forecast of the residuals of M 0 + M 1 + M 2 + · · · + M p−1 . Theoretically, the corrections generated by the residual forecasts ( P 1 , P 2 , . . . , P p ) decrease since, at each perturbation i, the residual series E i present values closer to zero. In practice, the contribution of the residual forecasts ( P 1 , P 2 , . . . , P p ) depends on the specification and training of the model M i . Algorithms 1 and 2 show the training and the testing phases of the perturbative approach 22 , respectively.
The training phase is divided into two general steps: training the time series forecasting model (lines 5-9) and training the correction models based on the residual series (lines [10][11][12][13][14][15][16][17][18][19]. The algorithm's input is the training set of the time series, and the outputs are p residuals (error series E) and p + 1 trained models ( {M 0 , M 1 , . . . , M p } ). The training phase has two stop criteria: the maximum number of perturbations (pMax) or an increase in the error value in the validation set concerning antecedent perturbation (lines 14 to 15).
In line 4, the initial model ( M 0 ) is trained using the training set Y, generating the time series forecast P 0 in line 5. P 0 is the main contributor to the final solution P 22 . In line 8, the first error series ( E 1 ) is generated. The error series consists of the difference between the actual series and the estimated values provided by the perturbative approach (P).
After, the perturbative terms are generated. Each M i model is trained to forecast E i (line 10), which is the difference between Y and P (line 14). At each iteration of the loop (lines 9-15), a new perturbative term is generated (lines 10-11) and added to the final solution (line 12). At the end of the training phase, P is the sum of the p + 1 forecasts ( P 0 , P 1 , P 2 , . . . , P p ) of the p + 1 models ( M 0 + M 1 + M 2 · · · + M p ). Lines 10 to 14 are executed until the stopping criterion is reached. P = P 0 + P 1 + P 2 + · · · + P p , Scientific Reports | (2022) 12:487 | https://doi.org/10.1038/s41598-021-04238-z www.nature.com/scientificreports/ The testing phase (Algorithm 2) is divided into two steps: forecast of the time series (line 5) and forecast of the perturbative terms (lines 8-12). Lines 5 and 6 show the generating of P 0 and its inclusion in the final output P of the perturbative approach. Lines 8 to 12 show the second part, where the perturbative terms are generated for the test point (observation of the test sample). So, each model ( M 1 , M 2 , . . . , M p ) generate the forecasting of its respective error series ( E 1 , E 2 , . . . , E p ). This loop is repeated (p) times, which is the number of perturbations defined in the training phase. After, each perturbative term is forecasted (line 9) and added to the solution P using a linear combination (line 10), generating the final forecasting.
The NoLiC method. The NoLiC method 23 employs an adaptive combination of ML models with other techniques using the residual series. This combination method does not presuppose a linear combination as other works 16,22,28 . The idea is to find a combination function between P 0 and P 1 using an ML model that is flexible, capable of performing linear and nonlinear modeling.
The Nonlinear Combination (NoLiC) method is composed of three steps: forecast of the time series ( P 0 ), forecast of the residuals ( P 1 ), and the combination f(.) of P 0 and P 1 . Figure 1 shows the training and testing phases of the NoLiC method.
The training phase receives as input the training set and generates three trained models ( M 0 , M 1 , and M C ) as outputs. Similarly to other works 16,28 , the models M 0 and M 1 are employed to forecast the time series and the Single Model (M 0 ) , generating the forecast of the error series P 1 . After, the combination model M C receives as inputs P 0 and P 1 and is trained with the objective to correct the output of M 0 , generating a forecast (P) closer to the target (future value of Y t ).
In the test phase, the M 0 and M 1 models receive the lag values of the time series ( Y q ) and the residuals ( E q ), respectively. After, M 0 and M 1 models generate theirs respective forecasts P 0 and P 1 . Then, the trained ML model M C combines the forecasts of the series and residuals to produce the final forecast P.
Remarks. The combination methods described in "The perturbative approach" and "The NoLiC method" sections have different characteristics. The perturbative method can extract information from more than one residual series. However, this method supposes that the models should be combined using a simple sum rule. In contrast, the NoLiC method supposes a nonlinear combination between the forecasts and the residuals.
The NoLiC method employs an ML model aiming to find a combination more suitable than a simple sum. However, there is no guarantee that the M C model leads to the best accuracy of the hybrid system. The optimum performance depends on adjusting the parameters and training of the M C model, which is a complex task since it is related to forecasts of M 0 and M 1 . Thus, investigating how to combine the forecasts of the time series and its error series is a crucial issue in the definition of the hybrid system since it is closely related to its accuracy.

PIRATA data set
The Pilot Research Moored Array in the Tropical Atlantic (PIRATA) was developed by a network of observatories composed of many countries, such as Brazil, France, and the United States. This project has the objective to improve the knowledge about atmospheric variations in the tropical Atlantic Ocean 48 . The climatic variations in this area can influence the development of droughts, floods, severe storms, and even hurricanes, affecting millions of people in South America and Africa 3 (Fig. 2).
The project PIRATA has buoys in the ocean, where meteorological variables are collected, such as shortwave radiation, relative humidity, air temperature, and ocean surface temperature. All data are gathered and transmitted by satellite and are available on the project web page (https:// www. pmel. noaa. gov/ gtmba/ pirata).
We aim to perform the forecast of the sea surface temperature of three regions 3 : 8 • N 38 • W (Fig. 3a), 10 • S 10 • W (Fig. 3b), and 4 • N 23 • W (Fig. 3c). These locations were selected because they have an appropriate amount of data for modeling with ML models. These data sets do not present interruptions and are located in different regions. The selected locations can be seen in Fig. 2, represented by red points. Table 1 shows the characteristics of each time series used in this work.

Experimental protocol
The experiments evaluate two machine learning models: support vector machines (SVR) and long short-term memory (LSTM). These models were chosen because they reached relevant results regarding accuracy for the SST forecasting task 3,49,50 . The SVR and LSTM models are employed as single models as well as in the combination approaches.
The SVR model was successfully employed in the SST forecasting 3 and has highlighted results in several other forecasting applications 51 . SVR is an interesting choice because it employs a quadratic optimization procedure to solve a convex constrained problem, with a single solution 52 . Therefore, in contrast to methods such as neural networks where several local minima can be achieved, the uniqueness of the solution of SVR is obtained given a set of hyperparameters. To the kernel SVR, the Radial Basis Function (RBF) was selected because it is a wellestablished kernel function in the time series forecasting area 51 . RBF kernel also was successfully employed in the SST forecasting 3 and has been widely used in hybrid systems [18][19][20]29,53 . Besides, the RBF is considered the default SVR kernel in the Sklearn 54 library, which is now the most popular package for creating SVR models in Python. The RBF's popularity can be explained by its finite and localized responses across the entire range of the x-axis, so it does not need previous assumptions about the data and adds few parameters to the SVR model (Cost and Gamma) 55 . www.nature.com/scientificreports/ LSTM was selected because it is one of the state-of-the-art ML models in time series forecasting. It has outperformed traditional neural networks in several applications 56 . Its ability to deal with short or long-term temporal dependencies can be promising in the SST time series modeling 57 .
For the combination approach, the same model (SVR or LSTM) are employed for all stages. For the perturbative approach, the best number of perturbations (or corrections) was selected based on the MSE value in the validation set having a upper limit of four perturbations. For all models, a grid search approach was performed for   Table 2 shows the set of parameters investigated for each model in the 1-day-ahead forecasting scenario. The number of input lags used in the grid search was selected based on PACF. For S1, the lags 1, 3, 4, 11, and 15 presented significant linear correlations. For S2, PACF selected the lags 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 28, and 30, and for S3, the lags 1, 2, 3, 5, 17, and 18 presented relevant correlation. Table 3 shows the values selected for single and combination approaches for 1 day ahead SST forecasting for each study case (S1, S2, and S3 series). It is important to highlight that all combination approaches use the same M 0 , and the NoLiC method employs the same M 0 and M 1 of the perturbative approach. So, it is possible to compare the performance of the combination approaches directly. For all series, the perturbation approach employed three perturbations for the SVR model and two perturbations for the LSTM model.
The performance of the approaches are evaluated using three performance metrics applied to the context of sea surface temperature forecast 3,4 : mean square error (MSE), mean absolute percentage error (MAPE) and mean absolute error (MAE). Equations (6), (7) and (8) show the MSE, MAPE and MAE metrics, respectively.
where N represents the time series length, y t the true value at time t and ŷ t is the forecast at time t. For all metrics, the lower the values, the better the results. A percentage gain/loss measure (Eq. 9) is used to compare the combination approaches with the single models.
where Metric sm and Metric comb represent the MSE values reached by single models and combination approaches, respectively. In this way, the higher the PC, the better the performance of the combination approach in relation to the single model. The SVR and LSTM models were implemented in the Python programming language using the Sklearn 54 and Keras 59 libraries. The experimental simulations were performed in a computer with a single Intel Core i7-7500 CPU and 20 GB RAM.
The experimental comparison was carried out among the single models, the hybrid approaches, and the following literature models: Exponential Smoothing (ETS) 60 , Convolution LSTM (ConvLSTM) 11,50 , and the Nonlinear Autoregressive Exogenous (NARX) 1 .
Exponential Smoothing (ETS) is a traditional statistical method employed in time series forecasting 60,61 . It is a versatile method due to its ability to model time series with/without trend and seasonality components. However, the ETS can reach a limited performance in forecasting time series that present nonlinear patterns 62 . The experiments with ETS were carried out using the Statsmodel library of Python 63 .
The Convolutional Long Short-Term Memory (ConvLSTM) 11,50 is a Deep Learning technique able to model spatiotemporal correlations. The ConvLSTM models spatial and temporal patterns using convolution and LSTM layers. This technique attained higher accuracy than other ML models to SST time series forecasting tasks 11,50 . On the other hand, its training can be costly computationally due to the number of hyper-parameters that must be adjusted 64 . In this work, the employed ConvLSTM used the configuration suggested in the SST forecasting works 11,50 .
Nonlinear Autoregressive with Exogenous Input neural network (NARX) was proposed to model the nonlinear and autoregressive behaviors 65 . NARX model was successfully used to predict SST anomalies in the western Indian Ocean region 1 . Despite being able to forecast seasonal anomaly trends, the NARX performance is highly sensitive to parameters specification 1 . Table 4 shows the results regarding MSE, MAPE and MAE for the test set of the S1, S2, and S3 series, for 1-day ahead forecasting. In that table is possible to compare the performance of the hybrid systems with single and literature models.

Simulations and experimental results
For the S1 time series, all hybrid systems improved the accuracy of their respective single model, reaching better MSE, MAPE, and MAE values better than literature models. In particular, the NoLiC employing the LSTM model attained the best result in all considered metrics. Regarding MSE, the hybrid system versions obtained an error of one order of magnitude smaller than their respective single models, for instance, 3.97E−04 for NoLiC+LSTM and 6.89E−04 for LSTM.
The S2 and S3 series follow the same behavior: all hybrid systems versions improved the performance of their respective single models for the evaluated metrics. For the S2 time series, the perturbation approach using the SVR model attained the best performance in terms of MSE, MAPE, and MAE. This hybrid system version, which employed three perturbations ( P 0 + P 1 + P 2 + P 3 ), improved the MSE value in two orders of magnitude regarding the single SVR.
Hybrid systems that use the LSTM model deserve special attention for the S3 time series. In this case, the NoLiC attained the best MSE value, while the perturbative approach ( P 0 + P 1 + P 2 ) obtained the smallest MAPE Table 3. Selected parameters for SVR and LSTM in the combination approaches using a grid search in the validation set for 1 day ahead SST forecasting.  Tables 5 and 6 show the percentage difference (Eq. 9) in terms of the MSE metric between the literature models and the perturbative and NoLiC approaches, respectively. The tables show that the hybrid systems improved the performance of both single models for the S1 series. The NoLiC using the LSTM model attained an improvement greater than 65% for all evaluation metrics ( Table 5). The versions of the hybrid systems attained a superior performance, at least 20% when compared with the literature. Figure 4a,b show the forecasts of the S1 series test set of the hybrid approaches using SVR and LSTM, respectively. It can be seen that both hybrid systems were able to improve the forecasting of the single models. Both hybrid approaches achieved forecasts closer to the real when compared with the initial model.
For S2, Table 5 shows that the perturbative approach reached an improvement higher than 30% in all comparisons. This approach with SVR obtained a percentage gain regarding SVR of 98.93%, 90.56%, and 91.09% for MSE, MAPE, and MAE, respectively. Table 6 shows the NoLiC using LSTM model attained a gain concerning to single LSTM of 98.42%, 88.40% and 88.87% for MSE, MAPE, and MAE, respectively. Figure 4c,d show the forecasts of the S2 series test set of the hybrid approaches using SVR and LSTM, respectively. Both figures show that the hybrid systems improved the forecast of the single models. In both comparisons, it is possible to verify that the forecast of the hybrid systems using SVR or LSTM is closer to the test set of S2 than the respective single model. Tables 5 and 6 show that the percentage difference between hybrid systems with LSTM and single models is positive in all comparisons for the S3 data set. The NoLiC using SVR obtained the greatest improvement regarding single SVR with 65.03%, 39.50%, and 37.68% for MSE, MAPE, and MAE, respectively. Figure 4e,f show the forecasts of the S2 series test set of the hybrid approaches using SVR and LSTM, respectively. The forecasts obtained by the perturbative and NoLiC approaches are closer to S3 series when compared with the single models. Supplementary Information presents additional analyzes. Discussion. To verify if there are (or not) significant statistical differences between the hybrid systems and literature approaches, we employed the Diebold-Mariano statistical test 66 . We use MSE since it is the target metric employed to guide the search of the parameters of the models. Table 7 shows that both versions of the perturbative approach attain MSE values statistically different from the single and literature models, i.e., the p Table 4. Comparison in terms of MSE, MAPE, and MAE of the combination approaches with single statistical and Machine Learning models of the literature applied to the SST daily forecasting. For each data set, the best value of the metrics is highlighted in bold. www.nature.com/scientificreports/ value is smaller than the significance level adopted (0.05) in all comparisons. The NoLiC employing LSTM also reached results statistically better than other models. Only the NoLiC version using SVR attained an MSE worse than NARX 1 and ConvLSTM 11,50 models. Table 8 shows the execution time (in seconds) of the testing phase calculated over 30 executions. The evaluated approaches presented an execution time smaller than 1 s in all data sets. It is important to highlight that the hybrid systems based on the LSTM model are more costly regarding computational effort than the ones based on SVR. For instance, the SVR's perturbative approaches were less computationally costly than single LSTM for S1 and S2 series.
The complexity analysis of the hybrid system can be divided into p steps, each one corresponding to the training of a model ( M 0 , M 1 , . . . , M p ). The evaluated hybrid systems are trained sequentially, and their training time can be described as MT 0 + MT 1 + · · · + MT p , where MT is the training time of the model in a specific phase. In this way, the NoLiC approach is approximately three times more expensive than the single models, because the NoLiC uses three models ( M 0 , M 1 , and M C ), and the perturbative approach is approximately p times more expensive than the single models because it uses M 0 , M 1 , . . . , M p models. This work applies two compositions of hybrid systems, using SVR or LSTM. The SVR training process has a complexity of O(lm) 18,67 , where l is the size

Conclusion
The Sea Surface Temperature (SST) is an important environmental variable due to its strong relationship to climate, weather, and nature events, such as El Niño. So, the SST accurate forecast can support decisions in several science fields.
(a) Forecasts of the hybrid systems using SVR and the respective single model for the S1 series.
(b) Forecasts of the hybrid systems using LSTM and the respective single model for the S1 series.
(c) Forecasts of the hybrid systems using SVR and the respective single model for the S2 series.
(d) Forecasts of the hybrid systems using LSTM and the respective single model for the S2 series.
(e) Forecasts of the hybrid systems using SVR and the respective single model for the S3 series.   In this work, we evaluated two types of hybrid systems intending to improve the performance of single ML models in the task of SST forecast. The hybrid systems are evaluated in the 1-day-ahead forecasting scenario. The purpose was to correct biased and deteriorated forecasts of the ML models by modeling the error series. The Perturbative and NoLiC hybrid approaches employ linear and nonlinear combinations, respectively. For each approach, two versions were generated, one using SVR and another using LSTM as base models. All the models were evaluated in three data sets of different locations in the tropical Atlantic using traditional metrics (MSE, MAPE, and MAE) of the literature. Compared with the ML single models, the hybrid system approaches obtained a significant performance improvement (more than 20%).
Regarding the hybrid systems, it was possible to verify the influence of the combination function in their performance. The linear combination used by the Perturbative approach obtained the best performance in two out of three study cases regarding MSE. Although NoLiC employs a combination function more versatile than the simple sum, it could not overcome the linear combination in most cases. In particular, when the perturbative  Table 8. Testing time in seconds of the single models and combination approaches for 1 day ahead forecasting. For each approach is presented the mean testing time and the respective standard deviation. www.nature.com/scientificreports/ approach uses the LSTM as the base model, it reached the highest performance in three out of five cases. The LSTM's best performance compared to the other ML models can be attributed to its ability to capture long-term temporal dependencies due to its recurrent abilities, such as memory cells1. In this way, the LSTM can consider previous training examples on its forecasting process, creating a better understanding of the past data and a more robust combination process. It is crucial to remark that both single and hybrid models struggled to forecast extreme points. This issue is a challenging task in the time series literature 12 . Another is the absence of sufficient extreme cases in the training set, which can bias the training process towards more regular cases and the applied target metric (MSE). The hybrid system's computational effort is the sum of the costs of modeling its counterparts and depends on each model's parameters set. The computational cost can be minimized in the test phase, parallelizing the time series and residual forecasting.

Datasets
For future works, we intend to improve the accuracy of the hybrid systems to better forecast extreme values by automatically searching for the most suitable combination function. Besides, different base models, such as convolutional neural networks, echo state networks, and decision trees for regression, can be investigated.