Proposing a hybrid metaheuristic optimization algorithm and machine learning model for energy use forecast in non-residential buildings

The building sector is the largest energy consumer, accounting for 40% of global energy usage. An energy forecast model helps decision-makers manage electric utilities. Identifying optimal hyperparameter values for prediction models is challenging. Therefore, this study develops a novel time-series Wolf-Inspired Optimized Support Vector Regression (WIO-SVR) model to predict 48-step-ahead energy consumption in buildings. The proposed model integrates support vector regression (SVR) and the grey wolf optimizer (GWO): the SVR model serves as the prediction engine, while the GWO optimizes the hyperparameters of the SVR model. The 30-min energy data from various buildings in Vietnam were adopted to validate model performance. The buildings include one commercial building, one hospital building, three authority buildings, three university buildings, and four office buildings. The dataset is divided into learning data and test data. The WIO-SVR outperformed baseline models including the SVR, random forests (RF), M5P, and decision tree learner (REPTree). The WIO-SVR model obtained the highest correlation coefficient (R), 0.90. The average root-mean-square error (RMSE) of the WIO-SVR was 2.02 kWh, lower than those of the SVR model (10.95 kWh), the RF model (16.27 kWh), the M5P model (17.73 kWh), and the REPTree model (26.44 kWh). The proposed model improved predictive accuracy in terms of RMSE by 442.0–1207.9%. The reliable WIO-SVR model provides building managers with useful references for efficient energy management.


Previous works
The challenges of energy consumption forecasting are the nature of the data (i.e., non-linear, non-stationary, and multi-seasonal) and its dependence on influential factors such as weather conditions, time, and occupancy 17. Physics-based and conventional models are therefore inefficient and ineffective in modeling large building profiles. Data-driven approaches like ML models have been widely used to model high-dimensional energy data 5,18. ML and AI have been applied in various fields 19 such as fish species identification 20, rainfall prediction 21, and energy reduction and greenhouse gas emissions mitigation 22.
An accurate energy demand forecast model supports decision-makers in carrying out short-term management of the electric utility. In the study of Zhu et al. 5, popular ML techniques such as LASSO regression, RF, CART, and SVR were utilized to predict daily electrical load in non-residential buildings. Residuals of the prediction models were then analyzed using statistical quality control theory to monitor and identify abnormal load patterns. Biswas et al. 23 compared various neural network-based models in predicting residential building energy usage. The inputs include the number of days, outdoor temperature, and solar radiation, while the outputs consist of house and heat pump energy consumption.
Ensemble models such as RF, DT, and gradient boosting have emerged as a promising approach for modeling building energy data. Lu et al. developed a tree-based ML model, extreme gradient boosting (so-called XGBoost), to predict the daily electrical load of the City of Bloomington Intake Tower in Indiana, USA 24. This tower plays an important role in residents' lives because it collects water from reservoirs and transports it to hydroelectric power plants or water treatment plants. The complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method was applied to decompose energy patterns. Analytical results indicated that the CEEMDAN-XGBoost model outperformed benchmark models in all measures. Ma 25 proposed a hybrid deep meta-ensemble network to forecast hourly electric load data of 20 zones at a US utility. Experimental results showed that the forecast ability of the proposed model is superior to that of the multi-layer perceptron, RF, and gradient boosting regression trees.
www.nature.com/scientificreports/
Pham et al. 13 investigated the effectiveness of RF, M5P, and Random Tree (RT) in predicting short-term energy consumption in multiple buildings. Five datasets at hourly intervals collected over one year were utilized to validate the predictive ability of the ML prediction models. Experimental results showed that the RF model was superior to the M5P and RT models in 1-step-ahead energy usage prediction. Pan et al. 12 proposed a novel ensemble model, namely categorical boosting (CatBoost), to estimate multi-dimensional energy consumption data. Findings revealed a better forecast performance of CatBoost than that of the RF and the gradient boosting decision tree.
The predictive accuracy of a machine learner depends on the tuning of its parameters. The selection of such parameters is a real-world optimization problem. Metaheuristic algorithms have been widely adopted to optimize the parameters of ML models. For example, Somu et al. 17 introduced a combined model for short-, mid-, and long-term energy consumption forecasting at an academic building. The long short-term memory networks were optimized by the improved sine cosine optimization algorithm. Comparative results indicated that the proposed forecast model outperformed single models such as the autoregressive integrated moving average (ARIMA), Deep Belief Network regression, and SVR.
Ma et al. 26 established a modified convolutional neural network (CNN) for the interval forecasting of electricity demand. In the proposed model, the whale optimization algorithm (WOA) was employed to optimize the parameters of the CNN, and the calibration and sharpness of the probabilistic forecasts were comprehensively evaluated. For daily electric load forecasting of an office building, Ying et al. 27 proposed a backpropagation (BP) neural network optimized by the GWO. Moreover, the fuzzy C-means (FCM) clustering algorithm was utilized to cluster historical energy profiles. Empirical results indicated that the FCM-GWO-BP model significantly improved on the prediction accuracy of the pure BP model and of the GWO-BP model without clustering.
Various studies have adopted a composite prediction model of a neural network and a metaheuristic algorithm. However, neural networks have some limitations such as local minimum traps, slow convergence speed, and difficulty in determining the size of the hidden layer and the learning rate 28. Thus, the least-squares SVR is a promising alternative. Besides that, the literature showed that few studies have focused on the combination of the GWO and SVR for non-residential building energy usage forecasting. In this study, the authors propose a time-series WIO-SVR model to predict energy consumption in non-residential buildings. This model takes advantage of a metaheuristic optimization algorithm (i.e., the GWO) and a machine learning model (i.e., SVR). The GWO is used to optimize the hyperparameters of the SVR model during the learning process. Thus, the integration of GWO and SVR in the proposed hybrid model can enhance the performance in predicting energy consumption in non-residential buildings. The contributions of this study are: (1) a time-series wolf-inspired optimized support vector regression (WIO-SVR) model for energy consumption prediction in non-residential buildings; and (2) an investigation of the capacity of the WIO-SVR in electricity consumption forecasting in non-residential buildings.

Methods
Support vector regression for time-series data analytics. The support vector regression (SVR) 8 is a supervised machine learning model used for regression problems. It has been used to capture the non-linear relationship between predictors and dependent variables. Figure 1 presents the framework of the SVR model. The inputs for training the SVR model in this study are historical energy data, temporal data, and weather data. The future energy consumption in buildings is the prediction output of the SVR model.
The SVR model uses a kernel function to map predictors to a high-dimensional feature space. A least-squares cost function is applied to train an SVR model to yield linear equations in a dual space, which reduces computing time. Particularly, SVR models are trained by solving Eq. (1), where J(ω,b,e) is the objective function; ω is the parameter of the linear approximator; e_k are the errors; C ≥ 0 is the regularization parameter; x_k are the predictors (i.e., historical energy data, temporal data, and weather data); y_k are the dependent variables (i.e., energy consumption in buildings); b is the bias; and n is the dataset size.
Lagrange multipliers (α_k) are utilized to deal with this problem, which results in Eq. (2). The kernel function is described in Eq. (3), for which the Gaussian radial basis function (RBF) kernel in Eq. (4) was used. The RBF kernel was adopted as the kernel function because it has low mathematical complexity and effectively solves highly nonlinear problems 29.
In these equations, α_k are the Lagrange multipliers; K(x,x_k) is the kernel function; and σ is the RBF width.
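For reference, the standard least-squares SVR formulation that is consistent with the symbols defined above can be written as follows. This is a reconstruction under that assumption: the feature map φ is notation introduced here, and the numbering simply follows the order in which the text cites Eqs. (1)-(4).

```latex
% Eq. (1): primal least-squares SVR problem
\min_{\omega, b, e} \; J(\omega, b, e)
  = \frac{1}{2}\lVert \omega \rVert^{2} + \frac{1}{2}\, C \sum_{k=1}^{n} e_k^{2},
\qquad \text{subject to } \; y_k = \langle \omega, \varphi(x_k) \rangle + b + e_k,
\quad k = 1, \dots, n

% Eq. (2): resulting predictor in the dual space
y(x) = \sum_{k=1}^{n} \alpha_k \, K(x, x_k) + b

% Eq. (3): kernel as an inner product in the feature space
K(x, x_k) = \langle \varphi(x), \varphi(x_k) \rangle

% Eq. (4): Gaussian RBF kernel with width \sigma
K(x, x_k) = \exp\!\left( - \frac{\lVert x - x_k \rVert^{2}}{2\sigma^{2}} \right)
```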
The performance of SVR models is affected by the settings of their hyperparameters, which consist of the RBF width (σ) and the regularization parameter (C). In this study, the optimal settings of these two hyperparameters were considered comprehensively. Particularly, a recently developed metaheuristic, the grey wolf optimization (GWO) algorithm 15, was integrated to optimize the performance of the proposed WIO-SVR model. The mathematical theory of the GWO is presented in the next section.
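To make the roles of C and σ concrete, the following is a minimal sketch of a least-squares SVR with an RBF kernel, written in Python with NumPy rather than in the MATLAB environment used in this study. It assumes the standard dual formulation, in which the bias b and the multipliers α_k are obtained from a single linear system; the function names and the toy sine data are hypothetical and serve only as illustration.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix: K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def lssvr_fit(X, y, C, sigma):
    """Solve the (n+1) x (n+1) linear system of the least-squares SVR dual.

    [ 0   1^T     ] [ b     ]   [ 0 ]
    [ 1   K + I/C ] [ alpha ] = [ y ]
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(M, rhs)
    return solution[0], solution[1:]  # bias b, multipliers alpha

def lssvr_predict(X_train, alpha, b, sigma, X_new):
    """Evaluate y(x) = sum_k alpha_k K(x, x_k) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy data (hypothetical): a noisy sine curve standing in for an energy profile.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 60)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(60)

b, alpha = lssvr_fit(X, y, C=100.0, sigma=0.5)
y_hat = lssvr_predict(X, alpha, b, 0.5, X)
train_rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

A larger C penalizes training errors more strongly, while a smaller σ makes the kernel more local; both choices change the fitted curve, which is why the two values are tuned jointly.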
Wolf-inspired optimization for enhancing machine learning model. The GWO is a metaheuristic optimization algorithm 15 inspired by the natural behavior of grey wolves. The GWO mimics the power hierarchy and hunting activities of wolves. Particularly, a swarm of grey wolves is split hierarchically into four sub-swarms of alphas (α), betas (β), deltas (δ), and omegas (ω), each with different power and responsibility. Figure 2 shows the dominance structure in a grey wolf swarm.
• Alphas are the leaders.
• Betas are subordinate wolves that advise the alpha and help manage the pack.
• Deltas are wolves that report to the alphas and betas.
• Omegas play the role of scapegoat and report to the alphas, betas, and deltas.
To model the social hierarchy of wolves, α is considered the fittest solution, while β and δ are considered the second and third best solutions, respectively 15. Optimization by the GWO includes searching for, encircling, and attacking prey. Figure 3 illustrates the positions of grey wolves and prey. The hunting process is led by α, β, and δ. Encircling prey is performed by updating the wolf positions using Eqs. (5) and (6). The three best wolves are used to estimate the location of the prey, while the locations of the omegas are updated randomly around these three wolves, as shown in Eqs. (7)-(10) 15. In these equations, X(t + 1) is the location vector of a wolf at iteration (t + 1); X_p(t) is the location vector of the prey at iteration t; A and C are coefficient vectors; the components of a are reduced from 2 to 0 over the iterations; and r_1 and r_2 are random vectors in [0, 1]. Figure 4 shows the pseudo-code of the GWO.

The balance between exploitation and exploration in the GWO is controlled by the vector A. During the optimization process, when |A| < 1, the wolves tend to approach the prey because their next location lies between their current location and the prey's position 15. This corresponds to exploitation. In contrast, when |A| > 1, the wolves tend to diverge from the prey, which corresponds to global exploration. In addition, the vector C influences exploration and exploitation because it is a random weight that affects the update of the wolf location, as shown in Eq. (5). This helps the GWO escape local optima.

Energy data, weather data, and temporal data were collected. Particularly, energy consumption data from hundreds of buildings were collected through the power company's data metering and recording network. Temporal data consist of the day of the week and the hour of the day, which were extracted from the datetime field in the database. These data were pre-processed before being used to develop the prediction model.
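The GWO position updates cited as Eqs. (5)-(10) follow the standard formulation of the algorithm 15, which can be written as below. The grouping of the formulas is an assumption made here for readability and may not match the original equation numbering exactly.

```latex
% Encircling prey
\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right|,
\qquad
\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}

% Coefficient vectors
\vec{A} = 2\,\vec{a} \cdot \vec{r}_1 - \vec{a},
\qquad
\vec{C} = 2\,\vec{r}_2

% Hunting guided by the three best wolves
\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|,
\quad
\vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|,
\quad
\vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right|

\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha,
\quad
\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta,
\quad
\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta

\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}
```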
To improve the performance of the model, the GWO optimized the model configuration by adjusting the SVR hyperparameters. The search space of the WIO-SVR model is set from 10^-3 to 10^3. The RBF was used as the kernel function in the SVR model. The population size is 100 and the maximum number of iterations is 10. The optimization engine first generated the GWO population, in which each wolf's location represents a pair of C and σ values of the SVR within the search space. The optimal model is obtained when the stopping criteria are reached. The proposed model was developed in MATLAB, a programming and numeric computing platform. The running environment was a 64-bit operating system with an Intel(R) Core i7-8559U CPU @ 1.8 GHz 1.99 GHz and 8.00 GB RAM.
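The search described above can be sketched in Python as follows. This is an illustrative outline and not the authors' MATLAB implementation: the function `gwo_minimize` is hypothetical, searching in log10 space over [-3, 3] is an assumption made here to cover the range 10^-3 to 10^3 evenly, and the quadratic objective is only a stand-in for the validation error of an SVR trained with a given (C, σ) pair.

```python
import numpy as np

def gwo_minimize(objective, lower, upper, n_wolves=100, n_iter=10, seed=0):
    """Minimal grey wolf optimizer; returns the best evaluated position and value."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    leaders = np.empty((0, dim))   # alpha, beta, delta found so far
    leader_fit = np.empty(0)

    for t in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])

        # Keep the three best solutions seen so far as alpha, beta, delta.
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        top3 = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[top3], pool_fit[top3]

        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0

        moved = np.zeros_like(wolves)
        for leader in leaders:
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a          # |A| < 1 exploits, |A| > 1 explores
            C = 2.0 * r2                  # random weight on the leader position
            D = np.abs(C * leader - wolves)
            moved += leader - A * D
        # New position is the average of the three leader-guided moves.
        wolves = np.clip(moved / 3.0, lower, upper)

    return leaders[0], float(leader_fit[0])

def toy_validation_error(log_params):
    """Stand-in objective (hypothetical): minimum at log10(C) = 1, log10(sigma) = -1."""
    target = np.array([1.0, -1.0])
    return float(np.sum((log_params - target) ** 2))

best_log, best_err = gwo_minimize(toy_validation_error, [-3.0, -3.0], [3.0, 3.0])
best_C, best_sigma = 10.0 ** best_log
```

In an actual run, `toy_validation_error` would be replaced by a function that trains the SVR on the training data with the candidate C and σ and returns its error on the validation data.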
The criteria for evaluating forecast performance were the root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient (R). The RMSE measures the differences between predicted values and measured values. The MAE presents the absolute errors between the predicted values and measured values. The MAPE expresses accuracy as a percentage. Lower values of RMSE, MAE, and MAPE indicate better forecast accuracy. The R measures the relationship between actual data and predicted data. The higher the absolute value of R, the stronger the relationship. These measures are often used to assess the predictive performance of machine learning models 30. In their defining equations, y is the actual energy consumption, y′ is the predicted energy consumption, and n is the size of the data sample.
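The four measures can be computed as in the following Python sketch, which uses their standard textbook definitions; the function names and the four-point example are hypothetical, and R is taken here to be the Pearson correlation coefficient.

```python
import numpy as np

def rmse(y, y_pred):
    """Root-mean-square error: sqrt(mean((y - y')^2))."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

def mae(y, y_pred):
    """Mean absolute error: mean(|y - y'|)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y - y_pred)))

def mape(y, y_pred):
    """Mean absolute percentage error: 100 * mean(|y - y'| / |y|); y must be nonzero."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y - y_pred) / y)))

def corr_r(y, y_pred):
    """Pearson correlation coefficient between actual and predicted values."""
    return float(np.corrcoef(y, y_pred)[0, 1])

# Small illustrative example (values in kWh, hypothetical).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R": corr_r(actual, predicted),
}
```

Because squaring emphasizes large deviations, the RMSE of a set of predictions is never smaller than its MAE, which makes the pair a quick check on how unevenly the errors are distributed.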

Results
Dataset. Datasets at 30-min intervals from Danang city were collected to validate the proposed WIO-SVR model. Danang was chosen as the case study because it is one of the biggest cities in Vietnam and its government has a policy to develop a smart city in the near future. Twelve buildings are used as case studies, which include one commercial building, one hospital building, three authority buildings, three university buildings, and four office buildings (Table 1). The electrical usage patterns of the 12 buildings are presented in Fig. 6. Figure 7 depicts the electrical consumption profiles using box-and-whisker plots. Scatter plots of energy data and temperature data in Fig. 8 depict a positive correlation between energy and temperature in all buildings. The WIO-SVR model aimed to predict 48-step-ahead energy consumption, which is a one-day-ahead prediction.
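As an illustration of what a 48-step-ahead target looks like for 30-min data, the sketch below arranges a series into input windows and one-day-ahead output windows. How the inputs were actually arranged in this study is not detailed here, so the framing, the function name `make_windows`, and the choice of 48 lagged readings as inputs are all assumptions; weather and temporal features are omitted for brevity.

```python
import numpy as np

STEPS_PER_DAY = 48  # 24 hours of 30-min readings

def make_windows(series, n_lags=STEPS_PER_DAY, horizon=STEPS_PER_DAY):
    """Pair each block of n_lags past readings with the next `horizon` readings."""
    series = np.asarray(series, dtype=float)
    n_samples = series.size - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i : i + n_lags] for i in range(n_samples)])
    Y = np.stack(
        [series[i + n_lags : i + n_lags + horizon] for i in range(n_samples)]
    )
    return X, Y

# Fourteen days of 30-min readings (placeholder values 0, 1, 2, ...).
readings = np.arange(14 * STEPS_PER_DAY)
X, Y = make_windows(readings)
```

With 14 days of data, this yields 672 readings in total, and each row of Y holds the 48 values that immediately follow the corresponding row of X.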
The WIO-SVR model was evaluated multiple times using multiple datasets. Particularly, 12 datasets from 12 buildings were used and 14 evaluations were performed for each dataset. The original dataset is divided into learning data and test data. Learning data include 6856 observations in 4 months ranging from August 1st, 2019 to November 30th, 2019. Training data, which account for 70% of the learning data, are used to train the SVR model. The rest of the learning data (i.e., validation data) are used to optimize the trained model. Test data, which are used to test the prediction ability of the optimized WIO-SVR model, include 672 observations in 14 days ranging from December 1st to 14th, 2019. A summary of the model evaluation is as follows:
• Prediction model: Wolf-inspired optimization support vector regression (WIO-SVR)
• Forecast period or model output: 1-day-ahead prediction or 48-step-ahead prediction (30-

Results and discussion. Learning data are divided into training data and validation data. Training data are used to train the WIO-SVR model, while validation data are used to optimize the trained model. The forecast accuracy of the optimized WIO-SVR is evaluated on test data ranging from December 1st to December 14th, 2019. Table 2 presents the performance measures of the proposed WIO-SVR model for case study 1 in terms of RMSE, MAE, MAPE, and R. The high values of R in both the learning phase and the test phase indicate a good agreement between observed and predicted data. The highest values of R in the learning phase and the test phase were 0.97 and 0.99, respectively. For the learning phase, the WIO-SVR obtained the best performance measures on December 14th with an RMSE value of 3.36 kWh. In the test phase, the proposed model presented a good prediction ability. Figure 9 displays the actual and predicted values of electrical consumption in the test phase.

Table 3 shows the evaluation results of the WIO-SVR model for case study 2. This case study utilizes 30-min resolution data collected from the office building. The predictive accuracy of the proposed model in the test phase was much better than that in the learning phase. The three measures of RMSE, MAE, and MAPE had smaller average values in the test phase than in the learning phase. The proposed WIO-SVR model obtained an average MAPE of 64.32% in the learning phase compared with only 6.89% in the test phase. Figure 10 compares actual observations and the values predicted by the WIO-SVR using test data. Figure 10 reveals that the predicted values closely tracked the actual values. Generally, the WIO-SVR model exhibited high forecast accuracy and good agreement between actual data and predicted values.

Table 4 reports the predictive accuracy and standard deviation of the proposed WIO-SVR model in all cases. Smaller values of RMSE, MAE, and MAPE indicate better predictive accuracy of the proposed model. The WIO-SVR yielded the smallest values of RMSE and MAE for the test data of the authority building in case 3. The values of RMSE and MAE were 0.31 kWh and 0.1 kWh, respectively. The smallest value of MAPE (6.96%) was obtained by the proposed model when using data collected from the commercial building (case 1). Among the 12 cases, the WIO-SVR showed a lower predictive accuracy for case 11 with 3.63 kWh and 13.20 kWh in terms of RMSE and MAE, respectively. Particularly, the values of R in all cases are high, ranging from 0.78 to 0.98. This indicates that the proposed model is in good agreement with the experimental datasets (Fig. 11).
To confirm the effectiveness of the proposed WIO-SVR model, its predictive accuracy was compared with those of machine learning models including the SVR, random forest (RF), M5P, and REPTree. These are popular machine learning models that have often been used in building energy consumption prediction. Table 5 indicates that the predictive ability of the WIO-SVR was superior to that of the baseline models in all cases. The R values of the proposed model in all cases were high, indicating a good agreement between real data and predicted data. For the dataset collected from the commercial building (case 1), all performance measures of the WIO-SVR were better than those of SVR, RF, M5P, and REPTree. For example, the WIO-SVR yielded the smallest RMSE with 2.49 kWh, followed by 9.52 kWh and 21.02 kWh for SVR and RF, respectively.

With regard to the datasets collected from office buildings (cases 2, 4, 5, and 6), the proposed model also outperformed the baseline models in predicting electrical consumption. Notably, the RMSE and the MAE of the WIO-SVR model in case 5 were only 2.18 kWh and 4.85 kWh, respectively. These measures were significantly lower than those of other models. In case 6, although the WIO-SVR had a higher value of MAE than other models, the RMSE, MAE, and MAPE of this model were much lower than those of baseline models.
Similarly, the evaluation results of the WIO-SVR model for datasets collected from educational buildings (cases 8, 9, and 12) were significantly better than those of other ML models. Table 5 shows that the RMSE obtained by the proposed model in case 8 and case 9 was 3.14 kWh and 3.52 kWh, respectively. These values were lower than the RMSE values obtained by SVR, RF, M5P, and REPTree. Notably, all models in case 8 exhibited higher MAPE values than in the remaining cases. With the hospital building, the proposed model confirmed its capacity in forecasting electricity usage.
For authority buildings (cases 3, 7, and 10), the proposed model still maintained lower prediction errors than the comparative models. In particular, the MAPE values obtained by the five models were higher than those obtained for the commercial, hospital, and office buildings. Figure 12 compares the RMSE, MAE, and MAPE obtained by the proposed model and the other ML models. The actual values and the predicted values obtained by all models for case 1 and case 2 are presented in Fig. 13. The values predicted by the WIO-SVR model closely tracked the actual ones.
The outstanding efficiency of the proposed model was confirmed in Table 6. Among the models, the proposed WIO-SVR model obtained the highest value of R with 0.90. The average RMSE of the WIO-SVR was 2.02 kWh, which was much lower than that of the SVR with 10.95 kWh, the RF with 16.27 kWh, the M5P with 17.73 kWh, and the REPTree with 26.44 kWh. The proposed model improved the predictive accuracy by 442.0-1207.9% and 43.7-238.3% in terms of the RMSE and the MAE, respectively. Overall, the error rates of the WIO-SVR model are 43.7-1027.9% better than those of baseline models. Thus, the proposed WIO-SVR model is suggested as an effective model for forecasting energy consumption in non-residential buildings.

This study proposed a hybrid machine learning model that integrates the GWO and the SVR model to enhance the performance of energy use prediction in non-residential buildings. Evaluation results revealed that the performance of the proposed WIO-SVR model was better than that of the investigated machine learning models. This is because the hyperparameters of the SVR model were optimized by an optimization algorithm (i.e., the GWO) during the learning process. Therefore, the WIO-SVR can be used as a tool for time-series energy consumption prediction in non-residential buildings.
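The improvement rates in RMSE can be related to the average RMSE values reported in this study (2.02 kWh for the WIO-SVR against 10.95, 16.27, 17.73, and 26.44 kWh for the baselines) with a short calculation. The formula used below, the error reduction divided by the error of the proposed model, is an assumption: it is not stated explicitly, but it gives about 442% for the SVR and about 1209% for the REPTree, close to the reported range of 442.0-1207.9%, with the small differences presumably due to rounding of the averages.

```python
def improvement_pct(baseline_error, proposed_error):
    """Relative improvement of the proposed model over a baseline, in percent.

    Assumed definition: (baseline - proposed) / proposed * 100.
    """
    return 100.0 * (baseline_error - proposed_error) / proposed_error

# Average RMSE values (kWh) reported for the baseline models.
baseline_rmse = {"SVR": 10.95, "RF": 16.27, "M5P": 17.73, "REPTree": 26.44}
wio_svr_rmse = 2.02

gains = {
    name: improvement_pct(value, wio_svr_rmse)
    for name, value in baseline_rmse.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}% lower-error improvement over baseline")
```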

Conclusions
The building sector is considered the largest consumer of energy, accounting for nearly 40% of global energy usage and 33% of greenhouse gas emissions. Electricity demand forecasting is an essential task since it supports important decisions in power system planning and operation. An accurate energy demand forecast model supports decision-makers in carrying out short-term management of the electric utility. Meanwhile, machine learning models have become a mainstream approach to modeling building energy. This study develops a hybrid model of wolf-inspired optimized support vector regression (WIO-SVR) for energy demand forecasting in non-residential buildings.

Electrical consumption data at 30-min intervals collected in Danang city, the largest city in central Vietnam, were adopted to validate the performance of the WIO-SVR. Twelve buildings are used as case studies, which include one commercial building, one hospital building, three authority buildings, three university buildings, and four office buildings. The WIO-SVR model aimed to predict 48-step-ahead energy consumption, which is a one-day-ahead prediction. The performance of the proposed model is compared with those of benchmark models.
The original dataset is divided into learning data and test data. Learning data include 6856 observations in 4 months ranging from August 1st, 2019 to November 30th, 2019. Training data, which account for 70% of the learning data, are used to train the SVR model. The rest of the learning data (i.e., validation data) are used to optimize the trained model. Test data, which are used to test the prediction ability of the optimized WIO-SVR model, include 672 observations in 14 days ranging from December 1st to 14th, 2019. The WIO-SVR model was

To confirm the effectiveness of the proposed WIO-SVR model, its predictive accuracy was compared with those of machine learning models including the SVR, random forest (RF), M5P, and REPTree. The predictive ability of the WIO-SVR was superior to that of the baseline models in all cases. The R values of the proposed model in all cases were high, indicating a good agreement between real data and predicted data. For the dataset collected from the commercial building (case 1), all performance measures of the WIO-SVR were better than those of SVR, RF, M5P, and REPTree. For example, the WIO-SVR yielded the smallest RMSE with 2.49 kWh, followed by 9.52 kWh and 21.02 kWh for SVR and RF, respectively.
Among the models, the proposed WIO-SVR model obtained the highest value of R with 0.90. The average RMSE of the WIO-SVR was 2.02 kWh, which was much lower than that of the SVR with 10.95 kWh, the RF with 16.27 kWh, the M5P with 17.73 kWh, and the REPTree with 26.44 kWh. The proposed model improved the predictive accuracy by 442.0-1207.9% and 43.7-238.3% in terms of the RMSE and the MAE, respectively. Overall, the error rates of the WIO-SVR model are 43.7-1027.9% better than those of baseline models. As the study's contribution, an effective and reliable WIO-SVR model provides building managers with useful references for proactive and efficient energy management.
This study contributes to (i) the state of knowledge by examining the performance of the WIO-SVR model in predicting time-series energy data; and (ii) the state of practice by proposing an effective tool that helps building owners and facility managers understand building energy performance and enhance energy efficiency in buildings.
As a limitation, the proposed method requires users with basic programming skills and machine learning knowledge. In this study, the GWO was used as an optimization engine in the proposed model. Future studies