Introduction

The analysis of infrastructure use data in relation to other components of the infrastructure can help better understand interdependencies and interrelationships between them, with the potential to enhance their sustainability and resilience. Indeed, no infrastructure system works in isolation. All infrastructure systems—including transport, water, wastewater, electricity, gas, and telecommunications—are interdependent1,2. In part because of these interdependencies, but also intrinsic to how people live, the way infrastructure systems are used is also interrelated. For example, Movahedi and Derrible3 showed that electricity, gas, and water consumption in large-scale buildings are interrelated (i.e., the consumption of one can be predicted by the two others). Zhang and Qian4 classified the patterns of electricity consumption over a night to estimate the traffic congestion of a highway in the morning. Overall, infrastructure systems are often more interrelated than initially expected, for example by sharing physical surface and subsurface space5 and by competing for time and resources6,7.

In this study, by using zip code-level electricity data as well as traffic loop detector data, we seek to identify and understand interrelationships between travel demand and electricity demand. More precisely, using traffic data to count the number of vehicles entering and exiting a zip code can capture the number of people in a zip code at a given time who may be in buildings otherwise, consuming electricity. Concurrently, a decrease in electricity consumption can express that people have left a building and may use a vehicle, generating traffic. In this study, we use electricity consumption data of several zip codes in Chicago at 30-min intervals for November 2017. To achieve our goal, we use Long Short Term Memory (LSTM) network—a type of deep learning model—to model electricity consumption patterns of zip codes based on the traffic volume of the same zip code and nearby zip codes. The specific objectives of the study are to:

  • Understand the correlation between electricity consumption and traffic volume.

  • Investigate the temporal relationships between electricity consumption and traffic volume.

  • Investigate the spatial relationships between electricity consumption and traffic volume.

  • Develop models to predict electricity consumption based on traffic volume.

In the next section, we review the literature on electricity consumption and traffic modeling, and on interrelationships between infrastructure systems in cities. After, we describe the electricity consumption and traffic datasets used in the study. Next, we go over the results by addressing each objective sequentially, and we then discuss these results. Finally, we explain in detail the methodological approach utilized in the study.

Literature review

The electricity power grid is a complex system with many components8. The stable and uninterrupted operation of the power grid plays a vital role in economic development, national security, and overall social welfare. As of this writing, electricity cannot be cheaply and effectively stored in required massive amounts. As a result, electric utilities and other power market players must forecast electricity consumption in the (a) short-term (few minutes to hours), (b) mid-term (hours to a day ahead), and (c) long-term (seasonal/annual, up to a few years) in generation, transmission, and distribution networks. Thanks to the deployment of smart meters, predictions have become generally more accurate. This accurate forecasting of electricity consumption levels is crucial for power systems, and the selected method for making predictions provides a better understanding of the dynamics of the system and can even help ease operating costs for market players. The traditional predictive techniques include the construction of mathematical and statistical models such as auto-regressive and moving average (ARMA) models9; auto-regressive integrated moving average (ARIMA) models10; multiple linear regression (MLR) and principal component analysis (PCA) models11; gray models (GM)12; and Kalman filter-based (KF) models13. Nonetheless, traditional statistical models are known to be limited. For instance, GM models are not always effective for electrical load forecasting but work better for addressing small sample problems14 and ARMA models may fail to consider the influence of random variables other than in typical time series forecasting methods10,14. This means that traditional statistical models work well in normal daily conditions, but they become less reliable while dealing with meteorological, sociological, and economic changes15 or with relations to other systems.

To deal with complex nonlinear relationships, machine learning (ML) and deep learning techniques are generally preferred. The following techniques are mentioned in the literature: artificial neural networks (ANN)16,17,18 fuzzy-logic-based algorithms19,20, genetic-algorithm-based (GA) neural network21, support vector machine (SVM)22, tree-based models23,24,25, LSTM-based neural network26; single hidden layer network configurations with random weights (RWSLFN)27, and multilayer perceptron (MLP)28 to name a few. In the literature, LSTM has been shown to perform particularly well on time series data for a range of applications, including to predict the spread of COVID-1929,30,31,32. Specifically looking at traffic forecasting and flow prediction, several studies33,34 also found that LSTM performed better than traditional techniques like ARIMA or other ML techniques like support vector regression (SVR). In this study, we have opted to solely use LSTM as our main goal is not to find the best performing model but to investigate the presence of interrelationships between electricity consumption and traffic volume.

As many studies demonstrate35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51, electricity consumption is linked to myriads of variables, from urban characteristics (e.g., morphology, density) and building characteristics (e.g., size and insulation technology) to weather characteristics (e.g., temperature and cloud coverage) and socio-economic and demographic characteristics (e.g., household income and age). Yet, this list is not exhaustive. As infrastructure systems are interdependent and interrelated by nature52, electricity consumption is also linked to demand patterns for other infrastructure services, such as residents commute time, traffic, and urban mobility patterns, suggesting that traffic network data can be used as a source of information to predict electricity consumption as well53. To date, little research has been carried out and not many studies are available that focus on the interrelation between travel and electricity demand. Few studies explore the causal interdependencies between electricity, transport, and weather data53,54. Gilanifar et al.55 developed a Bayesian Gaussian Process model that explores usage of electricity to enhance short-term load forecasting. Aparicio et al.56 studied the dependencies between power demand and road traffic data using linear correlation and compare the results with other standard features, such as historical load and temperature.

Data

Electricity

In this article, we work with anonymized energy usage data in 30-min intervals at the zip code level within the city of Chicago collected by the local utility Commonwealth Edison (ComEd) and accessible (for a fee)57. Each measurement in the dataset represents the total electricity consumed (in kWh) for a specific customer in a certain time interval (30 min). We decided to build our research on this dataset because we assume that the raw high-resolution interval data that we get directly from the automated metering infrastructure (AMI) have a high level of accuracy and fidelity.

Interval data from AMI has become widely available to utilities throughout the U.S.58. It is often used to identify energy use trends and peaks in the interest of anomaly detection59 and to make predictions of electricity consumption60 to improve the stability of the power grid. Household data include load shapes measured at the household level considers seasonal and daily fluctuations and show significant differences in electricity consumption during the day, week, month, and year. We are interested in observing the loads in one month with a specific focus on the time of the day and the day of the week. We used residential electricity consumption from 28 zip codes located along the main transport corridors of Chicago: I-290, I-90, I-55, I-57, and I-94 interstate expressways for the month of November 2017.

While beneficial for both utilities and customers, data collected and utilized using AMI systems have caused concerns regarding customers’ privacy61. Although Martínez et al.62 observes a potential privacy issue of simple anonymization methods, the distribution of fine-grained data is normally considered acceptable as long as they cannot be linked to the households they originate from through an anonymization process63.

In this study, we use data that consists of fine-grained records of electricity consumption aggregated by 5-digit zip codes where specific identifiers, including but not limited to name, address, and electric account number, are omitted. Table 1 shows average electricity consumption per building in each zip code (in kWh). The table also includes area (in square kilometers) and population (based in American Community Survey (ACS) 2019 5-Year Data) for each zip code for the interest of the reader.

Table 1 Average electricity consumption (kWh).

Traffic

Traffic volume is captured by loop detectors on the Chicago expressway network. These data are collected by the Gateway Traveler Information System and provided by the Illinois Department of Transportation (IDOT). For this study, 211 loop detectors across Chicago from the Kennedy (I-90/94), Eden (I-94), Eisenhower (I-290), Stevenson (I-55), Dan Ryan (I-90/94), Bishop Ford (I-94), and I-57 expressways are used. Each loop detector includes the number of cars that pass a point in the last 5 min. Standard data cleaning processes were applied to remove missing and erroneous data points that may originate from detector malfunction, pavement condition, or from any other reason. Finally, we aggregated traffic volumes to 30-min time periods to be able to merge the traffic dataset with the electricity consumption dataset. Table 2 shows the average traffic volume per lane per 5 min in each zip code across Chicago. Similar to Table 1, we added the area and population for each zip code. In this study, we only focus on 28 zip codes (out of 56 in Chicago) because the expressway system only cross 28 zip codes.

Table 2 Average traffic volume (vehicles per lane per 5 min).

Results

Correlation between electricity consumption and traffic volume

First, we can look at the correlation between two time-series datasets, traffic volume and electricity consumption. For that, we utilize the Pearson r correlation coefficient to measure the linear relation between electricity consumption and traffic volume. Specifically, we calculate the Pearson coefficients in three levels: loop detector, zip code, and citywide.

For each loop detector, we assign the zip code in which the loop detector is located. Then, we calculate the Pearson r value for each loop detector across the city. Figure 1a shows the histogram of the Pearson r values. They are distributed between 0.004 and 0.81. The highest frequency of Pearson r values (66 out of total 211 loop detectors) is in the range [0.6, 0.7). To interpret properly the Pearson results we need to consider where zip codes with similar Pearson r values are located. Figure 2 shows the Pearson r values of the loop detectors and zip codes on a Chicago map. We can see that most loop detectors and zip codes with similar Pearson r values are located near one another.

Figure 1
figure 1

Pearson r values.

Figure 2
figure 2

Pearson's r values—Chicago map. Environmental Systems Research Institute (Esri) ArcGIS Desktop 10.8.1 commercial versions were used to perform preliminary data preparation and convert tabular data to spatial data. URL: https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources.

At the zip code level, we consider all loop detectors in one zip code and calculate the Pearson r values for electricity consumption and traffic volume. Figure 1b shows that Pearson r values are distributed between 0.09 and 0.66 with eight values being in the range [0.6, 0.7). Figure 2 shows how different zip codes have different correlations between electricity consumption and traffic volume. Except for a few zip codes, the figure suggests that the correlation is higher in the north side and the center of the city, and it decreases as we move south. This difference likely stems from the fact that expressways are used as the boundary between zip codes in the south. On a map, while individual loop detectors belong to one zip code, the drivers getting off the expressway may be going to the adjoining zip code. The low accuracy values therefore do not necessarily suggest the absence of interrelationships, but the lack adequate data.

At the citywide level, we use all the traffic volume data and the corresponding electricity consumption of the zip codes to calculate the overall Pearson r value for Chicago that comes to 0.14. Next, we consider a delay in the datasets since a person leaving a building can take time before reaching an expressway and vice versa. Specifically, we increase the delay from 30 min to one day in 30-miniute increments (i.e., 30 min, 60 min, 90 min, …, 1 day) and calculate the correlation of the electricity consumption with the delayed traffic volume. The result of the overall Pearson coefficient correlation shows that the 60 min delay has the highest Pearson value with 0.16, which is low and does not suggest strong correlations at the citywide level.

Temporal relationships

The goal of this section is to investigate the temporal relationships between traffic volume and electricity consumption. For that, we train LSTM models using traffic volume to predict electricity consumption in a zip code. The first question that arises is the size of the time window that should be used. For example, if we want to predict electricity consumption of a zip code at 4:00 PM, is using traffic volume at 4:00 PM in nearby loop detectors sufficient? Or is it better to consider two time windows with traffic volumes at 3:30 PM and 4:00 PM together to predict the electricity consumption at 4:00 PM? Or is it better to consider more time windows, like 16 from 8:30 AM to 4:00 PM?

To answer this question, we test many time windows for each zip code and compare the performance of the model. Specifically, to predict electricity consumption at time t, first we use traffic volume at time t and train and assess the performance of the trained LSTM model. Then we use traffic volumes at times t and t −  30 min and perform the same analysis. The same procedure is repeated until 24 30-min periods are tested, representing a 12-h time window.

The results can be categorized into two groups. In group 1, increasing the time window steadily increases the model performance; Fig. 3 shows an example for zip code 60631. In group 2, increasing the time window initially increases the model performance, but only up to a point (around 16 time periods or 8 h); Fig. 3 shows an example for zip code 60616.

Figure 3
figure 3

Temporal interrelationships.

While every zip code has a specific optimal time window, a time window of 16 periods (8 h) tends to perform well across all zip codes since it shows both a high performance for groups 1 and 2 zip codes.

These results are interesting and suggest that the temporal interrelationships between electricity use and travel demand are complex. In particular, we expected the optimal time window to be around 2–3 h for every zip code, to take into account typical rush hour periods, but we find that accuracy keeps increasing until at least 8 h. This means that to predict electricity consumption at 5PM, the use of traffic data between 9AM and 5PM is preferred. We posit that a larger time window of 8 h better captures lifestyle elements, such as an 8-h workday, but this value could vary across by culture.

Spatial relationships

The goal of this section is to investigate the impact of the distance between zip codes and loop detectors on the relationships between electricity consumption and traffic volume. Therefore, in this section, first, we train LSTM models to predict electricity consumption based on the traffic data from the closest loop detectors, then we increase the distance between loop detectors and the zip code. Here, we use an 8-h time window in our LSTM models (as found preferable in the previous section). To choose the zip codes to study the spatial relationships, we consider four conditions to control the impact of traffic volume from one expressway on the electricity consumption of a zip code. First the zip code should be crossed by only one expressway. Second, there should be only the loop detectors from the same expressway and no other loop detectors from other expressways in a radius of 5 km to limit the amount of noise fed to the model. Third, the zip code should be far enough from the boundaries of Chicago so we can have loop detectors on both side of the zip codes. Fourth, the accuracy of the LSTM model should be significantly more than zero to suggest the existence of a relationship. We applied these four conditions on the Chicago map and few zip codes satisfied them. As an example, we select three zip codes to study the relationships between electricity and travel demand.

To investigate the spatial relationship, we select one set of loop detectors that cross the zip code; each set has one loop detector in one direction of the expressway (toward the zip code) and one in the other direction (away from the zip code). The initial set is the ones closest to the centroid of the zip code. Then, we increase the distance and consider two new loop detectors further away from the centroid of the zip code. The procedure is repeated several times to loop detectors further away on the same expressway. Each time, a model is trained and the performance is compared.

Figure 4 shows the accuracy and errors of the models in terms of R2, MAE, and RMSE. In Fig. 4a,b, the average distance between the set of loop detectors (one for each direction) and the centroid of the corresponding zone is shown on the x-axis.

Figure 4
figure 4

Spatial relationships between electricity consumption and travel demand.

The purple line in Fig. 4a shows the spatial relationship found in zip code 60624. Here, increasing the distance between loop detectors and zip code reduces the accuracy and increases the MAE and RMSE. As expected, increasing the distance reduces the relationships between electricity consumption and traffic volume in this case.

Second, the dark red line in Fig. 4a is for zip code 60618. There, we observe that by increasing the distance, the model accuracy first increases and then it decreases after a certain distance. This phenomenon was unexpected since it suggests that loop detectors located in other zip codes are better able to predict electricity consumption. To further analyze the spatial relationships, we can use all loop detectors in the same zip code to predict the electricity consumption, which we present in the next section.

Finally, the green line in Fig. 4a shows the third type of spatial relationship. Here, increasing distance has no straightforward impact on the model performance.

Overall, we find that complex and unobvious relationships can exist between electricity consumption and traffic volume. Nonetheless, we should consider that each zip code has its own attributes, and to capture these attributes we can include a zip-code level fixed effect as is common in econometrics. Fixed effect variables are used to capture unique features of a data point despite the presence of common attributes63. What we can do here is to express R2 values as a function of distance from the zip code centroid. But because electricity consumption is collected at the zip code level—a surface area in square kilometers—we should use the square of the distance in our model instead. Our model therefore becomes:

$${RSquared}_{ij}={a}_{0}+{a}_{1}\times {distance}_{ij}^{2}+{zc}_{j}+{\varepsilon }_{ij}$$
(1)

where RSquaredij is the accuracy of model i in zip code j, \({distance}_{ij}\) is the distance between loop detectors and the zip code centroid in the model i in zip code j, \({zc}_{j}\) is the zip code fixed effect to distinguish between zip codes, \({\varepsilon }_{ij}\) is the error term, and \({a}_{0}\) is the constant term.

The result of the regression is as follows \({a}_{0}=0.293\) and \({a}_{1}=-0.0052\) with a p-value of 0.04, and the zip code fixed effect values are 0.317 for zip code 60618 and − 0.085 for 60624 (note that since we have three zip codes, we have two fixed effect values for the zip codes). The R2 of the general fit is 0.78. Figure 4c shows the actual versus predicted values of R2 using Eq. (1) and the coefficient values that we calculated. We find a negative relationship with the value of 0.0052 between distance squared and R2 values. In other words, we find that increasing the squared distance by one square kilometer generally decreases the accuracy of the model by 0.0052. An ANOVA test is also performed to test the null hypothesis (i.e., whether all variables could be statistically zero). Table 3 shows the result of the ANOVA test. Because the value of the F statistic is 21.77, which is greater than F(3, 18) = 2.416, the null hypothesis can be rejected with a 99% confidence level.

Table 3 ANOVA test results.

Overall, despite a careful selection of zip codes, we can see that the spatial relationship between traffic volume and electricity consumption are also complex, but they exist. More work is needed to gain a better understanding of these relationships.

Prediction models across the city

Here, we train two sets of models. The first set of models uses all loop detectors in a zip code to predict the electricity consumption of the zip code. The second set of models uses single loop detectors to predict electricity consumption of the zip code in which they are located.

In the first set of models, we develop 28 LSTM models for the 28 zip codes that are crossed by at least one expressway in Chicago. The input of the models is 8 h of traffic volume collected by all loop detectors in a zip code (8 h is selected since it performed well across all zip codes in the temporal interrelationships section). The output is the average electricity consumption of the zip code at the end of the 8-h period.

Figure 5a shows maps of Chicago with the R2, MAE, and RMSE values of the 28 LSTM models. First, we can see the overall performance of the models are better in the north side of Chicago than the south side. As mentioned above, one problem we face with the south side of Chicago is that the expressway serves as a boundary between zip codes. It is therefore more difficult to determine whether drivers exiting the expressway stay in the zip code where the loop detector is located or whether they go to the adjoining zip code.

Figure 5
figure 5

Performance of zip code and loop detector level models that use traffic volume to predict electricity consumption. Environmental Systems Research Institute (Esri) ArcGIS Desktop 10.8.1 commercial versions were used to perform preliminary data preparation and convert tabular data to spatial data. URL: https://www.esri.com/en-us/arcgis/products/arcgis-desktop/resources.

In the second set of models, we train 211 LSTM models for the 211 loop detectors in Chicago to predict the electricity consumption of the zip code to which each loop detector belongs. For each model, R2, MAE, and RMSE are calculated and shown in Fig. 5b; larger circles represent higher accuracies. We can see that accuracies are higher in the north side, similar to the previous models, likely again because the expressways serve as a boundary between zip codes in the south.

Overall, these results suggest that electricity demand and travel demand are interrelated, as in, one is related to the other and vice versa, but these interrelationships can be complex.

Interestingly, we note that the model performances are similar whether all or single loop detector are selected. This result suggest that single loop detectors may be sufficient to capture relationships between travel demand and electricity consumption. Another future area of research could focus on how much data is needed to capture interrelationships between infrastructure systems.

Discussion

The results show that the correlation between electricity consumption and traffic volume is complex since it varies by zip code across Chicago with Pearson values ranging between 0.04 and 0.81. Second, the optimum time window to analyze the temporal interrelationship between electricity consumption and traffic volume is 8 h. Furthermore, we investigated the spatial relationship between electricity consumption and travel demand. Despite finding complex and unobvious relationships, we detected a global linear relationship between distance squared and R2 values; specifically, that increasing the squared distance by one square kilometer decreases the accuracy of the model by 0.0052. Finally, we developed 239 LSTM models to predict electricity consumption of a zip code using traffic volume from the same zip code and found a range of model performance across the city.

Overall, the idea of the study is novel. The articles listed in the literature review section explore various methods for short-term load forecasting and related applications in the field of energy and transport. While they also discuss applications of these methods in energy management, travel mode choice modeling, and accident detection, none of them explore the spatial relationship between electricity consumption and travel demand. As our study is novel, it cannot be compared with other articles. Nevertheless, we recognize that the interrelationships between traffic volume and electricity consumption are likely influenced by a range of complex and context-specific factors from obvious factors like the presence of alternate travel modes (e.g., transit, walk, bike) to less obvious factors related to household characterisitcs13, and daily10 and seasonal12 effects.

Furthermore, this study has several limitations. In particular, it would have benefited from having access to origin–destination data (not for a typical date but for a specific day to compare energy use patterns) and to more detailed travel volume data (beyond traffic volumes on the expressway system).

In terms of policy implications, this work suggests that policies made to impact one infrastructure system can impact others. For example, many cities have adopted time-varying pricing practices for tollways (e.g., Singapore) and public transport (e.g., Washington DC) to encourage people to avoid rush hour periods and lessen congestion, which must have an impact of electricity consumption (as well as other resources such as water and gas). With the global push toward infrastructure decentralization and distribution65, we recommend better coordination among utilities and transport service providers.

Future work should focus on further understanding these interrelationships, ideally using other more spatially disaggregate datasets. It is aligned with limitations from other research25, which uses a similar methodology and mentions that the exploration of many datasets from distant energy contexts is necessary for a broader understanding of the problem. Another future area of research is on how much data is needed to capture interrelationships between infrastructure systems. For instance, increasing the spatial resolution of the data by collecting information at a more disaggregated level, as well as incorporating data on public transportation usage and other mobility-related variables, could further provide additional insights and improve the accuracy of models.

Material and methods

Long short-term memory (LSTM)

Neural Networks (NN) are one of the most widely used types of machine learning techniques. They are made of three layers: (a) input, (b) output, and (c) hidden. The most common types of NN have a cost function, and the goal is to minimize this cost function through re-adjusting the weights (i.e., model parameters) using a backpropagation technique. Recurrent Neural Networks (RNN) are more advanced and complex models that belong to the family of deep learning techniques. In RNN, a temporal loop connects the hidden layer to itself, meaning that the hidden layer not only impacts the output but also gives feedback to itself. The structure of an RNN model is shown in Fig. 6.

Figure 6
figure 6

Structure of RNN model. Microsoft Visio 2019 was used to draw the visual concept of LSTM based on the Authors’ understanding of the model. https://www.microsoft.com/en-us/microsoft-365/visio/flowchart-software.

In sequence prediction problems, Long Short-Term Memory (LSTM) networks are a specific type of RNN that can learn the dependency in the sequence of time-series data. Since there could be a lag between the events of interest in a time series, these networks can perform well with different types of problem such as classification, processing, and prediction using time series data. One important issue in the standard RNN models is the inefficiency of the model to learn when there are time lags greater than five to ten discrete time steps between the input data target variable that can cause a vanishing gradient—that is, the gradient is too small, preventing the weight from changing its value. LSTMs were developed to cope with the problem of vanishing gradient. They can learn to connect minimal time lags when there exist many discrete time steps by enforcing constant error flow through special units, called cells; see Eqs. (2) and (3). In LSTMs, the flow of information is controlled through gates that keep or override information in the memory cell, forgetting previous information, and deciding how to access memory cell; see Eq. (4). An LSTM consists of three gates. The two gates that learn to open and close access to error within the memory cell are input and output gates. The third type of gate is the forget gate that has a specific role to reset operations for the cells. In another word, the input gate decides how much of the new state h[t] should be updated; the output gate determines the portion of the state that must be outputted; and the forget gate decides the part of the information that needs to be forgotten and eliminated from the previous cell state h[t-1]. The main flow of information happens through a cell state. The cell state is updated in a forward process and the output is computed as displayed in Eq. (5):

$$\text{forget gate}:{\upsigma }_{f}\left[t\right]=\upsigma ({\text{W}}_{f}\cdot x\left[t\right] +{\text{R}}_{f}\cdot \text{y}\left[t-1\right] + {\text{b}}_{f})$$
(2)
$${\text{candidate state}}:\tilde{h}\left[ t \right] = {\text{g}}_{1} \left( {{\text{W}}_{h} \cdot x\left[ t \right]{ } + {\text{R}}_{h} \cdot {\text{y}}\left[ {t - 1} \right]{ } + {\text{ b}}_{h} } \right)$$
(3)
$$\text{input gate}:{\upsigma }_{u}\left[t\right]=\upsigma ({\text{W}}_{u}\cdot x\left[t\right] +{\text{R}}_{u}\cdot \text{y}\left[t-1\right] + {\text{b}}_{u})$$
(4)
$${\text{cell state}}:{\text{h}}\left[ t \right] = {\upsigma }_{u} \left[ t \right] \left( \cdot \right) \tilde{h}\left[ t \right] + {\upsigma }_{f} \left[ t \right] \left( \cdot \right) {\text{h}}\left[ {t - 1} \right]$$
(5)
$$\text{output gate}:{\upsigma }_{o}\left[t\right]=\upsigma ({\text{W}}_{o}\cdot x\left[t\right] +{\text{R}}_{o}\cdot \text{y}\left[t-1\right] + {\text{b}}_{o})$$
(6)
$$\text{output}:\text{y}\left[t\right]= {\upsigma }_{o}\left[t\right] \left(\cdot \right) {\text{g}}_{2}(\text{h}\left[t\right])$$
(7)

where x[t] is the input at time t, σ(·) is a sigmoid function, g1(·) and g2(·) denote the point wise nonlinear activation function, (∙) denotes the entry wise multiplication between two vectors, Ro, Ru, Rh, and Rf represents weight matrices of the recurrent connections, Wo, Wu, Wh, and Wf are weight matrices for the inputs of LSTM cells, bo, bu, bf, and bh are bias vectors5. The LSTM model was developed in Python (v3.7.3) using the Keras (v2.2.4) Deep Learning Library that itself uses TensorFlow (v2.0.0b0) in the backend.

Model execution and validation

In this study, the dataset is split into two groups: the first 22 days of November for training and the last 8 days of November for testing. The groups were not split randomly on purpose to ensure both the training and testing sets had weekdays and weekends. Moreover, it is a common practice when modeling time series to use earlier data for training and later data for testing. The premise is that a good model should be able to capture new, unseen trends. We kept the same practice even our goal is not to develop the best performing model, but to study interrelationships.

Around 250 LSTM models were trained and compared to select optimal hyperparameters. The hyperparameters used in the end are as follows: number of epochs: 200; batch size: 50; learning rate: 0.001, optimizer: Adam; activation function: sigmoid; loss function: Binary crossentropy.

In terms of performance, we use goodness of fit \({R}^{2}\), mean absolute error (MAE), and root mean squared error (RMSE) defined as:

$${R}^{2}=1-\frac{\sum_{i}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{\sum_{i}{\left({y}_{i}-\overline{y }\right)}^{2}}$$
(8)
$$MAE=\frac{\sum_{i=1}^{n}|{y}_{i}-{\widehat{y}}_{i}|}{n}$$
(9)
$$RMSE=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{({y}_{i}-{\widehat{y}}_{i})}^{2}}$$
(10)

where \({y}_{i}\) is the actual value of a data point, \({\widehat{y}}_{i}\) is the predicted value,\(n\) is the number of data points, and \(\overline{y }\) is the mean value of all \(n\) actual values.

To calculate the correlation between two time-series we use Pearson r value, defined as:

$$r= \frac{\sum (x-{m}_{x})(y-{m}_{y})}{\sqrt{\sum {(x-{m}_{x})}^{2}\sum {(y-{m}_{y})}^{2}}}$$
(11)

where \(x\) and \(y\) are the data points of two time-series and \({m}_{x}\) and \({m}_{y}\) are the mean of the vector \(x\) and \(y\) respectively.