A multi-sensory stimulating attention model for cities’ taxi service demand prediction

Taxi demand forecasting is crucial to building an efficient transportation system in a smart city. Accurate taxi demand forecasting could help the taxi management platform to allocate taxi resources in advance, alleviate traffic congestion, and reduce passenger waiting time. Thus, more efforts in industrial and academic circles have been directed towards the cities’ taxi service demand prediction (CTSDP). However, the complex nonlinear spatio-temporal relationship in demand data makes it challenging to construct an accurate forecasting model. There remain challenges in perceiving the micro spatial characteristics and the macro periodicity characteristics from cities’ taxi service demand data. What’s more, the existing methods are significantly insufficient for exploring the potential multi-time patterns from these demand data. To meet the above challenges, and also stimulated by the human perception mechanism, we propose a Multi-Sensory Stimulus Attention (MSSA) model for CTSDP. Specifically, the MSSA model integrates a detail perception attention and a stimulus variety attention for capturing the micro and macro characteristics from massive historical demand data, respectively. The multiple time resolution modules are employed to capture multiple potential spatio-temporal periodic features from massive historical demand data. Extensive experiments on the yellow taxi trip records data in Manhattan show that the MSSA model outperforms the state-of-the-art baselines.

A smart city is considered the direction of urban development and the trend of civilized development in the information age 1. It aims to use modern information technology to promote the interconnection, efficiency, and intelligence of urban operating systems 2. As an indispensable part of a smart city, intelligent transportation aims to improve the transportation system's operating efficiency and make full use of transportation resources 3. The taxi is a means of travel that meets people's travel service demand, and it plays an essential role in urban public transportation systems 4. In recent years, with the continuous improvement of people's living standards and significant lifestyle changes, there are higher requirements for high-quality and efficient taxi services. However, there are still some inefficiencies in this mode of transportation. For example, an insufficient supply of taxis in some areas leads to long waiting times for passengers, while oversupply in other areas leaves a large number of online taxis empty. These problems not only lead to inefficient taxi cruising, but also affect people's satisfaction with taxi services. Therefore, to better meet the needs of passengers and reduce costs, the taxi management platform should dispatch online taxis reasonably. At present, with the increasing popularity of floating car technology and taxi services like Uber and Didi Chuxing, a massive amount of public travel data has been accumulated 5. Making full use of these historical public travel data provides an opportunity to address the challenges of CTSDP in smart cities 4, so that taxis can be rationally dispatched to areas with higher demand and the waiting time of passengers can be reduced. In addition, it could also reduce energy consumption and air pollution in smart cities. In summary, building an efficient CTSDP model on the massive historical demand data is crucial to constructing a smart traveling service in smart cities.
More efforts have been directed towards building more accurate models with historical data for traffic prediction. Data-driven traffic prediction methods are generally divided into three major categories: time-series analysis methods, traditional machine learning methods, and deep learning based methods. The time-series analysis methods and the traditional machine learning methods are insufficient for the complex nonlinear temporal and spatial relationships in traffic data. Fortunately, deep learning has brought new ideas for capturing nonlinear spatio-temporal relationships [6][7][8], which has prompted a growing number of related studies. Many studies have applied Convolutional Neural Networks (CNN) to capture spatial correlations [9][10][11], and Zhang et al. 12 further proposed ST-ResNet, which is composed of a convolutional layer and a residual unit to simulate the spatial dependence of the city. To capture the time dependence, recurrent neural networks (RNN) and some of their variants, such as long short-term memory networks (LSTM) and gated recurrent units (GRU), have been widely used due to their excellent performance in capturing dynamic time dependence 6,13,14. For example, Xu et al. 15 encoded the taxi demand data of the past week into a time series and fed the sequence to an LSTM network to learn the time pattern of taxi demand. Although these studies explicitly modeled temporal or spatial dependence, they do not consider both aspects simultaneously. Therefore, to solve the problem of spatio-temporal sequence prediction, Liu et al. 16 further proposed a spatio-temporal network for demand prediction, which integrates local CNN and LSTM networks to learn spatio-temporal correlations simultaneously. Wang et al. 17 also used an encoder-decoder framework to deal with the spatio-temporal relationships of traffic data.
Scientific Reports | (2022) 12:3065 | https://doi.org/10.1038/s41598-022-07072-z www.nature.com/scientificreports/
Taking into account the shortcomings of a single data source, Zhao et al. 18 proposed a feature fusion model using multi-source data for prediction and verified its effectiveness on multi-source data sets. Some studies have extended the spatio-temporal model to solve CTSDP, which requires higher accuracy. For example, Rodrigues et al. 19 proposed a deep learning architecture combining text information and time-series data and applied this method to regional taxi demand prediction. The existing methods mainly focus on predicting taxi demand in some specific regions. Few studies are based on the taxi Origin-Destination (OD) demand of a whole city. One of the significant challenges in CTSDP is to capture the spatial-temporal dependencies between every two OD pairs. Liu et al. 20 proposed a contextualized spatio-temporal network for taxi Origin-Destination demand prediction. The network integrates local spatial context, temporal evolution context, and global context into a framework. Ke et al. 21 proposed the spatio-temporal encoder-decoder residual multi-graph convolutional network (ST-ED-RMGC), which builds multiple graphs to characterize the non-Euclidean pair-wise geographical and semantic correlations among different OD pairs. Chen et al. 22 proposed a method combining spatial-OD and a Bidirectional ConvLSTM model, which takes the historical and future states of the data into account to extract the time and space characteristics. Although these spatio-temporal deep networks showed reasonable performance for CTSDP, they still show some significant shortcomings: (1) they are insufficient to perceive the micro spatial characteristics and the macro periodicity characteristics from the cities' taxi service demand data; (2) they lack consideration for exploring the potential multi-time patterns from these demand data.
In the human perception mechanism, people generally pay attention to the changes in details and the macro variety tendency 23 . Therefore, to meet the above challenges, we propose a Multi-Sensory Stimulus Attention (MSSA) model for CTSDP, which is similar to the multi-sensory stimulus in the human perception mechanism 24 . Specifically, the proposed framework integrates two different attention mechanisms, namely Detail Perception Attention (DPAtt) and Stimulus Variety Attention (SVAtt), for mining the characteristics of historical demand data at the micro and macro levels respectively. Moreover, the framework combines a Local Spatio-temporal Network (LSTN), a Daily periodicity network (DayNet), and a Weekly periodicity network (WeekNet) for exploring potential multiple periodic patterns.
The main contributions of this work can be summarized as follows:
• Inspired by the human perception mechanism, we propose a Multi-Sensory Stimulus Attention (MSSA) model, which combines detail perception attention and stimulating variety attention to learn the characteristics of historical demand data from the macro and micro levels.
• We combine multiple time resolution modules to capture potential spatio-temporal periodicity features from massive taxi service demand data.
• Extensive experiments were conducted on Manhattan's yellow taxi trip records data, and the results show that the proposed MSSA model outperforms the state-of-the-art methods.

Results and discussion
Prediction results and overall performance analysis. We compare the prediction performance of the proposed method with the following baseline methods. These baseline methods include not only time-series analysis methods and traditional machine learning methods but also the state-of-the-art deep learning models. The specific methods are:
• Historical Average (HA): Historical Average uses the average values of previous demand at a given location over a relative time interval to forecast demand.
• Linear Regression: We implemented two typical methods: Lasso 25 (i.e., with ℓ1-norm regularization) and Ordinary Least Squares Regression 26 (OLSR).
• Multilayer Perceptron (MLP): The multilayer perceptron consists of four fully connected layers. The MLP takes the demand data of the last n time intervals as input and predicts the demand data of the next moment.
• XGBoost 27: XGBoost is a robust boosting tree-based method that is widely used in data mining applications.
• ST-ResNet 28: ST-ResNet is a deep learning-based traffic prediction method, which constructs traffic maps of cities at different times in the form of images.
• CSTN 20: CSTN is the state-of-the-art deep learning-based method for taxi Origin and Destination demand prediction. The method is modeled from three perspectives: local spatial context, temporal evolution context, and global relevance context.
We evaluate the prediction performance with two metrics, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE). In their definitions, m is the total number of grid divisions in New York City, and $\hat{R}(t)$ and $R(t)$ are the predicted taxi demand and the corresponding ground truth in time interval t, respectively. These two metrics are the most commonly used to evaluate prediction accuracy, and their value ranges are [0, +∞). A higher value of a metric indicates a higher prediction error.
In our work, we predicted both the Origin-Destination demand and the Origin demand of taxis. For convenience, the above two evaluation metrics are denoted as MAPE-OD and RMSE-OD in the taxi Origin-Destination demand assessment, and as MAPE-O and RMSE-O in the taxi Origin demand assessment.
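As a minimal sketch of the two metrics, the snippet below uses the standard definitions of MAPE and RMSE averaged over all entries of the prediction. The function names and the toy arrays are illustrative assumptions, not taken from the original implementation, and the exact averaging used in the paper may differ.

```python
import numpy as np

def mape(pred, truth):
    """Mean absolute percentage error (in percent) over all entries.

    Assumes every ground-truth value is non-zero; the paper filters out
    samples with demand below 10, so this holds for the evaluated data.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(pred - truth) / truth) * 100.0)

def rmse(pred, truth):
    """Root mean square error over all entries."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Toy example with three grid cells (hypothetical values).
truth = np.array([10.0, 20.0, 40.0])
pred = np.array([12.0, 18.0, 40.0])
print(mape(pred, truth), rmse(pred, truth))
```

Both functions return 0 for a perfect prediction and grow without bound as the error grows, which matches the stated value range [0, +∞).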
Using MAPE and RMSE evaluation metrics, we summarize the experimental results of all methods in Origin demand forecasting and Origin-Destination demand forecasting in Table 1.
The experimental results show that the MSSA model significantly outperforms the competing methods. Among them, the time-series analysis method (HA) offers the worst performance. The traditional machine learning methods (Lasso, OLSR, and XGBoost) perform relatively better than HA, and the methods based on deep learning (ST-ResNet and CSTN) bring more significant improvements. Specifically, our proposed MSSA method achieves the lowest MAPE and RMSE on both taxi Origin demand prediction and taxi Origin-Destination demand prediction. For the Origin demand prediction, the MSSA model achieves 18.29% and 27.66% relative improvements in MAPE-O and RMSE-O, respectively, compared to the existing best-performing CSTN method. For the Origin-Destination demand prediction, the MAPE-OD and RMSE-OD are also reduced by 5.26% and 5.30%, respectively, compared to the CSTN method.
The time-series forecasting method (HA) does not perform well because it relies only on historical records and ignores spatial and other contextual features. The traditional machine learning methods (Lasso, OLSR, and XGBoost) further treat spatial correlation as features and rules, which helps them perform better than the traditional time-series analysis method. However, these methods generally fail to capture the complex and nonlinear spatial and temporal correlations. The emerging deep learning methods (ST-ResNet and CSTN) have shown significant improvement in recent years. However, ST-ResNet and CSTN still have some drawbacks that limit further improvement. Generally, ST-ResNet ignores learning the temporal evolution context, and CSTN ignores the influence of an obvious pattern (periodicity). In addition, these methods are not sufficient to perceive the micro spatial characteristics and the macro periodicity characteristics from the cities' taxi service demand data. Compared with these methods, the MSSA model further integrates detail perception attention and stimulating variety attention to learn the characteristics of historical demand data at the macro and micro levels and then combines multiple time resolution modules to capture various potential spatio-temporal periodicity features.
Performance analysis on different days. We compare the prediction performance of all methods on different days by dividing the dataset into 7 categories according to the day of the week and predicting each category separately. As shown in Fig. 1, we only compare the traditional machine learning methods and the deep learning methods on the MAPE-OD metric, because the time-series method performs poorly and the comparisons on the other metrics are similar to that on MAPE-OD. We can conclude from the figure that our proposed method outperforms all baseline methods on every day of the week. In addition, the model achieves lower MAPE-OD values on weekdays than on weekends, that is, the prediction performance on weekdays is better than on weekends. The underlying reason is that taxi trips on weekdays are more regular, which benefits prediction, while trips on weekends are more random, which increases the difficulty of prediction.
Effectiveness of multi-sensory stimulating attention. This section discusses the impact of multi-sensory stimulating attention on the performance of taxi demand prediction. We trained a network without multi-sensory stimulating attention and averaged its experimental results over multiple training runs.
The comparison results are shown in Table 2.
The comparison results show that the model without multi-sensory stimulus attention achieves a MAPE-O of 16.98% and an RMSE-O of 15.29, as well as a MAPE-OD of 26.67% and an RMSE-OD of 1.274. After adding the multi-sensory stimulus attention, the MSSA model improves by 11.07% and 6.08% on MAPE-O and RMSE-O, respectively, and by 2.77% and 1.26% on MAPE-OD and RMSE-OD, respectively. The experimental results show that the multi-sensory stimulus attention improves the model's prediction performance.
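To make the percentages above concrete, the following sketch assumes that a relative improvement is defined as (baseline − improved) / baseline, which is the usual convention but is not stated explicitly in the text. The helper names are hypothetical. The value computed from the improvement percentage is only what this definition implies; it is not a figure reported in Table 2.

```python
def relative_improvement(baseline, improved):
    """Relative error reduction in percent, assuming (baseline - improved) / baseline."""
    return (baseline - improved) / baseline * 100.0

def implied_value(baseline, improvement_pct):
    """Error value implied by a baseline and a relative improvement (same assumption)."""
    return baseline * (1.0 - improvement_pct / 100.0)

# MAPE-O of the network without attention, as quoted in the text.
mape_o_without = 16.98
# Value implied by the quoted 11.07% improvement (an inference, not a reported result).
implied_mape_o = implied_value(mape_o_without, 11.07)
print(round(implied_mape_o, 2))
```

Under this definition, the two helpers are inverses of each other, so a reported pair of error values can be checked against a reported percentage in either direction.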
Influence of sequence length. To explore the influence of sequence length in the LSTN network, we investigate the relationship between sequence length and prediction performance by training our model with sequences of different lengths. The relationship between the sequence length and the Origin demand prediction performance is shown in Fig. 2a, and the relationship between the sequence length and the Origin-Destination demand prediction performance is shown in Fig. 2b. As shown in Fig. 2, the prediction performance of the model first improves as the sequence length increases (the lower the value of the evaluation metric, the better) and reaches its optimum at a length of 5 (2.5 h). Notably, the prediction performance begins to decline beyond this optimal value. The upward trend of the prediction performance with increasing sequence length indicates that the model is sensitive to the variety tendency in the same period, and the decline beyond the optimal value means that more parameters need to be learned when extending to a wider time range. The sequence length was set to 5 in the MSSA model so that training converges more quickly.
Effect with different periodic modules. To further explore the impact of different potential periodic modules on the overall performance, we integrate different periodic modules to implement the following variants:
• LSTN + DayNet: The network contains only a local spatio-temporal module and a potential daily periodicity module. The outputs of the two modules were fused and fed into a convolutional layer for predicting taxi demand.
• LSTN + WeekNet: The network contains only a local spatio-temporal module and a potential weekly periodicity module. The outputs of these two modules were fused and fed into a convolutional layer for predicting taxi demand.
• LSTN + DayNet + WeekNet: As the complete version of our proposed model, this network incorporates a local spatio-temporal module, a potential daily periodicity module, and a potential weekly periodicity module to predict taxi demand.
The prediction performance of the different variants on taxi demand is shown in Table 3. The comparative analysis shows that both the daily and the weekly periodic modules are beneficial for prediction performance, and that integrating multiple periodicity modules yields the best prediction performance.

Methods
Taxi trip records data. The experiments were conducted on the taxi travel record dataset provided by NYC open data 29 , which is mainly distributed in Manhattan. We selected the taxi travel record data for the whole year of 2014. The data of the last 60 days was employed as the test set, and the rest was used as the training set.
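The split described above can be sketched as follows, assuming that "the last 60 days" means the final 60 calendar days of 2014. The constant and function names are illustrative only.

```python
from datetime import date, timedelta

YEAR_START = date(2014, 1, 1)
YEAR_END = date(2014, 12, 31)
TEST_DAYS = 60  # number of days held out for testing, as stated in the text

# First day of the test period (inclusive), assuming calendar days.
test_start = YEAR_END - timedelta(days=TEST_DAYS - 1)

def is_test_day(day):
    """Return True if a trip date falls into the held-out test period."""
    return test_start <= day <= YEAR_END

# Number of days left for training.
n_train_days = (test_start - YEAR_START).days
print(test_start, n_train_days)
```

Splitting by date rather than by random sampling keeps the test period strictly after the training period, which avoids leaking future demand into training.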
In addition, we analyzed the temporal and spatial characteristics of urban residents' travel. We first divided the travel records of January 10, 2014 into 8 time periods and then drew the travel heat map of each period. In the heat map, a darker color indicates a higher demand for taxis in the area, and the different colored lines represent popular travel routes from different high-demand areas. The results in Fig. 3 show that the city has different taxi service demand in different time periods. It is a challenge to capture these periodical patterns from massive taxi service demand data, and we aim to address this challenge in this work.
Data enhancement. To achieve data enhancement through multi-source information fusion, we collected meteorological data from open-source websites 30 . The collected and selected meteorological indicators are shown in Table 4, including 23 discrete weather types and 6 continuous weather indicators.

Data preprocessing.
In the spatial dimension, we divided the Manhattan area into H × W grid areas, as shown in Fig. 4, and numbered them as $g_1, g_2, \ldots, g_G$, where $G = H \times W$. In the time dimension, we divided the time evenly into a sequence of time intervals, denoted as $\{t_1, t_2, \ldots, t_T\}$. Based on the time stamps and location coordinates of passengers getting on and off the taxi in the taxi trip records, we counted the number of trips between each pair of areas and then generated the taxi OD matrix $R(t)$ within the time interval t, with $R(t) \in \mathbb{R}^{G \times G}$.
To better maintain the continuity of the data distribution and better capture local spatial features, we further carried out a simple transformation of the dimension of the matrix, that is, from the original $R(t) \in \mathbb{R}^{G \times G}$ to $R(t) \in \mathbb{R}^{G \times H \times W}$. In addition, we filtered out the samples with demand values less than 10 to improve data quality 31.
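A minimal sketch of the OD matrix construction and the reshaping step is given below. It assumes that each trip has already been mapped to grid row and column indices for its pick-up and drop-off points, and that the destination axis is the one unfolded into H × W; neither detail is specified in the text. The function names and the toy trips are hypothetical.

```python
import numpy as np

H, W = 15, 5   # grid size used in the paper's implementation details
G = H * W      # total number of grid cells

def cell_index(row, col):
    """Flatten a (row, col) grid position into a cell number in [0, G)."""
    return row * W + col

def build_od_matrix(trips):
    """Count trips between cells for one time interval.

    trips: iterable of (origin_row, origin_col, dest_row, dest_col).
    Returns a G x G matrix whose entry [i, j] is the number of trips
    from cell i to cell j.
    """
    od = np.zeros((G, G), dtype=np.int64)
    for o_row, o_col, d_row, d_col in trips:
        od[cell_index(o_row, o_col), cell_index(d_row, d_col)] += 1
    return od

# Three hypothetical trips inside one time interval.
trips = [(0, 0, 14, 4), (0, 0, 14, 4), (3, 2, 0, 0)]
od = build_od_matrix(trips)

# Unfold the destination axis into the grid layout: one H x W map per origin cell.
od_maps = od.reshape(G, H, W)

# Origin demand: total pick-ups per cell, laid out on the grid.
origin_demand = od.sum(axis=1).reshape(H, W)
```

Keeping the grid layout in the last two axes is what lets convolutional layers see spatially adjacent cells as neighbours.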
For the meteorological data, we first digitized the 23 discrete types of data with one-hot encoding and then converted the 6 continuous types of data into the range of [−1,1] with Min-Max linear normalization.
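The two encodings can be sketched as follows. This is an illustrative NumPy version with hypothetical function names and toy temperature values, not the original preprocessing code.

```python
import numpy as np

N_WEATHER_TYPES = 23  # number of discrete weather types stated in the text

def one_hot(type_index, n_types=N_WEATHER_TYPES):
    """Encode a discrete weather type as a one-hot vector."""
    vec = np.zeros(n_types)
    vec[type_index] = 1.0
    return vec

def min_max_scale(values, low=-1.0, high=1.0):
    """Linearly rescale each column of `values` into [low, high]."""
    values = np.asarray(values, dtype=float)
    v_min = values.min(axis=0)
    v_max = values.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    span = np.where(v_max > v_min, v_max - v_min, 1.0)
    return low + (values - v_min) / span * (high - low)

# One continuous indicator (hypothetical temperatures) observed three times.
temps = np.array([[0.0], [10.0], [20.0]])
scaled = min_max_scale(temps)
encoded = one_hot(3)
```

In practice the minimum and maximum would be taken from the training set only and reused for the test set, so that the test data does not influence the scaling.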

Model framework overview.
This section provides details about the MSSA model, which aims to predict the cities' taxi service demand $\hat{R}(t)$ for possible upcoming trips by using nearby historical demand data $R_{close}$, daily periodic data $R_{day}$, weekly periodic data $R_{week}$, and the heterogeneous data $\epsilon$. Figure 5 shows the overall framework, which contains three sub-networks: LSTN, DayNet, and WeekNet. These three sub-networks separately model different periods of historical demand data to analyze the three time modes: closeness, daily periodicity, and weekly periodicity. What's more, the Detail Perception Attention (DPAtt) learns the spatial and meteorological characteristics of historical demand data at the micro level, and the Stimulating Variety Attention (SVAtt) learns potential periodic patterns at the macro level.

Table 3. Comparison of different periodicity modules. Columns: MAPE-O (%), RMSE-O, MAPE-OD (%), RMSE-OD.
Local spatial feature learning. In spatio-temporal traffic data, the traffic observations at spatially adjacent locations are often not independent but strongly correlated. The same is true for taxi demand, which tends to have more similar demand patterns in spatially adjacent regions. To extract local spatial features of taxi demand from the Origin view and the Destination view, we input the taxi OD demand matrix $R(t)$ and the DO demand matrix $R(t)^T$ into a 3D convolutional layer, respectively, which has 16 convolutional kernels with a kernel size of 3 × 3. Here, $w_o$ and $w_d$ denote the learnable parameters of the Origin-view and Destination-view convolutions.
The residual learning proposed by He et al. 32 was applied in our model to further explore detailed spatial features with limited computing resources. In our work, each residual mapping consists of one activation followed by one convolution. We stacked L residual units upon Conv3D and then fused the features of the two views with a convolutional layer, where ⊕ denotes the feature concatenation operation and $w_{od}$ is the parameter of the fused convolutional layer. Since our task is to predict the future demand of taxis in each city grid, the network structure does not employ any downsampling or pooling operations in the spatial dimension, because these operations reduce the size of the tensor and make the network less sensitive to local spatial features.

Heterogeneous information fusion.
Taxi demand may be affected by complex external factors like road network connections, points of interest, and weather conditions. Considering the potentially strong correlation between meteorological conditions and taxi trips, we focused on the impact of meteorological data. We first encoded the meteorological data using a multilayer perceptron (MLP). Then, we concatenated the encoded meteorological features with the local spatial features and fused the two with a convolutional layer with 32 filters. Here, ⊕ is the feature concatenation, $F_w$ denotes the encoded meteorological feature, and $w_{odw}$ denotes the parameters of the convolutional layer.

Detail perception attention.
Recently, the attention mechanism has become an essential part of neural network structures and has been studied extensively in different fields. In particular, Woo et al. 33 proposed a convolutional block attention module (CBAM) to enhance feature representation for visual semantic analysis.
Inspired by the attention in CBAM, we propose a Detail Perception Attention (DPAtt) to fully learn the detail variation of the fused features. To enhance the feature representation of local spatial features and meteorological features, DPAtt is constructed with a level attention module and a spatial attention module. The level attention module mainly processes the feature maps of different taxi demand levels, and the spatial attention module mainly reinforces the demand density distribution on the feature maps. Together, the level attention and spatial attention in DPAtt make it possible to pay more attention to the vital demand levels and the critical regions in the feature maps.
As shown in Fig. 6, the level attention applies max-pooling and average-pooling to the feature maps in the level dimension to obtain a MaxPool level attention vector and an AvgPool level attention vector. We then input these two vectors into a single-layer perceptron with shared weights to obtain two new level attention vectors. We merged these two vectors using element-wise summation and finally multiplied the result with the original feature map to obtain the new feature map. In this formulation, ⊗ denotes the multiply operation, and σ denotes the sigmoid function.
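The level attention can be sketched in NumPy as below. The sketch follows the CBAM-style channel attention that inspired DPAtt and makes two assumptions that the text does not confirm: the shared perceptron is a single linear layer without hidden units, and the sigmoid is applied to the summed vector before the multiplication. All names and shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def level_attention(features, weight, bias):
    """Reweight feature maps along the level (channel) dimension.

    features: array of shape (C, H, W), one H x W map per level.
    weight, bias: parameters of the shared single-layer perceptron,
                  shapes (C, C) and (C,).
    """
    # Pool each level's map down to a single number.
    max_vec = features.max(axis=(1, 2))
    avg_vec = features.mean(axis=(1, 2))
    # The same perceptron (shared weights) processes both pooled vectors.
    max_out = weight @ max_vec + bias
    avg_out = weight @ avg_vec + bias
    # Element-wise summation, then sigmoid to get weights in (0, 1).
    scale = sigmoid(max_out + avg_out)
    # Broadcast the per-level weights over the spatial dimensions.
    return features * scale[:, None, None]

rng = np.random.default_rng(0)
features = rng.normal(size=(32, 15, 5))      # 32 levels on a 15 x 5 grid
weight = rng.normal(scale=0.1, size=(32, 32))
bias = np.zeros(32)
out = level_attention(features, weight, bias)
```

The output has the same shape as the input, so the module can be inserted between existing layers without changing the rest of the network.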
To compute the spatial attention, as shown in Fig. 7, we first applied average-pooling and max-pooling operations along the level axis to obtain two spatial attention maps and concatenated them. Afterwards, a convolutional layer and an activation layer were employed to generate the spatial attention weight matrix. Finally, we multiplied the matrix with the input feature map to obtain the weighted feature matrix. In this formulation, ⊗ denotes the multiply operation, and σ represents the sigmoid function.
Local spatio-temporal feature learning. To capture the temporal dependence of taxi demand, we introduced the ConvLSTM 34 network, which helps establish time-series relationships from two-dimensional plane data and extract spatial relationships. The difference from LSTM is that convolution is employed as the operation between the input and each gate, which helps to extract both temporal and spatial features. In the ConvLSTM formulation, $i_t$, $f_t$ and $o_t$ denote the input gate, forget gate, and output gate, respectively. $X_t$ and $c_t$ denote the input and the cell state of the network at time t, respectively. Each w and b represent the weight and bias of the corresponding gate. * denotes the convolution operation, ⊗ denotes the Hadamard product, and σ is the logistic sigmoid function.
We input the temporal closeness subsequence $F_s^c = [F_s(t-n), F_s(t-n+1), \ldots, F_s(t-1)]$ into the ConvLSTM network to generate the local spatio-temporal features $F_{close}$, where $w_{close}$ denotes the learnable parameters of the network.
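For reference, a commonly used form of the ConvLSTM update, written without peephole connections, is sketched below. The hidden state $h_t$ and the weight subscripts ($w_{xi}$, $w_{hi}$, and so on) are notation assumed here for illustration; the exact variant used in the model may differ.

```latex
\begin{aligned}
i_t &= \sigma\left(w_{xi} * X_t + w_{hi} * h_{t-1} + b_i\right) \\
f_t &= \sigma\left(w_{xf} * X_t + w_{hf} * h_{t-1} + b_f\right) \\
o_t &= \sigma\left(w_{xo} * X_t + w_{ho} * h_{t-1} + b_o\right) \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes \tanh\left(w_{xc} * X_t + w_{hc} * h_{t-1} + b_c\right) \\
h_t &= o_t \otimes \tanh\left(c_t\right)
\end{aligned}
```

Compared with a standard LSTM, the only change is that the matrix products are replaced by convolutions, so the gates and states keep the spatial layout of the demand maps.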
Potential temporal periodicity feature learning. Due to the regularity of people's daily life, traffic data usually show apparent cycles and trends. However, most researchers currently model the taxi demand forecasting problem using data from only a few intervals (typically several hours). They ignore a vital property of spatiotemporal prediction: long-term dependence (such as potential periodicity) 35 . Therefore, this section will focus on the learning method of the multi-period mode.
(1) Potential daily periodic feature learning. The demand for taxis in the area around Times Square for four consecutive days is shown in Fig. 8. It can be observed from the figure that there is a significant similarity in the demand for taxis at the same time on different days. We denoted the daily relative time interval demand sequence as:
$R_{day} = [R(t - s_p \cdot p), R(t - (s_p - 1) \cdot p), \ldots, R(t - p)]$
where p is the total number of time intervals in a day, and $s_p$ is the number of periods of potential daily periodicity.
In the DayNet module, we first concatenate the sequence $R_{day}$ in the time dimension and then use a 3D convolutional layer to extract the feature $F^p_{day}$, where $w_{day}$ denotes the learnable parameters of this layer.
To perceive which day has the most significant stimulus (i.e., relevance) to our prediction, we further constructed an SVAtt module (the details of this attention are given in the next section). We input the extracted feature $F^p_{day}$ into SVAtt.
(2) Potential weekly periodic feature learning. Similarly, we drew a heat map of the demand in the Times Square area of New York for four consecutive weeks, as shown in Fig. 9. From the figure, we can conclude that the taxi demand pattern also has significant similarities at the same time in different weeks. We denoted the weekly relative time interval demand sequence as:
$R_{week} = [R(t - s_q \cdot q), R(t - (s_q - 1) \cdot q), \ldots, R(t - q)]$
where q is the total number of time intervals in a week, and $s_q$ is the number of periods of potential weekly periodicity.
In the WeekNet module, we use the same approach as in the DayNet module. That is, a convolutional network with learnable parameters $w_{week}$ and SVAtt are used to capture the potential weekly periodic patterns.
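The three input sequences can be illustrated by computing which historical time intervals feed each sub-network. The sketch assumes 30-minute intervals (consistent with a sequence length of 5 corresponding to 2.5 h), so that p = 48 and q = 336, and assumes each periodic sequence takes the same relative interval from each of the previous $s_p$ days or $s_q$ weeks, ordered from oldest to newest. The function name is hypothetical.

```python
INTERVALS_PER_DAY = 48                        # p, assuming 30-minute intervals
INTERVALS_PER_WEEK = 7 * INTERVALS_PER_DAY    # q

def history_indices(t, n=5, s_p=3, s_q=3,
                    p=INTERVALS_PER_DAY, q=INTERVALS_PER_WEEK):
    """Return the interval indices used to predict interval t.

    close: the n most recent intervals (input to LSTN).
    day:   the same interval on each of the previous s_p days (DayNet).
    week:  the same interval in each of the previous s_q weeks (WeekNet).
    Each list is ordered from oldest to newest.
    """
    close = [t - k for k in range(n, 0, -1)]
    day = [t - k * p for k in range(s_p, 0, -1)]
    week = [t - k * q for k in range(s_q, 0, -1)]
    return close, day, week

close, day, week = history_indices(t=2000)
print(close, day, week)
```

With the default values the model sees eleven historical observations in total, which matches the five closeness, three daily, and three weekly components listed in the implementation details.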

Stimulating variety attention.
In the previous section, when learning the potential daily periodic features, the contributions of the demand of the previous $s_p$ days to the forecast are not equal. For example, the demand at the same time yesterday may influence the prediction more than that of the day before yesterday; or, because of travel restrictions such as car restrictions in some areas, people's travel patterns may be more similar on alternate days. We believe that similar travel patterns also exist in the potential weekly periodic features. To handle the unequal contribution of daily (weekly) data to predictive performance, we propose the SVAtt, which assigns different weights to daily (weekly) demand from two aspects: potential daily periodicity and potential weekly periodicity.
The steps of the SVAtt module are as follows. We first input $F^p_{day}$ into a fully connected layer with $s_p$ neural units and then activate it with Softmax to obtain the attention weight vector $w_{day}$, which is used to weight the daily features. Similarly, the potential weekly periodic feature learning module is processed with the same method, where the fully connected layer has $s_q$ neural units.
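A rough NumPy sketch of this step is shown below. The text only states that a fully connected layer with $s_p$ units followed by Softmax produces the attention weights; how the layer's input is formed and how the weights are applied are not recoverable from the text. The sketch therefore assumes that the per-day features are flattened into one vector for the fully connected layer and that the weighted per-day features are summed. All names and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def svatt(period_features, fc_weight, fc_bias):
    """Weight the features of each previous period (day or week).

    period_features: shape (s, C, H, W), one feature map per period.
    fc_weight, fc_bias: fully connected layer with s output units,
                        shapes (s, s*C*H*W) and (s,).
    Returns the weighted sum over periods (assumed) and the weights.
    """
    flat = period_features.reshape(-1)
    scores = fc_weight @ flat + fc_bias          # one score per period
    weights = softmax(scores)                    # attention weights, sum to 1
    weighted = period_features * weights[:, None, None, None]
    return weighted.sum(axis=0), weights

rng = np.random.default_rng(0)
s_p, C, H, W = 3, 4, 15, 5
f_day = rng.normal(size=(s_p, C, H, W))
fc_weight = rng.normal(scale=0.01, size=(s_p, s_p * C * H * W))
fc_bias = np.zeros(s_p)
fused, weights = svatt(f_day, fc_weight, fc_bias)
```

Because the weights come from a Softmax, they are positive and sum to one, so the module can emphasize yesterday, the day before, or alternate days depending on what the data supports.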

Features fusion.
This section discusses the fusion method of the three components: local spatio-temporal features, potential daily periodic features, and potential weekly periodic features. As shown in Fig. 5, we denoted these three parts as $F_{close}$, $F_{day}$ and $F_{week}$, respectively. It should be noted that all regions are affected by these three components, but the degree of influence may differ. For some regions, closeness dependence is particularly significant, while for other regions, long-term dependence is more important. Therefore, when fusing the above three components into the fused feature $F_{res}$, we fully considered their different contributions, as learned from historical data. In the fusion, ⊗ is the element-wise multiplication, and $w_{rc}$, $w_{rd}$ and $w_{rw}$ are learnable parameters that reflect the degrees of the closeness influence, the daily period influence, and the weekly period influence on the predicted target.
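The weighted fusion can be sketched as follows. The text specifies element-wise multiplication by learnable weights; the sketch additionally assumes that the three weighted components are summed, as in the parametric-matrix fusion used by ST-ResNet. The function name, shapes, and values are illustrative.

```python
import numpy as np

def fuse(f_close, f_day, f_week, w_rc, w_rd, w_rw):
    """Combine the three components with element-wise learnable weights.

    All arguments share the same shape, e.g. (C, H, W), so each grid cell
    can weight closeness, daily, and weekly information differently.
    The summation of the three weighted terms is an assumption.
    """
    return w_rc * f_close + w_rd * f_day + w_rw * f_week

shape = (32, 15, 5)
f_close = np.ones(shape)
f_day = np.ones(shape)
f_week = np.ones(shape)
# Hypothetical weights: closeness matters most in this toy example.
w_rc = np.full(shape, 0.5)
w_rd = np.full(shape, 0.3)
w_rw = np.full(shape, 0.2)
f_res = fuse(f_close, f_day, f_week, w_rc, w_rd, w_rw)
```

Because the weights have the same shape as the features, the relative importance of the three components is learned per region rather than fixed globally.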
Finally, we input $F_{res}$ into a convolutional layer and an activation layer to obtain the taxi OD demand prediction value $\hat{R}(t)$ at the time interval t, which is expressed as:
$\hat{R}(t) = f(w_{res} * F_{res})$
where f is the tanh activation function, * denotes the convolution operation, and $w_{res}$ is the learnable parameter.
Implementation details. We divided Manhattan into a 15 × 5 grid map based on longitude and latitude. That is, H is 15 and W is 5. We implemented our model with the TensorFlow 2.5 framework on an NVIDIA 3080 GPU. The input to the model consists of eleven historical observations, including five temporal closeness components, three daily periodic components, and three weekly periodic components. That is, n is 5, P is 3, and Q is 3. In the LSTN module, the number L of residual units was set to 2, and all convolution layers in ConvLSTM had 32 filters. For the DayNet and WeekNet sub-networks, the convolutional layer had 32 filters, and $s_p$ and $s_q$ were both set to 3. For the whole model, the batch size was set to 32, the learning rate was 0.001 in the pre-training stage and 0.0001 in the later training, and early stopping on the validation dataset was employed. We used the Adam 36 optimizer for training to minimize the Mean Squared Error.

Conclusions
To address the challenge of taxi service demand prediction in smart cities, we propose a Multi-Sensory Stimulating Attention (MSSA) Model for CTSDP. Like the human perception mechanism, the MSSA model integrates detail perception attention and stimulating variety attention to learn the characteristics of historical demand data at the macro and micro levels. The multiple periodical modules were combined to capture potential spatiotemporal periodicity features from massive taxi service demand data.
Extensive experiments and evaluations were conducted on the taxi travel record dataset in Manhattan, New York. The results showed that the MSSA outperforms the baseline methods. Specifically, compared with the state-of-the-art methods, the MSSA reduced the MAPE and RMSE by 18.29% and 27.66% in the Origin demand forecasting, and 5.26% and 5.30% in the Origin-Destination demand forecasting, respectively. Further analysis showed the effectiveness of multi-sensory stimulating attention and multiple periodicity feature learning.
The MSSA could be applied in smart cities to improve traveling service quality and also help to reduce pollution emissions. In the future, the MSSA could also be extended by learning more periodic patterns and incorporating more context information.

Data availability
The yellow taxi trip records data related to this study is accessible using the following link: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page. The meteorological data related to this study is accessible using the following link: https://www.wunderground.com.