A unified spatial multigraph analysis for public transport performance

Public transport performance not only directly depicts the convenience of a city’s public transport, but also indirectly reflects urban dwellers’ life quality and urban attractiveness. Understanding why some regions are easier to get around by public transport helps to build a green transport friendly city. This paper initiates a new perspective and method to investigate how public transport network’s morphology contributes significantly to its performance. We investigate the significance of morphology from the perspective of graph classification – by extracting the typical local structures, called “motifs”, from the multi-modal public transport multigraph. A motif is seen as a certain connectivity pattern of different transport modes at a local scale. Combinations of various motifs decide the output of graph classification, particularly, public transport performance. We invent an innovative method to extract motifs on complex spatial multigraphs. The proposed method is adaptable to unify complex factors into one simple form for swift coding, and depends less on observational data to handle data unavailability. In the study area of Beijing, we validate that the measure can infer varied public transport efficiencies and access equalities of different regions. Some typical areas with undeveloped or unevenly distributed public transport are further discussed.


List of Figures
As shown by Eq. 3 in the main text, the θ (·) is a key parameter of edge weight. Intrinsically it is the proxy of the average travel speed of a certain mode. Calculating the travel speed of a certain mode is tricky since the waiting time at a stop should be excluded, and the travel speed is heterogeneous over space due to road conditions and configurations. As an approximation to up-to-date routes and travel time, we applied Amap API 1 to query the travel distance and time between selected subway stations. The estimated travel time queried from Amap API is regardless of the traffic situation at different time of day. In order to calculate the travel speed of each pure mode for the same route, walking time and distance to bus stops are deducted from bus travel time and distance. The selected subway stations all have proximity to bus stops. We sampled the stations along a few subway lines. For each line, the selected station pairs all share the same start station S 0 , while end at each consecutive station S n with n (n = 1, 2, 3...) stops away towards one direction. This operation allows us to do differential calculation to tease out the waiting time at a station, e.g., using time of S 0 S n minus S 0 S n−1 as the travel time of S n−1 S n (n > 2). Corresponding travel distance is calculated in the same manner so that we can estimate the travel time. Table 1 displays an example (Subway Line 4) of the sampled travel time and distance to approximate θ (·) , starting from East Gate of Peking University (S 0 ), going through Zhongguancun (S 1 ), Huangzhuang (S 2 ), Renmin University (S 3 ), Weigongcun (S 4 ), and ending at National Library (S 5 ).
We observed that the number of stops on a route drastically affects the estimated travel speed, especially for bus. The overhead costs such as boarding and alighting, walking in a station, are more significant for shorter trips. Buses are more prone to the influence since they take longer time to accelerate and decelerate than subway and normally have closer stops. Table 2 is the differential result between each entry from the second and the first entry (S 0 S n − S 0 S 1 (n > 1)) in Table 1. This operation removes the overhead cost at the origin stop of each trip. θ is estimated by the average travel speed of each mode. Since only the ratio between different modes matters, we let θ F = 1. According to Table 2, we let θ B = 2.5 in spite of its variance in shorter distance. θ S = 6.7 is also inferred from the table.
As a comparison, we also let θ B = 1 and θ S = 6.5, testing the scenario when the travel speed takes into account the overhead cost at each stop, e.g., waiting for boarding or alighting, acceleration and deceleration. The second half of Table 3 shows the result from such setting. This is the lower bound of speed derived from Table 1 where the last column is the averaged speed considering mixed mode as well as multiple overhead cost. Apparently the average travel speed by subway and by bus will be lower for shorter distance, e.g., S 0 S 1 , than for longer distance. Bus is even slower than walking for S 0 S 1 considering the time of waiting. Since Table 2 calculates the travel speed after teasing out the overhead cost of first/last mile walking and waiting, it yields the upper bound ratio of θ B /θ F and θ S /θ F .

The complexity of spatial graph classification
This section justifies why we invented the convolution-based method for graph classification. Combinatorial complexity is a big challenge to compute frequent subgraph patterns in graph boosting [1][2][3][4] . Traditional graph boosting requires enumerating appeared subgraph patterns. For a binary 5-dimensional vector the possible statuses of each node is 2 5 = 32. Due to the dependency between dimensions, only 16 statuses are possible in the case study. If a stop is none of the specified transport stops/stations, it cannot be a transfer stop/station either (Eq. 2 satisfies t b ≤ s b ,t s ≤ s s ); and a node has to be either a bus stop or a subway station (s s + s b ≥ 1 in Eq. 2) since we do not consider nodes with only the role of road intersection. Let the degree of a node be d, thus alter nodes' combinatorial statuses sum up to 16 d . If linkage status is discrete with 3 possible statuses, e.g., the three modes as bus, subway, and walk, then the 1-degree connections of a node jointly yields 3 d combinations. In total (16 d+1 )(3 d ) (d ∈ N + ) potential subgraphs emerge for only one node's 1-degree neighbourhood. In our case, it is more complicated since the link weight can have numerical continuous values or the possibilities of link weights are from an unpredictable infinite set. It is very challenging in that sense to clearly define a subgraph. Geography has the challenge of high complexity in human society. We attributed the complexity of analysing a spatial network to the following points: • Spatial network is mostly a multigraph in which multiple types of links (i.e., interactions between places such as traffic flow, social network connection, etc. 5 ) can simultaneously exist. Solving the tensor adds complexity to solving a matrix. For example, place A and place B can at the same time have interactions of traffic flow, bank transactions, and social network connections. Treating the networks independently and separately is not ideal if the research purpose is to find the association across different layers of networks.
• The topological structure of a spatial network can appear so complex that enumeration of motifs are unlikely to conduct. The comparison of spatial networks in different places as well as scales is challenging to perform.
• Applying neural network based method for spatial networks is more challenging than for the networks in fields such as biomedical sciences or online digital behaviours due to an equal (if not higher) level of complexity while higher difficulty in collecting observation data of real-life scale.
We are hence inspired by the capability of convolution to handle continuous edge weight, but avoided the parameter training mechanism due to the very small amount of observations. This is also the base of argument for Section 1 to choose a suitable θ .

6/16
Fig .1 shows the workflow to retrieve the PT travel time and distance as ground truth. We called Amap API for this analysis instead of using the bus timetable because bus schedules are not accurately released in China and buses mostly are not running on schedule either. We argue that calling Amap API is generally more reliable.

Clustering of graphs by travel efficiency
Clustering of graphs by travel efficiency categorizes graphs based on their observed performance. Labels resulted from this clustering are for the output side of graph classification in the main text ("subarea labels" in Fig.3D). Here we demonstrate the results for three sizes of areas, 3km, 4km, and 5km diameter areas. In the main text we only discussed the 4km area cases. The travel efficiencies in each subarea -t andd -have a similar distribution to a log-normal distribution at the tail part, but with a minor peak to the left where x = 0 (Fig 2). Note that due to kernel density estimation in visualization, the curves look like extending into negative values but in fact do not. Each line corresponds to a subarea. We did not perform a log-normal transformation because we noticed the heavy tail outliers are a distinguishable character of a subarea. Subareas are clustered by its (µt , σt ) or (µd, σd) with Supporting Vector Classifier (SVC). We selected SVC because it is an efficient method that is able to handle non-linear classification and works pretty well for a small number of samples (in our case, subareas). The number of targeted classes are specified as three not only to guarantee a high Calinski-Harabaz score (52.89 fort and 110.60 ford) -the score to measure the goodness of clustering when the real cluster label is missing -but also to yield a small amount of clusters for the easiness of human interpretation.

The influence of subarea size and times of convolution
We noticed the difference in the prediction results for different buffer sizes. Table 3 displays the prediction accuracies from graph classifications by the buffers of 3km, 4km, and 5km diameters (by conducting the operations in Section 4). The upper half is yielded by the default setting with θ S = 6.7, θ B = 2.5, θ F = 1, while the lower half is from the setting of lower bound θ as explained at the end of Section 1.
The influence of subarea size The best prediction happens mostly at 4km diameter. We think 4km can evenly cover the whole study area while without too much overlap. The 3km areas in general display the lest capability of prediction. The reasons behind may be attributed to multiple factors. Within 3km areas, travels short as 1km or less may not worth taking public transit at all given the overhead cost of waiting and detour. The queried results from Amap API for the very short travels thus either involve a bigger portion of walking trips even we queried "by public transit", or yield no feasible solution by public transit since walking will be faster. For 5km diameter areas, the significant overlap between subareas makes the classification difficult.
The prediction capability differs between the adjusted PT travel time and PT travel distance. Reading the result from Table  3, for 3km and 5km areas, the prediction on travel distance is better than for travel time, while for 4km areas they relatively make a tie. Travel time is more difficult to predict due to the heterogeneity of operational frequency of public transit services, the road congestion situation, and the unknown overhead cost induced by times of transfers. These factors cannot be modelled by the convolution based method. On contrast, 4km and 5km areas generally can predict PT distance better than applying the benchmark aggregative statistics method (specified in Section 7). The results from 3 times of convolutions to 4 times of convolutions yield a big jump or drop at all scales for PT distance. The reasons behind are not yet clear and are open to further exploration. The sensitivity of the parameters to the results need a systematic test in order to gain clear insights.
The influence of times of convolutions. We also conclude that twice convolutions are already sufficient to represent PT travel time and distance for short distance travels. From the upper half of Table 3 we see that the peak accuracies vary by buffer sizes and by clustering measures (time vs. distance). However, the general trend is that peak values all come no more than 5 times of convolutions. Having more times of convolutions involves the connectivity of larger neighbourhood. Nevertheless, the more degrees of connections a node from an ego node, the more diluted its influence in the convolution process. Compare twice convolutions with five convolutions, the results are not significantly distinguished from each other. We justify the finding by such an intuitive explanation: by research design, setting out from an OD point, the first lag of connections are covered by walking, and the second lag by PT. Intuitively the total travel time is heavily affected by the travel mode of the second lag, because the speed of the mode plays a big role. For instance, if the OD is near a subway station, the second lag of travel is by subway; while if the OD is near a bus stop, then the second lag of travel is by bus. In our experiment setting, travelling by bus is likely to be slower than by subway since the travels are of short distance and thus we usually do not need to consider as many transfers as for long distance travels (transfer leads to overhead cost). If the ego node is a bus stop or a subway station, then the first lag of travel will be of the corresponding PT mode rather than walking. In this case, the connected second degree node depicts more of the convenience of transfers and connectivities. In such a short distance, the load of information by twice convolutions is sufficient.
It is noteworthy that the lower half of the table, i.e., when the travel speed of PT has lower speed, demonstrates a delay of the highest accuracy compared with the upper half of the table. PT efficiency is a comprehensive result of local connectivity, travel speed, travel distance, operational frequency, and road traffic situation. Excluding the last two, which fall out of the capability of the convolution based index, the rest three play a coalesced role. Since bus has no speed advantage over walking in this scenario, more times of convolutions improve the PT efficiency by harnessing the benefit of local connectivity.
We would also like to point out a negative observation -the tendency of homogeneity in the result when the times of convolutions is very high. We tested 20 times of convolutions to find that the node measures converged to the same cluster over the whole graph. In that case, the local neighbourhood of nodes that are involved in node propagation is so wide that the graph is smoothed out. We hence argue that the size of the neighbourhood, i.e., controlled by the times of convolutions, should be capped at a limit. Too many times are not favourable. Fig. 3 shows the subarea clustering result by different PT efficiencies -t andd, where each node has two dimensions µ (mean) and σ (variance). Correspondingly, the overall spatial distribution of PT travel time, i.e., subarea clusters byt, of the 4km subareas is demonstrated by the colorful circles in Fig. 4. We did not show the spatial distribution patterns of other subarea sizes nor for travel distance since we focused on the travel time of 4km areas in the main text. The similar method can be applied for the omitted cases. The influence of different buffer sizes is discussed in Section 5. Looking at Fig. 4 and Fig. 3(c) 9/16 Table 3. Graph classification accuracies** θ S = 6.7, θ B = 2.5, θ F = 1 Diameter

Convolution times
Clustered by By agg. stats  Normally the most accessible areas are distributed in the central and northern parts of the city, whereas the less accessible ones are located at the southern part and some peripheral areas. This matches the fact that the south edge of Beijing is less developed in terms of commercial and residents' daily activities. The south part, where there are a few warehouses, long-distance shuttle bus stations, and railway stations, takes the function of logistics and warehouse. There are also a few industrial parks over there. The most accessible areas normally have an evenly distributed PT network, including subway lines, sufficient bus lines, and dense road networks. The "evenly distributed" is said with regard to the road distribution. Since we calculated the PT efficiency by normalizing with the shortest network distance, the influence of road network density will affect its PT efficiency. As the main text points out, the top left corner has an orange circle area (as D in Fig.2) that is clustered as high mobility. The area has no subway line at all, but there is a big area of parks and natural reserves that has no roads either. As a result, the bus stops are distributed densely along roads and thus the overall mobility is high.
We clustered the PT efficiency from two dimensions: mean and variance -(µt , σt ) if travel time or (µd, σd) if travel distance. High variance means the fairness of mobility is low, i.e., the points do not provide relatively equal mobility. With travel time of 4km diameter as an example, in Fig.4, blue-circle areas have some convenient lines going through, e.g., subway lines, while other parts less facilitated with PT even though the roads are dense (thus the demands are dense in space). However, a high variance does not tell whether the area in general is of higher or lower mobility. When an area is very sparsely facilitated with PT lines (e.g., gray circles), the fairness can be high, i.e., no one has good accessibility to PT. Whereas when and area is densely distributed with PT lines (e.g., orange circles), the fairness is high as well, and all with good accessibility. We omitted the discussion for travel distance but present the spatial distribution of travel distance in Fig.5. Road sample points (ODs) are dominated by subway stations and bus stops given the same clustering standard, so we cannot see the difference among the sample points in their accessibilities. To scrutinize more details, we particularly clustered only OD points, yielding the accessibility pattern (see Fig. 6). Discernibly, red OD points indicate the advantage of proximity to subway stations on foot. Yellow ones are a bit far off. Whereas most of the road sample points are in gray, which are only assessed with low accessibility since they are off the stations/stops and only walking is feasible to cover that first/last mile.

Predict PT Efficiency with Aggregative Statistics
We compared our multigraph based method with the traditional benchmark index-based method in their capabilities of predicting PT efficiency. We calculated 8 aggregative level statistics for each subarea which were built into an 8-dimension 13/16

14/16
vector representation for one subarea. The 8 variables include the average lengths of bus routes and subway lines, the numbers of bus stops and subway stations, the average distances to the nearest bus stop and the nearest subway station, the average POI density on each road segment, and the average total road length over the area. The vector was utilized to predict labels lt and ld respectively. The last column of Table 3 shows the result. For predicting PT travel time, the aggregative statistics are slightly but not significantly higher than the convolution based index, while for predicting PT distance, the convolution based index wins over.