A Multilayer perspective for the analysis of urban transportation systems

Public urban mobility systems are composed of several interconnected transportation modes. Most studies in urban mobility and planning ignore the multilayer nature of transportation systems and consider only aggregated versions of this complex scenario. In this work we present a model that represents the transportation system of an entire city as a multiplex network. Using two different perspectives, one in which each line is a layer and one in which lines of the same transportation mode are grouped together, we study the interconnected structure of 9 different cities in Europe, ranging from small towns to megacities like London and Berlin, highlighting their vulnerabilities and possible improvements. Finally, for the city of Zaragoza in Spain, we also consider data about service schedules and waiting times, which allow us to create a simple yet realistic model for urban mobility that is able to reproduce real-world facts and to test network improvements.


Introduction
Multiplex networks [1] are useful representations of systems in which the same set of nodes may be connected by different types of relationships. Examples of systems that can be modeled as multiplex networks include social networks, transportation systems with multiple transportation modes, and biological systems in which different types of interactions are accounted for [1]. In a multiplex network, nodes and links are grouped in layers according to their nature. Layers can be interdependent, and they contain information that would be lost if we only considered the corresponding aggregated network. It has also been shown recently that different types of dynamics run on top of multilayer systems provide new insights into the problems being modeled [2][3][4].
As discussed in previous papers [5], most available studies on urban transportation consider either a single transportation mode or many modes merged into one aggregated network. Thus, introducing the framework of multiplex networks for the analysis of urban transportation systems might allow us to better understand complex issues such as how to accurately account for the interplay between different transport modes. However, even though a few works have started to use a multiplex representation to study failures [6] and efficiency [5,7] in transportation systems, they still represent isolated cases. For instance, there are very recent studies that rely on a complex notation to incorporate multiple modes [8] but that simply aggregate the whole network, thus losing information regarding transfer times [9]. Very up-to-date reviews in which the term multilayer is either absent or used in a completely different way [10,11] can also be found in the specialized literature.
The few previous studies on urban transportation systems as multiplex networks focus on addressing their multimodal nature, considering each layer as a transportation mode, to study their resilience [6] or their coupling [12]. In this way, all the lines of the same mode (i.e., buses, metro and tram) are aggregated in a sort of superlayer. This representation is extremely compact, with only a few layers, but it totally neglects transfer and waiting times between lines of the same mode, which could eventually lead to a wrong estimation of shortest paths or travel times. Another solution is to consider each line of each mode as a single layer. While in this case we can preserve transfer times and synchronization between stops on different lines, it is not possible to quantify the importance of a transportation mode for the mobility in the system.
To reconcile both approaches, in this paper we propose an urban transportation model based on multiplex networks in which both representations are used to extract different information from the system. We show that superlayers are fundamental to study the interdependency and resilience of the system, while the single-line-per-layer perspective should be adopted for a realistic modeling of human mobility. We test our model using 9 different urban transportation networks, ranging from small cities of a few hundred thousand inhabitants to megacities like London or Berlin. Finally, for a medium-size city we introduce detailed data about schedules and transfer times to create a more realistic dynamical scenario to test against real-world experimental data and facts. The remainder of the text is organized as follows. First, we give an overview of our multiplex representation in the Methods section and then use it to study the structure of 9 urban transportation networks in the first subsection of the Results. Finally, in the second part of the Results section, we focus on one case study (city) for a deeper analysis. Specifically, we test whether the model is able to reproduce experimental data and explore its possibilities regarding the study of service disruptions and network improvements.

Methods
We start by considering each line of each mode of transport as a single layer. Each stop is a node, and there is a weighted link between two nodes on a layer if the corresponding line passes through both of them, with the geographical distance between them as the weight. Although the same bus stop may be present in multiple layers, allowing transfers between them, this might not be the case between layers of different modes. To solve this problem, we connect (with inter-layer links) each node of one mode to the closest one of each of the other modes as long as the distance between them is less than 100 m.
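The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop names, coordinates, and the helper names `haversine_m`, `build_layers` and `interlayer_links` are hypothetical, the great-circle (haversine) distance stands in for "geographical distance", and intra-layer links are placed between consecutive stops of a line.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(h))

def build_layers(lines, coords):
    """One layer per line; links join consecutive stops (assumption), weighted by distance."""
    return {
        line_id: {(u, v): haversine_m(coords[u], coords[v])
                  for u, v in zip(stops, stops[1:])}
        for line_id, stops in lines.items()
    }

def interlayer_links(stops_by_mode, coords, max_dist=100.0):
    """Link each stop to the closest stop of every other mode if it is closer than max_dist."""
    links = set()
    for mode_a, stops_a in stops_by_mode.items():
        for mode_b, stops_b in stops_by_mode.items():
            if mode_a == mode_b or not stops_b:
                continue
            for u in stops_a:
                v = min(stops_b, key=lambda s: haversine_m(coords[u], coords[s]))
                if haversine_m(coords[u], coords[v]) < max_dist:
                    links.add(frozenset((u, v)))
    return links

# Hypothetical toy data: two bus stops and two tram stops on the same meridian.
coords = {
    "bus_A": (41.6520, -0.8810),
    "bus_B": (41.6560, -0.8810),
    "tram_A": (41.6525, -0.8810),  # roughly 56 m north of bus_A
    "tram_B": (41.6600, -0.8810),
}
lines = {"bus_1": ["bus_A", "bus_B"], "tram_1": ["tram_A", "tram_B"]}
stops_by_mode = {"bus": ["bus_A", "bus_B"], "tram": ["tram_A", "tram_B"]}

layers = build_layers(lines, coords)
links = interlayer_links(stops_by_mode, coords)
```

In this toy case only `bus_A` and `tram_A` end up joined by an inter-layer link, because every other bus-tram pair is more than 100 m apart.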
In the second part of the results section, however, we follow a slightly different scheme to add more realistic features to the model. In this case, we add a new layer that represents the land, to introduce the possibility of moving through the city by walking. To create this layer we first took the population density grid of the European Union [13], which is composed of roughly 100 × 100 m cells, and extracted those inside the area of interest. Then, we took those cells with a population density greater than zero and set a node in the middle of each of them. Finally, we linked together nodes belonging to neighboring cells and added the corresponding weight (distance in meters). To connect this layer with the rest of the system, we simply determined which cell each stop belongs to and established a link between that stop and the corresponding node of the land layer. Once this was done we added, again, the distance between stops as a weight to their links. This distance is simply the geographical distance in the case of tram and metro, but for bus stops it was calculated taking into account street patterns using the Google Maps API [14]. In this way, the second model is much more powerful, allowing us not only to check the validity of the conclusions extracted from the first analyses in a more realistic scenario but also to study more complex phenomena such as service disruptions.
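A minimal sketch of the land-layer construction follows. It assumes that cells are indexed by integer (row, column) pairs, that "neighboring" means sharing a side (the text does not say whether diagonal neighbors are included), and that every link therefore has a weight of one cell width; the function name `land_layer` and the density values are hypothetical.

```python
def land_layer(density, cell_m=100.0):
    """Build land nodes and links from a {(row, col): population_density} grid.

    Only cells with density > 0 become nodes. Nodes in side-adjacent cells
    are linked (assumption), with the cell width in metres as the weight.
    """
    nodes = {cell for cell, d in density.items() if d > 0}
    edges = {}
    for (r, c) in nodes:
        # Looking only down and right visits each undirected link once.
        for neighbor in ((r + 1, c), (r, c + 1)):
            if neighbor in nodes:
                edges[((r, c), neighbor)] = cell_m
    return nodes, edges

# Hypothetical 2 x 2 grid in which one cell is uninhabited.
density = {(0, 0): 10, (0, 1): 5, (1, 0): 0, (1, 1): 3}
nodes, edges = land_layer(density)
```

The uninhabited cell `(1, 0)` produces no node, so the three remaining cells are joined by two links.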

Structure of urban transportation networks
In Table 1 we present the principal characteristics of the networks we are going to analyze. Data were obtained from a variety of sources, from each company's website to cities' open data portals, and then arranged as described in the model discussed in Methods. As we can see, they are very different in size and composition. For example, while Vitoria has a population of roughly 250,000 individuals and a small transportation network consisting of 302 nodes and 16 layers, London is one of the biggest cities in Europe, with more than $8 \cdot 10^6$ inhabitants and a network consisting of 19,459 nodes and 555 layers.
We start our analysis with one of the most basic measures of graph theory, the degree. In multiplex networks it can be defined in multiple ways. A straightforward approach is to consider the degree of node $i$ as a vector $\mathbf{k}_i$ of length $M$ (the number of layers) in which element $j$ represents the degree of node $i$ in layer $j$. However, in this particular case this measure does not provide much information because, as each layer represents a single line, a node that is present in a layer has degree 2 in that layer (or 1 in special cases such as the first or last stop of a line, although in our networks both stops are always the same one). What we can do instead is examine the overlapping degree, $o_i$, which is simply the sum of the elements of $\mathbf{k}_i$ [15] (Figure 1). As we can see, although cities are very different from each other, their overlapping degree distributions are quite similar: most of the nodes are only present in one layer ($o_i = 2$), some are present in two layers ($o_i = 4$) and only a few can be found in three or more layers. Given the nature of the system, one could expect a low level of overlapping between layers. However, what is really interesting is that the maximum overlapping is quite similar in all the networks, even though their number of layers (and thus their theoretical maximum overlapping degree) differs a lot. Similar results are found if we look at the edge overlap distribution, where the edge overlap, $o_{ij}$, is defined as the number of layers in which a link between nodes $i$ and $j$ exists [15]. This kind of universal behavior can be understood if we take into consideration that these networks are embedded in city space, and thus the real theoretical maxima for both the overlapping degree and the edge overlap are not given by the number of layers but by either physical constraints or citizens' interest.
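Both quantities are simple counts, as the following sketch illustrates. The two circular toy lines, the stop labels and the function names are hypothetical; each layer is assumed to list every undirected link once.

```python
from collections import Counter

def overlapping_degree(layers):
    """o_i: sum over layers of the degree of node i in each layer."""
    degree = Counter()
    for edges in layers.values():
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
    return degree

def edge_overlap(layers):
    """o_ij: number of layers in which a link between i and j exists."""
    overlap = Counter()
    for edges in layers.values():
        for pair in {frozenset(edge) for edge in edges}:
            overlap[pair] += 1
    return overlap

# Two hypothetical circular lines that share the stops B and C.
layers = {
    "line_1": [("A", "B"), ("B", "C"), ("C", "A")],
    "line_2": [("B", "C"), ("C", "D"), ("D", "B")],
}
o = overlapping_degree(layers)
e = edge_overlap(layers)
```

Stops A and D belong to a single line and have overlapping degree 2, while B and C belong to both lines and have overlapping degree 4; the link between B and C is the only one with edge overlap 2.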
An interesting feature of multiplex networks is that the distribution of a quantity across the layers is at least as important as the overall value. For example, one node can have a high overlapping degree either because it has a low value in all the layers or because it has a high value in just a few layers. However, in our model this does not apply due to the singularities of public transportation networks that we discussed before. Nevertheless, to gain insight into the importance of one transportation mode over the others, we can switch perspective and consider the superlayer representation (see Fig. 2). As in recent works [5][6][7], we propose to group together lines belonging to the same transport mode, ending up in most cases with three superlayers representing bus, tram and metro lines, respectively. Except for this modification, the other elements of the model remain untouched, with inter-layer links connecting nearby stops of different modes.
To start this new analysis, let us denote by $C_b$, $C_t$ and $C_m$ the subsets of layers corresponding to bus lines, tram lines and metro lines, respectively. Now we can redefine the overlapping degree as $o_i^x = \sum_{\alpha \in C_x} k_i^{\alpha}$ with $x \in \{b, t, m\}$, where $k_i^{\alpha}$ is the degree of node $i$ in layer $\alpha$. Then, instead of considering the activity distribution across layers [16], we can study the activity distribution across superlayers, $B_i = \sum_x (1 - \delta_{0,o_i^x})$, which we represent versus the overlapping degree in Figure 3. At first one may think that nodes with the highest overlapping degree would also be those with the highest superlayer activity, but as we can see this is not the case. In fact, stops belonging to just one superlayer are the ones that tend to have the maximum overlapping degree. Those nodes, despite being present in only one superlayer, surely play a major role in the mobility of the system. On the other hand, if we think of a disruption of the system, it is much easier to, for example, temporarily move a bus stop to a nearby street, even if it serves a lot of lines, than to cope with a disruption in a metro or tram stop. Thus, there is no clear answer to the question of which node is the most important in these networks, as it depends on what we consider important.
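The superlayer overlapping degree and the superlayer activity can be computed as sketched below. The node names, line identifiers and function names are hypothetical, and the toy degrees are invented only to show that a node with the highest overlapping degree need not have the highest activity.

```python
from collections import defaultdict

def superlayer_degrees(layer_degrees, mode_of):
    """o_i^x: sum of the degrees of node i over the layers of superlayer x."""
    result = {}
    for node, per_layer in layer_degrees.items():
        per_mode = defaultdict(int)
        for layer, k in per_layer.items():
            per_mode[mode_of[layer]] += k
        result[node] = dict(per_mode)
    return result

def superlayer_activity(per_mode):
    """B_i: number of superlayers in which node i has non-zero degree."""
    return sum(1 for o_x in per_mode.values() if o_x > 0)

# Hypothetical data: layer -> mode, and node -> {layer: degree in that layer}.
mode_of = {"bus_1": "bus", "bus_2": "bus", "bus_3": "bus", "tram_1": "tram"}
layer_degrees = {
    "hub": {"bus_1": 2, "bus_2": 2, "bus_3": 2},   # three bus lines, no tram
    "interchange": {"bus_1": 2, "tram_1": 2},      # one bus line and the tram
}
sup = superlayer_degrees(layer_degrees, mode_of)
activity = {node: superlayer_activity(per_mode) for node, per_mode in sup.items()}
```

Here `hub` has the larger overlapping degree (6) but is active in one superlayer only, whereas `interchange` has overlapping degree 4 and is active in two.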
To end this structural study we focus on another measure used in multiplex network analyses: interdependence [17]. The interdependence of node $i$, $\lambda_i$, is defined as the sum over every other node $j$ of the fraction of shortest paths between nodes $i$ and $j$ that go through two or more layers over the total number of shortest paths between them. Hence, if $\lambda_i$ is close to 0, most of the paths go through just one layer, while if it is close to 1, most of the paths go through two or more layers. The network interdependence is obtained as the average over all nodes. However, even though this measure is quite interesting, as it provides information that cannot be obtained using the aggregated network alone, in our system it is not necessary because we already know that layers are interdependent. Indeed, if we look at Table 1 we can see that most of the networks have a number of nodes of the order of 1000, but a single bus line usually has around 50 stops, 100 at most, and tram and metro lines have even fewer. This means that to reach every node of the network we will surely have to cross multiple layers regardless of our starting position. One possible solution would be to work with superlayers instead of single layers, as they would be denser. But then we would face a similar problem, as the bus superlayer is much bigger than the other two.
Therefore, we slightly modify this metric to take into account the specific nature of transportation networks. Let us denote by $\psi_{ij}(\alpha)$ the number of shortest paths between nodes $i$ and $j$ that go across two or more layers, $\alpha$ being one of them, and by $\sigma_{ij}$ the total number of shortest paths between $i$ and $j$. From these quantities we can define the interdependency of layer $\alpha$, which gives us the fraction of all the shortest paths of the system that go through layer $\alpha$. Although this can give us a lot of information if we want to study problems like congestion in a particular system, for this general analysis we can go one step further and define the interdependency of superlayer $x$, $\lambda_x$, which tells us how many of the shortest paths in the system have at least one link in superlayer $x$. Note that in this case we normalize only over those shortest paths that use two or more layers, because we are interested in figuring out whether, given the need to change to another line, the tendency is to remain in the same transport mode or the system tends to be multimodal.
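One way to write the superlayer measure described above is sketched here. The notation is an assumption introduced for illustration: $\psi_{ij}(x)$ counts the shortest paths between $i$ and $j$ that cross two or more layers and have at least one link in superlayer $x$, and $\psi_{ij}$ counts all shortest paths between $i$ and $j$ that cross two or more layers. Normalizing the summed counts, rather than averaging per-pair fractions, is likewise only one possible reading of the text.

```latex
\lambda_x = \frac{\sum_{i} \sum_{j \neq i} \psi_{ij}(x)}{\sum_{i} \sum_{j \neq i} \psi_{ij}}
```

Under this reading $\lambda_x$ lies between 0 and 1, and the values for different superlayers need not add up to 1, since a single multimodal path contributes to every superlayer it touches.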
The results of this measure are shown in Figure 4a. Firstly, we note that almost all the shortest paths under consideration have at least one link in the bus superlayer, which is quite logical given that the bus superlayer is much bigger than the others. Thus, most of the shortest paths will start or end in this superlayer. However, a closer look reveals an interesting result. Take the case of Madrid, for example. Even though its metro superlayer has only 16 layers and 241 nodes, while its bus superlayer has 177 layers and 4590 nodes, more than 70% of the shortest paths have at least one link in the metro. Similar results are found in the rest of the networks; for example, in Zaragoza 20% of the shortest paths make use of the tram, which has only 1 line and 50 nodes, whereas its bus superlayer is composed of 35 layers and 902 nodes.
As can be seen, to fully understand these results it is necessary to take into account the size of the superlayers; the problem is how to define it. In this case, as we are exploring paths from node to node, we consider the fraction of all the nodes in each superlayer as a measure of size. This way, if $n_x$ is the fraction of nodes in superlayer $x$, we divide $\lambda_x$ by $n_x$ to obtain the desired result. This procedure has its drawbacks, as it is not upper bounded, but on the other hand it allows us to extract information on the importance of each layer more easily.
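As a rough worked example of this normalization, the snippet below uses the Zaragoza figures quoted in the text (about 20% of the shortest paths use the tram, which has 50 nodes, while the bus superlayer has 902). Taking the total number of nodes as 50 + 902 is an assumption made for illustration, and the function name is hypothetical.

```python
def normalized_interdependency(lambda_x, nodes_x, nodes_total):
    """Divide the superlayer interdependency by the fraction of nodes it contains."""
    n_x = nodes_x / nodes_total
    return lambda_x / n_x

# Zaragoza tram: lambda_x ~ 0.20, 50 tram nodes, 902 bus nodes (total assumed to be their sum).
tram_score = normalized_interdependency(0.20, 50, 50 + 902)
```

The result is about 3.8, well above 1, which illustrates both why the measure is not upper bounded and why a small superlayer can turn out to carry far more shortest paths than its size alone would suggest.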
From this modified measure (see Fig. 4b), we observe that the tram and metro modes are of utmost importance for the mobility of these systems, as they are part of many more shortest paths than would be expected from their size. Note that we have not taken into account that they may have, on average, higher speed or greater carrying capacity than buses; hence, the previous results are obtained from a purely topological point of view. The reason seems clear: metro and tram are usually used to connect distant points with straighter routes than bus lines. Another interesting conclusion, which follows from the previous one, is that although most of the networks examined rely mainly on two transportation modes, urban transportation is quite multimodal. A good example of this is Madrid's network, where bus nodes cover most of the city and metro nodes connect distant locations with straighter paths throughout it. However, only one of the three tram lines overlaps with bus nodes; the others seem to go to locations that are not covered by bus. Thus, the tram mode is not used as a way to reach certain locations faster but just to connect distant locations at the periphery of the city.

Case study: realistic modeling of an urban transportation system
In this section, we study a detailed model for urban transportation that includes not only the structure of the networks but also data about frequencies and traveling speeds. Our aim here is to realistically mimic a scenario that allows us to study the efficiency of an urban transportation system and its response to malfunctions or improvements. To do so, we also include the land layer, as discussed in Methods, allowing passengers to get to a stop that might be across the street even if there is no direct link between different lines. In this way, we are able to simulate paths starting or ending at any point of the city and not only at line stops.
We take the transportation network of the city of Zaragoza (Saragossa) in Spain as our case study, since we have data regarding average speed and frequency for each line [18,19]. This allows us to consider time, so that the shortest path is now the one that takes the least time rather than the one that covers the shortest geographical distance. To this end, we divide the weight of the links by the average speed on their layer. Thus, the weight of a link now represents time. Moreover, all weights are fixed throughout the simulation, as we consider that the speed is always the same, except for links connecting land nodes and stops. In the latter case, the weight is the time at which the next vehicle will arrive minus the current time. Therefore, given a certain path, the sum of the weights of the links used that belong to the land layer, to any other layer, or to the inter-layer link set gives the total time spent walking, in a vehicle, or waiting, respectively.
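The weighting scheme above can be illustrated with the following sketch. The function names, the timetable, the 20 km/h line speed and the link labels `walking`, `in_vehicle` and `waiting` are all hypothetical choices made for the example; times are expressed in minutes after midnight.

```python
from bisect import bisect_left

def travel_time_min(distance_m, speed_kmh):
    """Fixed link weight: distance divided by the average speed of the layer."""
    return distance_m / 1000.0 / speed_kmh * 60.0

def waiting_time_min(now_min, arrivals_min):
    """Time until the next vehicle, given sorted arrival times; None if no vehicle is left."""
    i = bisect_left(arrivals_min, now_min)
    return arrivals_min[i] - now_min if i < len(arrivals_min) else None

def time_breakdown(path_links):
    """Sum the weights of a path by link type: walking, in a vehicle, or waiting."""
    totals = {"walking": 0.0, "in_vehicle": 0.0, "waiting": 0.0}
    for kind, minutes in path_links:
        totals[kind] += minutes
    return totals

arrivals = [510, 520, 530]                 # hypothetical timetable: 08:30, 08:40, 08:50
ride = travel_time_min(500.0, 20.0)        # 500 m at an assumed 20 km/h
wait = waiting_time_min(513, arrivals)     # passenger reaches the stop at 08:33
breakdown = time_breakdown([("walking", 3.0), ("waiting", wait), ("in_vehicle", ride)])
```

Because the waiting weight depends on the moment the passenger reaches the stop, it has to be evaluated during the path search rather than precomputed, unlike the in-vehicle and walking weights.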
Moreover, due to the construction of the tramway in 2011, surveys were carried out regarding the impact of this new transportation mode on the mobility of the city [20]. Thus, we are able to test whether our model can provide results that are qualitatively similar to passengers' experience, namely: (i) mobility hinges on the tram, (ii) a lot of transfers are needed, (iii) if there is a disruption in the tram line the whole system is affected, (iv) bus stops are far from passengers' destinations and (v) bus frequencies are low. At the same time, the previous effects can be quantified, although only approximately.
As we do not have data regarding passenger flows, we do not take into account the carrying capacity of the vehicles and thus consider a free-flow regime. We study two different scenarios: movement from any point of the city to the city center (coordinates 41.652, -0.881) and vice versa, and movement between any two points located at least 2 km away from each other, which has been reported as the minimum distance a passenger has to travel to consider using public transportation [21].
In Figure 5 we show an overview of the results obtained considering 1,000 individuals per minute with random origin and destination between 08:30 am and 10:00 am, with a walking speed of 5 km/h [22]. Fig. 5(a) shows the distribution of the number of transfers; note that only 35% of the individuals reach their destination without transfers, which agrees with (ii). In Fig. 5(b) we represent the distributions of the waiting time, the total time and the distance covered by walking. With a walking speed of 5 km/h, 1 km is equivalent to 12 minutes and thus represents approximately one third of the total time, which agrees with (iv) but not with (v). Now, suppose that an individual needs to make one transfer to get to their destination. If one of the two frequencies is too low, it might be faster to walk to the second line instead of getting there using another line or, equivalently, to walk from the transfer point to the destination instead of waiting for the second vehicle. This individual would perceive two problems, a long walk and low frequencies, but as no time has been spent waiting, the low frequency is not reflected in our model.
To avoid this problem, and following other studies on urban mobility [22], we introduce another parameter, $\delta$, to modulate the walking speed $v_w$ such that $v_w = (1 - \delta) \cdot 5$ km/h. By tuning its value we can discourage individuals from avoiding transfers while still allowing them to do so if the distance is small enough. As we can see in Fig. 5(a), with $\delta = 0.5$ the number of transfers increases greatly, which means that large walking distances are no longer present. If we look at time we see that, as expected, the waiting time increases and the distance covered by walking decreases. The total time has been readjusted taking into account the decrease in walking speed, so that any variation is due only to the new transfers and, surprisingly, it barely changes. This means that there is a duality in the system, as we can choose between walking or waiting while keeping the total time constant, which indeed agrees with (iv) and (v). Note also that, although the waiting times may not seem very high compared to the total times, several studies have shown that perceived waiting time is usually overestimated, especially if the real time has been quite low [23][24][25]. Finally, Fig. 5(c) shows the fraction of individuals that make use of each line, normalized over the total number of trips. In both scenarios almost half of the trips use the tram line, which completely agrees with (i). Moreover, it also agrees with what we saw when we studied the structure of the network in the previous subsection.
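The effect of $\delta$ on walking times is easy to check numerically. The sketch below only encodes the formula $v_w = (1 - \delta) \cdot 5$ km/h from the text; the function names are hypothetical.

```python
def walking_speed_kmh(delta, base_kmh=5.0):
    """Effective walking speed v_w = (1 - delta) * base speed."""
    return (1.0 - delta) * base_kmh

def walking_time_min(distance_km, delta=0.0):
    """Minutes needed to walk distance_km at the effective walking speed."""
    return distance_km / walking_speed_kmh(delta) * 60.0

baseline = walking_time_min(1.0)             # delta = 0: 5 km/h
penalized = walking_time_min(1.0, delta=0.5) # delta = 0.5: 2.5 km/h
```

With $\delta = 0$, 1 km takes 12 minutes, as stated in the text; with $\delta = 0.5$ the same kilometre costs 24 minutes, so waiting for a second vehicle becomes the cheaper option for all but short walks.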
The last item left to check is (iii). For simplicity, we focus only on trips to the city center and vice versa (Figure 6). In the top left panel we show the average time it takes to get to the city center between 08:30 and 09:00 am. Our model allows us to easily test the behavior of the system during service disruptions by just removing the affected nodes or layers. In the top right panel of Fig. 6 we show the increase in the total time if we remove the tram line completely, a result that agrees with (iii). Although the tram line is quite vulnerable, as a disruption on a single node may cause the whole line to fail (because it follows a fixed path), it is important to note that this line has a couple of links that allow it to run in loops and thus, in this particular scenario, only the northern or the southern part would be affected and not the whole line. In contrast, if we look at the bottom left panel of Fig. 6 we see that the situation is completely different if we remove the two bus lines most used to get to the city center (lines 22 and 35). As the bus mode is more redundant, the effect is much smaller, at least under free-flow conditions. Besides, a disruption on a single bus node does not cause so many problems for the whole line, as the stop can easily be moved to a nearby location. Thus, we can say that the bus network is much more resilient, while tram lines speed up the complete network.
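The disruption test described above amounts to recomputing shortest travel times after dropping a layer. The sketch below uses a plain Dijkstra search over fixed link weights; the three-node network, the travel times in minutes and the function name `shortest_time` are invented for illustration, and the time-dependent waiting weights of the full model are left out for brevity.

```python
import heapq

def shortest_time(edges, source, target, removed_layers=frozenset()):
    """Dijkstra over undirected (u, v, layer, minutes) links, skipping removed layers."""
    adjacency = {}
    for u, v, layer, minutes in edges:
        if layer in removed_layers:
            continue
        adjacency.setdefault(u, []).append((v, minutes))
        adjacency.setdefault(v, []).append((u, minutes))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, minutes in adjacency.get(u, []):
            nd = d + minutes
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# Hypothetical network: a direct tram, a slower two-leg bus, and a long walk.
edges = [
    ("home", "center", "tram", 10.0),
    ("home", "mid", "bus_22", 8.0),
    ("mid", "center", "bus_22", 8.0),
    ("home", "center", "land", 40.0),
]
baseline = shortest_time(edges, "home", "center")
no_tram = shortest_time(edges, "home", "center", removed_layers={"tram"})
no_bus = shortest_time(edges, "home", "center", removed_layers={"bus_22"})
```

In this toy case removing the tram raises the trip from 10 to 16 minutes, while removing the bus line changes nothing. Testing a network improvement works the same way in reverse: append the links of the proposed line to `edges` and compare against the baseline.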
To conclude, our model can also be used to easily test network improvements such as the addition of new lines. In the bottom right panel of Fig. 6 we show the differences in the average time to get to the city center if we add a new tram line from east to west, as has recently been proposed by the city council [26]. As we can see, the western part of the city would benefit the most from this addition, as the decrease in time is higher than in the rest of the city, all the more so because, as shown in the top left panel of Fig. 6, that part of the city was further away from the city center. Note that we have not removed any bus line, and thus this result shows again how tram and metro lines naturally speed up the network.

Discussion
In this paper, we have proposed a method to model public transportation systems as multiplex networks, which allows us either to gain more insight into their network properties or to extract new conclusions of practical value. We have analyzed the structure of 9 urban transportation networks and found universal properties that can be related to the underlying structure of the cities. We have also shown that both the per-line and the per-transportation-mode representations are useful and complementary for extracting information about the functioning of transportation systems and assessing their vulnerabilities. Finally, using detailed data about service schedules and waiting times, we created a realistic model for urban mobility. Despite its relative simplicity, we showed that our model not only reproduces real-world facts but can also be used to explore important issues such as the impact of service disruptions and ways to improve the network, using information that, perhaps with the exception of the average speed of each line, should be publicly available for most major cities. Needless to say, a deeper analysis would require including the street network as the land layer and some information that might be harder to find, such as traffic lights or mobility patterns, but that is beyond the scope of this study. In conclusion, the proposed model can be used for a first diagnosis of the state of any urban transportation network using publicly available information and few computational resources.

Figure 2. Superlayer representation of the Madrid transportation system. The figure represents the three transportation modes considered: tram (yellow nodes, upper layer), metro (purple nodes, middle layer) and buses (white nodes, bottom layer). See Table 1 for statistics of these layers.

Figure 6. Top left: average time it takes to get to the city center between 08:30 and 09:00. Top right: difference with respect to the original case when the tram is removed. Bottom left: difference when the 2 lines most used to get to the city center are removed (lines 22 and 35). Bottom right: difference if we add a new tram line. These pictures were made using tiles from OpenStreetMap [36].

Figure 1. Left: overlapping degree distribution of each network. Right: edge overlap distribution of each network. Despite these networks being quite different in composition and size they share some universal properties.

Table 1. Principal characteristics of the networks under study, ordered by decreasing population [27-35].