Multicommodity routing optimization for engineering networks

Optimizing passengers routes is crucial to design efficient transportation networks. Recent results show that optimal transport provides an efficient alternative to standard optimization methods. However, it is not yet clear if this formalism has empirical validity on engineering networks. We address this issue by considering different response functions—quantities determining the interaction between passengers—in the dynamics implementing the optimal transport formulation. Particularly, we couple passengers’ fluxes by taking their sum or the sum of their squares. The first choice naturally reflects edges occupancy in transportation networks, however the second guarantees convergence to an optimal configuration of flows. Both modeling choices are applied to the Paris metro. We measure the extent of traffic bottlenecks and infrastructure resilience to node removal, showing that the two settings are equivalent in the congested transport regime, but different in the branched one. In the latter, the two formulations differ on how fluxes are distributed, with one function favoring routes consolidation, thus potentially being prone to generate traffic overload. Additionally, we compare our method to Dijkstra’s algorithm to show its capacity to efficiently recover shortest-path-like graphs. Finally, we observe that optimal transport networks lie in the Pareto front drawn by the energy dissipated by passengers, and the cost to build the infrastructure.


Introduction
Finding optimal flow configurations in transport networks is an important problem in many real-world applications.While natural systems like river basins [1][2][3][4][5], leaf venations [6][7][8][9], or slime molds [10][11][12][13][14][15][16][17] involve transport of one type of mass only, e.g.water, this may not be the case in several engineering systems.For instance, routing data packets in communication networks, or passengers in urban transportation networks, requires multicommodity approaches where mass of different types interacts in a shared infrastructure, contributing to minimize one unique cost.
Despite their practical significance, multicommodity algorithms based on optimization routines are burdened by high computational complexity, caused by the simultaneous assignment of multiple commodities.Therefore, practitioners often rely on heuristics and approximations that lead to suboptimal solutions [18].Distributed approaches like message-passing algorithms have demonstrated encouraging results [19][20][21][22][23][24], but remain computationally costly in scenarios where there is a large number of origin-destination pairs to be routed, or when the network is not sparse.
A promising approach is that of optimal transport theory.Recent studies [25,26] have shown that this theoretical formalism can be adapted to address multicommodity scenarios, generalizing well-established results for uni-commodity models [27][28][29][30][31][32][33].The works of Lonardi et al. [25] and Bonifaci et al. [26] focus on a theoretical characterization of the problem, drawing a formal connection between optimal transport and an equivalent dynamical system that is formulated in terms of physical quantities like conductivities and fluxes.While preliminary results on multilayer transportation networks [34] suggest an empirical validity of this choice, questions remain open about its applicability in settings involving the transport of passengers.
In this work, we address this concern by studying the behavior of optimal transport approaches for multicommodity routing on urban transportation networks, with an empirical analysis on the Paris metro network.Our goal is to evaluate how different cost functions impact the distribution of passenger flows.In detail, we search for stationary solutions of a dynamics where edge capacities-conductivities-grow as an increasing function of the total amount of passengers traveling on the edges.We investigate numerically the cases where the dependence between conductivities and fluxes is either the sum of the passengers traveling on an edge (its 1-norm), or the sum of their squares (its 2-norm).The first choice is more intuitive, since counting the total number of users in a network is a natural metric to evaluate its occupancy.However, in the second case it is possible to prove that the companion gradient flow used in the numerical solver admits a unique stationary solution [25,26].
We design several experiments to investigate the main properties of optimal network configurations in the two cases.First, we observe that the 2-norm tends to dilute more substantially passengers on the network, avoiding heavily trafficked routes.Second, we compare our model with Dijkstra's algorithm [35], a popular approach for shortest-path minimization.We find that our method is a robust and efficient alternative to reproduce shortest-path-like networks.Furthermore, we test resilience to infrastructural failures, i.e., node and edge removal.Results show that the geographical locations of stations together with their degree, are decisive factors.Finally, we observe that optimal networks lie in the Pareto front drawn by two fundamental driving forces: the energy dissipated by passengers' flows and the network infrastructural cost.

Multicommodity routing on networks
We design a routing optimization problem on a network G(V, E), where V and E are the sets of nodes and edges, and each edge has length e > 0. The edges are given a conventional orientation stored in a signed incidence matrix, with elements B ve = {+1, −1, 0} if v is the head, the tail, or neither of them for edge e, respectively.We model transportation of M ≥ 1 commodities through the network, each identified by an index i.We use them to differentiate passengers entering the network from different stations (i ∈ V ), so that multiple users sharing the same path catalyze traffic congestion.Suppose that a commodity i has a mass rate S i v flowing into node v and outflows S i u ∀u = v, with ∑ v S i v = 0, ∀i ∈ V , to ensure that the system is isolated.The main quantities of interest are the edge conductivities µ e ≥ 0, which can be thought of as capacities.These regulate how passengers flow on the network, as higher conductivity is allocated to edges that are more utilized, while low-conductivity edges are those with fewer passengers.Hence, determining the values of µ e , ∀e ∈ E, implies determining the flows of passengers, and therefore of traffic on the network.The distribution of conductivities is regulated by the following dynamics and main equations of our model: Here L is the weighted Laplacian matrix of the network, with entries L vu := ∑ e (µ e / e )B ue B ve ; p i v are pressure potentials generated by a commodity i on the nodes; f is a non-negative function of the fluxes F e , M-dimensional vectors with entries F i e := µ e (p i u − p i v )/ e , for e = (u, v).A visualization of the main model's variables is shown in Fig. 1.Equation ( 1) is Kirchhoff's law, expressing conservation of mass; Eq. ( 2) regulates the time evolution of conductivities by means of a feedback mechanism where the higher the flux on an edge, the larger its conductivity µ e .All commodities share one unique infrastructure, so we follow [25] and assume that µ e is the same for all i.This particular modeling choice corresponds to not prioritize any commodity in particular, i.e. having all users sharing the metro infrastructure without any hierarchy.However, one could consider imposing a set of rules for traffic regulation by explicitly accounting for different µ i e terms.The growth of µ e is governed by the function f , that is typically an increasing and differentiable function of some norm of the fluxes [25,26].The aim of our work is to investigate how different expressions of f result in different distributions of passengers flows, thus we focus on the following two choices: (i) f (x) = ||x|| 2 2 (2-norm), and (ii) f (x) = ||x|| 2 1 (1-norm).The first captures intuition in contexts as plant biology, where nutrients travel independently in conduits which are held together in fibers, contributing to growth of branches.However, it may not be the most appropriate one in applications involving transport of passengers, as the 2-norm does not have a straightforward interpretation.On the contrary, the latter is arguably a more natural choice, backed up by the intuition that edge capacities are controlled by the number of passengers traveling on them (instead of the sum of squares).Both norms are taken squared, this is motivated by an analogy between our dynamics and Joule's law in electrical circuits, that we discuss in Section 2.2.We remark that other possible choices of f can be used, e.g. the complete spectrum of p-norms, or a tunable sigmoid profile as in [13], these can be interesting subjects for future work.The effect of f is balanced by a negative linear term in the conductivities, determining their exponential decay in time if no mass is moving through an edge.Note that our dynamics is highly non-linear in µ e , since least-square solutions of Kirchhoff's law are of the form p i v = ∑ u L † vu S i u , with † denoting the Moore-Penrose inverse.Finally, the role of the free parameter 0 < β < 2 is to capture different transportation mechanisms: β > 1 consolidates passengers on fewer edges, following a principle of economy of scale; β < 1 enforces passengers to distribute more broadly along the network; β = 1 is shortest-path like.In brown we draw the edge capacities, green is used to highlight the length of one edge.From the central node a positive inflow of two commodities enters (orange and light blue), these move towards their destinations-the colored minuses-sharing an edge.Thus, multiple colored fluxes generate traffic congestion.In pink we represent the pressure potentials of a third commodity.Differences of pressure along an edge trigger F 3 .

Connection with optimal transport
The dynamics introduced in Section 2.1 has a strong connection with optimal transport theory.In fact, in [25] it is shown that stationary solutions of Eqs. ( 1) and ( 2) are also stationary points of the minimization problem: min for a fixed constant K > 0 and where J is the dissipation cost.This is also equivalent to minimizing J Γ := ∑ e e f (F e ) Γ , with Γ = (2 − β )/(3 − β ), generalizing Banavar et al. [36].
The crucial distinction between the 1-norm and 2-norm dynamics is that the latter admits the Lyapunov function which enables to prove that asymptotics of the dynamics minimize J [25] for β ≤ 1.We remark that for β ≤ 1 the Lyapunov admits a unique minimum although possibly multiple minimizers, while for β > 1 the functional has several local minima.Noticeably, the first sum in Eq. ( 6) is equivalent to J = (1/2) ∑ e e ||F e || 2 2 /µ e (see Methods 4.1).The second term in Eq. ( 6) is W := (∑ e e µ γ e )/2γ with γ = 2 − β , interpretable as the cost to build the network.With this in mind, the Lyapunov functional becomes the sum of dissipation and infrastructural costs.
As mentioned before theoretical guarantees cannot be recovered for the 1-norm dynamics, where a Lyapunov functional is not straightforward to derive.While solving the dynamics may still result in meaningful flows, we cannot guarantee that these solutions minimize the cost J Γ , i.e. to have optimal transport.However, we find empirically that on the metro network of Paris-our case of study-J Γ decreases along solution trajectories of the dynamics, with stationary solutions lying in a basin of the cost.This empirical result is valid here, but this may not be true for other configurations of the network or initial conditions of the dynamics, hence practitioners should first validate their model (see Methods 4.2 and 4.3 for a more detailed explanation; a listing of the variables introduced in Sections 2.1 and 2.2 can be found enclosed as Supplementary Information).

Results on the Paris metro network
In this work, we investigate the applicability of the dynamics in Eqs. ( 1) and (2) on the Paris metro.Topology data are taken from [37], the network is preprocessed to have a total of |V | = 302 nodes and |E| = 359 edges, coherently with the observed metro of Paris (Methods 4.2).As anticipated, we define commodities as stations where passengers enter.This means that each vector S i has only one positive element in v = i (where the passengers of type i enter), while the remaining elements of S i contain the outflows of passengers who travel from v. Other choices can also be made based on the application, but this will not impact the validity of the model.Lastly, we introduce the parameter 0 ≤ ρ ≤ 1.This averages the passenger inflows as ), with • average over the nodes.When ρ tends to 1 passengers distribute uniformly on the network, while ρ approaching 0 means passengers enter and exit station more heterogeneously, see Methods 4.3.We test the two response functions f .Optimal fluxes resulting in the two cases can be seen in Fig. 2a, where the thickness of each edge is proportional to the fraction of passengers traveling through it.As expected, for β < 1 optimal transport networks are loopy, with many densely connected edges having fairly uniform fluxes.On the contrary, for β > 1 optimal topologies are more tree-like, with few central arteries where traffic is highly concentrated.This applies to both cases.
We notice two distinct behaviors, depending on β .For β < 1 (β = 0.1 in Fig. 2a), solutions cannot be distinguished.This is explained by the Lyapunov functionals L β being strictly convex in this case, with stationary conductivities that are their only minimum.This observation suggests that in the congested transportation regime (0 < β < 1), where one aims at minimizing traffic congestions, using the 2-norm is equivalent to the more intuitive 1-norm formulation.This is not the case for β > 1, where the two dynamics favor different local minima.These correspond to optimal networks with distinct central arteries where passengers are directed.The differences are further accentuated in Fig. 2b, where edges are colored with flux differences in these two cases, and where we highlight with markers instances of highly traversed stations.In detail, we can see that two routes branch from Charles de Gaulle-Étoile, the upper one passing by Place de Clichy is favored by the 1-norm, and the lower one reaching Saint-Lazare is preferred by the 2-norm.As for the connection between Châtelet and Gare de Lyon, we observe that the 1-norm tends to favor the shortest path between the two stations, with most passengers travelling in a straight line.On the contrary, the path selected by the 2-norm has a deflection.
Stimulated by these qualitative differences, we investigate different metrics for an in-depth quantitative evaluation for the case β = 1.5.First, analyzing the sorted distributions of the fluxes ||F e || 1 in Fig. 2c, we notice that the 1-norm dynamics has a more pronounced fat-tailed distribution with a sharper and higher peak.This means that the 1-norm tends to concentrate fluxes on fewer edges.Such effect becomes starker for more homogeneous distributions of passengers entering the stations, i.e. setting ρ = 0.5, 1.0 (see Supplementary Figs. 1 and 2).
We quantify this using the Gini coefficient [38]: for a quantity x, with x = ∑ e x e /|E| being its mean, and m, n denoting edges.In our analysis we set x e = ||F e || 1 .Results are shown in Fig. 3a, where the Gini coefficient is plotted against β for different values of ρ.
As expected, the Gini coefficient increases with β , as users' paths are more concentrated along fewer edges.The values for the two dynamics are similar for β < 1, for the reasons mentioned above.Instead, for β > 1, markers progressively separate as β increases.The 2-norm has always smaller values than their counterparts, further demonstrating the tendency of the 2-norm to dilute fluxes on a larger area of the network.
We study the behavior of the fraction of idle edges, i.e. the number of edges with negligible fluxes, divided by the total number of edges |E|, see Fig. 3b.This quantity manifests a sudden phase transition at β = 1, where the dynamics switches from an homogeneous distribution of passengers on the entire network infrastructure, to a distribution progressively more concentrated on a smaller fraction of edges, as β increases.Finally, the 2-norm dynamics returns fewer idle edges than the 1-norm, as paths are less concentrated.Notably, such abrupt phase transitions are typical of capacitated models on networks [36], and emerging in routing strategies involving a critical exponent regulating efficient transportation [39].
To summarize, we observe two main findings.First, we noticed that in the regime of β < 1 the 1-norm and the 2-norm produce identical optimal networks.This result does not hold for β > 1, where many local minima of L β generate different optimal paths.Second, analyzing the fraction of idle edges, the Gini coefficient of the fluxes, and their distribution, we found that in the regime of branched transportation (1 < β < 2), the 2-norm tends to limit more traffic congestion, as paths are less consolidated into fewer edges compared to the 1-norm.

Comparison with Dijkstra algorithm
As discussed, the main property connecting our 2-norm dynamics with optimal transport is that its stationary solutions are minimizers of the cost J Γ = ∑ e e ||F e || 2Γ  2 , with Γ = (2 − β )/(3 − β ) [25].This cost, for β = 1 and M = 1 is equivalent to that of [15,16] and has optimal fluxes taking the shortest path from their source to their sink.A theoretical generalization of this result to the multicommodity setup is not trivial.In fact, for the 2-norm case the cost reads J Γ = ∑ e e ∑ i (F i e ) 2 , that is not linear in the commodities, i.e. searching for its minimizer does not correspond to solving M uni-commodity problems, one for each i, and then overlapping them.As for the 1-norm, the dissipation cost with β = 1 is J Γ = ∑ e ∑ i e |F i e |, and therefore its unique global minimum corresponds to that obtained overlapping M shortest paths.
We can numerically compare our methods with a shortest path routine using Dijkstra's algorithm [35].Precisely, we iterate over the commodities and assign a flux F i e equal to the fraction of passengers moving from the source v to the sink u, to each edge belonging the shortest path between v and u-the latter computed with Dijkstra's algorithm.
We compare the optimal transport networks obtained using our methods with β = 1 (Fig. 4a) with the networks returned by Dijkstra's algorithm (Fig. 4b).The three graphs are visibly similar but not identical.Particularly, we focus on the four highlighted areas in Fig. 4b, containing the main branches departing from the central area of the city of Paris.We see that the more trafficked routes in the pink South-West region are identical for our methods and for Dijkstra's one.Traffic in the Nort-West blue region seems to be more diluted for our methods, with the 2-norm optimal network being slightly more similar to Dijkstra's.As for the North green region, both our algorithms concentrate traffic in a curved branch covering a large portion of the Northside of the city.This route is not prioritized in Fig. 4b, as traffic in the green portion is more distributed.Finally, in the South-East yellow area, there is only one main route branching from the city center, while its shape is straight for Dijkstra's, our methods favor a slight deflection.
We attribute these differences in the optimal topologies to the high complexity of the energy landscape of J Γ .In fact, while Dijkstra's algorithm computes and overlaps each source-sink shortest path separately, our methods treat all the commodities at once.This may lead to convergence in suboptimal points, in particular around β = 1, where the cost transitions from being strictly convex (β ≤ 1) to strongly non-convex (β > 1).While our method in this case may not always reach an optimal solution, it has the practical advantage of having a worst-case complexity of O(M|V | 2 ) [25].In principle, this can be further reduced using a backward Euler scheme combined with the inexact Newton-Raphson method for the discretization of Eq. ( 2), and using a Multigrid solver for the solution of Eq. (1) in O(M|V |) steps [40].By contrast, Dijkstra's has a worst-case complexity of O(|E| + |V | log |V |) [41], with the algorithms that needs to be executed M 2 (in our application to the Paris metro there are M 2 = |V | 2 source-destination pairs) times to solve a multicommodity problem.Lastly, we test the deviation of the cost of our methods from Dijkstra's one.In Fig. 4c we plot the relative cost difference taken in absolute value, that is ∆J := |J Γ − J Dijkstra |/J Dijkstra , with Dijkstra's network cost calculated as J Dijkstra = ∑ e e ||F e || 1 .This has a sharp drop at β = 1, where traffic is not favored nor penalized, with the cost of our network that is similar to the one of the shortest path returned by Dijkstra's algorithm.For β < 1 we have J Γ > J Dijkstra , showing that penalizing traffic congestion has the drawback of producing more expensive infrastructures.We observe the opposite behavior for β > 1, where J Γ < J Dijkstra , with congested networks that are progressively cheaper as β increases.

Network robustness to failures
We now showcase a possible relevant application of our model by analyzing network's robustness to structural failures as nodes removal.Network managers interested in finding which stations are crucial for alleviating potential traffic overload can look at the congested transportation regime (we set β = 0.1 to favor homogeneous fluxes) and investigate how fluxes resulting from our model distribute along the network.
In detail, we remove sequentially a total of four stations from the network: Châtelet, Gare du Nord, Saint-Lazare, and Gare de Lyon.The last three are those with the largest number of inflowing passengers, while Châtelet has a central position and a high node degree d = 8.Once each station was removed, its passengers were redirected to its neighboring nodes, and then solutions of the dynamics were found with this setting, as depicted in Fig. 5.
In Fig. 5a we display the 1 + 4 networks obtained removing none, and the stations indicated in Fig. 5b.In Fig. 5c we plot the Gini coefficients of the optimal transport networks against β .We notice that for β > 1 all the points collapse together, regardless of the number of failures.This scenario, however, is of little interest for the situation we want to address, being flux aggregation already favored by β > 1.As for β < 1, the difference in Gini coefficient gets wider the lower the β , with the largest gap at β = 0.1, we thus investigate this case.
Removing Châtelet from the network causes a considerable jump in the Gini coefficient, thus increasing the possibility of traffic jams.In fact, as we see from the second plot in Fig. 5a, all the passengers who were traveling on the South-West route branching from the city center are redirected in a way that congests southern arteries of the network.Removing Gare du Nord is not as crucial for traffic rerouting.Indeed, the main difference between the second and the third network of Fig. 5a is that passengers who were departing from Gare du Nord move to its southern neighboring station, Gare de l'Est, and modify only slightly their path.A large jump in the Gini coefficient is visible after removing Saint-Lazare, which seems to be fundamental for connecting the central area of Paris to its north side.In the fourth plot in Fig. 5a, we can see that traffic becomes highly congested in the northern branch directed from east to west.Gare de Lyon causes a negligible change in the Gini coefficient, associated to a modest traffic rerouting in the South-East part of the network.

Pareto front
To conclude our analysis of the multicommodity routing problem it is possible to verify that stationary solutions of Eqs. ( 1) and ( 2) lie in the Pareto front (Fig. 6), which can be expressed in closed form as: We plot the optimal transport networks after nodes trimming.Edge widths and color are normalized fluxes, the size of each node is proportional to the number passengers entering in it.All quantities are averaged over 100 runs of the dynamics with random initialization of the conductivities, µ e ∼ U(0, 1).Stars highlight positions of removed stations, following the same scheme of Fig. 5b.(b) Network showing which stations have been removed, these correspond to the fully colored nodes, with colors chosen according to the legend on the left.Colored edges, and nodes with colored borders are those where the passengers get redirected.The colored borders are proportional to the passengers' growth.(c) Gini coefficients vs. β , errorbars are standard deviations.Points are colored following the scheme used in the rest of the panel.Similar results for the 2-norm dynamics are in Supplementary Fig. 3.
The emergence of a Pareto front between J and W is not limited to engineering networks like the ones studied here.A similar trade-off has been observed in the widened pipe model for plants of Koçillari [42], where minimization of hydraulic resistance and of carbon cost compete for natural selection.
Moreover, looking at the inset of Fig. 6 and Fig. 3 we can observe that the Gini coefficient and the fraction of idle edges can be interpreted as driving forces responsible for the design of the optimal transport network, counterbalancing its cost.In fact, congested transport networks obtained for low values of beta β have a high cost, but are more resilient to damage-low Gini and no idle edges-being their infrastructure densely connected.On the contrary, setting β large has the effect of producing sparse networks.These infrastructures have the benefit of being cheaper, but they are less resistant to node and edge failures, as mentioned in Section 2.5.We plot the dissipation/infrastructure cost ratio vs. β .Different points are averaged over 100 runs with random initialization of the conductivities, µ e (0) ∼ U(0, 1).Inset: infrastructure cost, W , vs. dissipation cost, J. Marker shapes are identical to those of the main plot, colors follow the colorbar over β .

7/3 3 Conclusions
Multicommodity routing is a powerful tool to model optimal network configurations in transportation systems [18].In this work, we developed a robust and efficient model able to perform this task by finding stationary solutions of a dynamical system controlling fluxes and conductivities of edges.Our dynamics extends previous works focusing solely on the unicommodity [15,27,28,30,43], and on the multicommodity setup [25,26,34].Precisely, we propose two different response functions regulating the growth of conductivities, whose evolution is dictated by the passengers moving in the metro.We performed a thorough empirical study of the optimal transport networks resulting in the two cases.Using metrics like the fraction of idle edges and the Gini coefficient of the fluxes, we found that the two behave similarly in the congested transportation regime, but differently in the branched transportation one.In this case, the 1-norm dynamics produces flows that are more concentrated on fewer edges, potentially leading to traffic overload.We addressed the capability of our method to recover shortest path networks by comparing it with Dijkstra's routine.Such comparison showed that our approach is a viable computational alternative to perform this task, achieving accurate results and being, in principle, scalable for large networks.Additionally, we performed an experiment to measure network robustness to infrastructural failures, revealing that the stations of Châtelet and Saint-Lazare are crucial to ease congestion of metro routes.Finally, we showed that solutions of our model lie in the Pareto front drawn by the energy dissipated during transport and the network infrastructural cost.
Altogether, our findings extend the current research in multicommodity routing problems using optimal transport principles and help to understand the mechanism underlying passenger flows in transportation systems.
Our formalism can be further extended to other possible applications related to the flow of passengers in transport networks.An example could be to incorporate time dependences in the passengers' inflows, modeling scenarios where stations are subject to different loads during a day, thus generalizing [44].One could also compare the extent of traffic jams in multicommodity settings to that of other routing strategies for urban transport displaying notable phase transitions and scaling laws [39,45,46].
We would like to remark that our approach is applicable to a variety of practical problems unrelated to transportation systems.A practitioner may then consider response functions for the dynamics alternative to those studied in this work.The analysis performed in this work show how such a problem can be addressed and paves the way for further research beyond urban transportation networks.

Lyapunov and dissipation cost equivalence
Here we show that the first term of the Lyapunov functional in Eq. ( 6) is identical to the 2-norm dissipation cost J = (1/2) ∑ e e ||F e || 2  2 /µ e , we follow [25].Multiplying both sides of Eq. ( 1) for p i v and summing over i and v yields ∑ i,v,u,e (µ e / e )B ue B ve p i u p where we made explicit the network Laplacian entries L vu := ∑ e (µ e / e )B ue B ve , and we used the definition of the fluxes F i e := µ e (p i u − p i v )/ e , for e = (u, v), and ∀i.Equation ( 10) is the identity we wanted to prove.

Preprocessing
The original dataset in [37] is provided as a multilayer network embedded with different transportation types, thus we performed a preprocessing to extract the metro network.First, we trimmed nodes belonging to other layers and then merged redundant stations having the same name by collapsing them together.This redundancy was due to the presence of stations with two entrances located in slightly different geographical positions; their coordinates displacement was always negligible compared to the physical extension of the whole network.The trimmed graph reflects consistently the real topology of the Paris metro.For convenience, the longitude and the latitude of nodes are rescaled within the range [0, 1].We did not have access to the exact travel routes data, so we assigned the entries of S based on the "importance" of each station.In fact the number of users validating their tickets when entering a station, the only data at disposal, is easier to track than the number of exiting users together with their entrance station.In practice, we assigned N − 1 positive "influence factors" to each station i, one for each node u = i where the users entering in i can potentially exit: r i u = g u / ∑ w =i g w , instead r i v=i = 0, where g v is the amount of users entering the metro from v = i.Note that 0 ≤ r i v ≤ 1 for all v nodes, and ∑ v =i r i v = 1.Thus, we can estimate the number of people exiting from a station u = i by assigning S i u = −r i u g v=i , while S i v=i = g v .The intuition is that a station with a high entering volume of passengers, i.e. high g v , should have a large amount of exiting users, thus its "influence" value r should be high.1) and (2).Results are displayed for different combinations of ρ and β , and are averaged over 100 runs with random initializations of µ e (0) ∼ U(0, 1).The curves are normalized in (0, 1).Shaded areas denote standard deviations, these are thicker for β > 1 since the cost is concave, with a rich landscape a local minima.

Validation
We validate Eqs. ( 1) and ( 2) with the 1-norm for several combinations of β and of the input loads S(ρ), with 0 ≤ ρ ≤ 1 progressively smoothing the passengers inflows data collected in [47].In detail, users entering stations are regulated as g v (ρ) = g v − ρ(g v − g ) where g v are inflows in v as in [47], and • averages over the nodes.Using this procedure, we build S(ρ) following the "influence assignment" described in Methods 4.2.Thus, S(ρ = 0) = S corresponds to the originally extrapolated mass matrix, while S i v=i (ρ = 1) = g and S i u (ρ = 1) = − g /(|V | − 1) for all u = v = i.Meaning that for ρ = 1 passengers move with uniform rates from each-and to-all stations.In Fig. 7 we plot the time-evolution of J Γ , which decreases over time.

Pareto front derivation
To obtain the Pareto form in closed form as in Eq. ( 8) it is sufficient to exploit the scaling µ e ∼ (F e ) δ , δ = 3 − β , valid for stationary solutions of the multicommodity dynamics [25].In particular, it is immediate to recover Eq. ( 8) by rewriting J in Eq. (3) as a function of the conductivities µ e .

Figure 1 .
Figure1.Schematic problem visualization.In brown we draw the edge capacities, green is used to highlight the length of one edge.From the central node a positive inflow of two commodities enters (orange and light blue), these move towards their destinations-the colored minuses-sharing an edge.Thus, multiple colored fluxes generate traffic congestion.In pink we represent the pressure potentials of a third commodity.Differences of pressure along an edge trigger F 3 .

Figure 2 . 5 Figure 3 .
Figure 2. Optimal transport networks panel.(a) Optimal transport networks with β = 0.1, 1.0, 1.5 for the 1-norm and 2-norm.Edge thickness and color are proportional to ||F e || 1 , normalized to sum to 1; node sizes are proportional to the number of passengers entering them.All the quantities are averaged over 100 runs of the dynamics with µ e (0) ∼ U(0, 1).(b) Network colored using the difference of the fluxes obtained with the 1-norm and with the 2-norm.Results are displayed for β = 1.5, and using the data of Fig. 2a.Widths of edges are proportional to the absolute value of the flux difference, so that by matching the color and size information it is possible to distinguish differences in networks generated by the two response functions.Marked stations are those discussed in Section 2.3.(c) Sorted flux distribution over the edges for β = 1.5.All quantities have been computed with ρ = 0.0, i.e.S(ρ = 0.0) = S (see Methods 4.2 and 4.3).Similar panels for ρ = 0.5 and ρ = 1.0 can be found in Supplementary Figs. 1 and 2.

Figure 4 .
Figure 4. Comparison between our methods and Dijkstra's algorithm.Lenghts of edges are ˜ e = e /n e , with n e number of vehicles passing through e as in[37].This rescaling has been performed following the intuition that metro users moving along the network may travel using the fastest route (that for paths of the same length is the one with more frequent trains) to reach their destination; this may not correspond to the geographically shortest one.(a) Optimal transport networks for our methods.Quantities are computed over 100 runs of the dynamics with random initialization of the conductivities, µ e ∼ U(0, 1).(b) Optimal transport network computed with Dijkstra's algorithm.For all three networks, edge widths and colors are ||F e || 1 , and the size of each node is proportional to the number of passengers entering in it.(c) Relative energy difference between our methods and Dijkstra's, taken in absolute value.Errorbars are standard deviations over 100 realizations.

Figure 5 .
Figure 5. Traffic rerouting after network structural failures.(a) We plot the optimal transport networks after nodes trimming.Edge widths and color are normalized fluxes, the size of each node is proportional to the number passengers entering in it.All quantities are averaged over 100 runs of the dynamics with random initialization of the conductivities, µ e ∼ U(0, 1).Stars highlight positions of removed stations, following the same scheme of Fig. 5b.(b) Network showing which stations have been removed, these correspond to the fully colored nodes, with colors chosen according to the legend on the left.Colored edges, and nodes with colored borders are those where the passengers get redirected.The colored borders are proportional to the passengers' growth.(c) Gini coefficients vs. β , errorbars are standard deviations.Points are colored following the scheme used in the rest of the panel.Similar results for the 2-norm dynamics are in Supplementary Fig. 3.

Figure 6 .
Figure 6.Pareto front.We plot the dissipation/infrastructure cost ratio vs. β .Different points are averaged over 100 runs with random initialization of the conductivities, µ e (0) ∼ U(0, 1).Inset: infrastructure cost, W , vs. dissipation cost, J. Marker shapes are identical to those of the main plot, colors follow the colorbar over β .

Figure 7 .
Figure 7. Validation of the dynamics with the 1-norm.We show the dissipation cost evolution along solution trajectories of Eqs.(1) and(2).Results are displayed for different combinations of ρ and β , and are averaged over 100 runs with random initializations of µ e (0) ∼ U(0, 1).The curves are normalized in (0, 1).Shaded areas denote standard deviations, these are thicker for β > 1 since the cost is concave, with a rich landscape a local minima.

Figure S1 .
Figure S1.Optimal transport networks panel with forcing S(ρ = 0.5).For a detailed description of the subplots one can refer to Fig. 2 in the main text.

Figure S2 .Figure S3 .
Figure S2.Optimal transport networks panel with forcing S(ρ = 1.0).For a detailed description of the subplots one can refer to Fig. 2 in the main text.