Symmetry breaking in optimal transport networks

Engineering multilayer networks that efficiently connect sets of points in space is a crucial task in all practical applications that concern the transport of people or the delivery of goods. Unfortunately, our current theoretical understanding of the shape of such optimal transport networks is quite limited. Not much is known about how the topology of the optimal network changes as a function of its size, the relative efficiency of its layers, and the cost of switching between layers. Here, we show that optimal networks undergo sharp transitions from symmetric to asymmetric shapes, indicating that it is sometimes better to avoid serving a whole area to save on switching costs. Also, we analyze the real transportation networks of the cities of Atlanta, Boston, and Toronto using our theoretical framework and find that they are farther away from their optimal shapes as traffic congestion increases.


I. INTRODUCTION
Networks that provide optimal transport properties [1,2] are of interest in many different disciplines ranging from the study of natural systems such as water transport in plants [3], veination patterns in leaves [4,5], river basins [6] to the design of transportation infrastructures, either from an applied point of view [7], or from a more mathematical perspective [8].In particular, transportation networks evolve in time and their structure has been studied in many contexts from street networks to railways and subways [9][10][11][12][13][14][15][16][17].The evolution of transportation networks is also relevant to biological cases such as the growth of slime mould [18] or social insects [19,20].
An important problem consists in designing a network from scratch or extending an existing network; this is a central subject in transportation and location science, usually known as the network design problem [21].Such a problem, applied to rapid transit networks, for example, is divided into three sub-problems, which are solved numerically: location of new stations, construction of the core network connecting these stations, and location of intermediate stations on the network.From an engineering point of view, this type of problem can be solved with various optimization methods on practical cases, but the general behavior of optimal solutions is not known.From a purely mathematical point of view, there have been extensive studies of optimal networks over a given set of nodes (such as the minimum spanning tree [22], or other optimal trees [23]).Some of these problems allow for extra chosen nodes such as the Steiner tree problem [24], or geometric location problems in which demand points are to be matched with supply points [25].Another example is the much-studied Monge-Kantorovich mass transportation problem [26], involving matching points from one distribution with points from another distribution.
The main problem in network design is fundamentally different.We are given the density of population and we are looking for the network that minimizes some objective function involving some average time, in general (although other choices are possible, see for example [7]).In this setting, there are usually two different transport modes, a slow one representing for example cars on the road network, and a fast one representing the subway or some rapid transit network.The natural framework here is then the one of multiplex networks comprising two different transportation networks, one known while the structure of the second one is to be determined (for multiplexes in the context of optimization see for example [27]).A practical realization of this problem concerns the specific case of subways (for a network analysis of subways, see for example [10,[28][29][30][31][32][33][34][35][36]).In most large cities, a subway system has been built and later enlarged, with current total lengths varying from a few kilometers to a few hundred kilometers.The geometry of these networks, as its total length increases, varies from simple lines to more complex shapes with loops for larger networks [37,38].In particular, for the largest networks, convergence to a structure with a well-connected central core and branches reaching out to suburbs has been observed [10].
Algorithmic aspects of network design have been studied within computational geometry (e.g., [39] chapter 9) and location science (e.g., [7] and references therein), and arXiv:2311.05059v1[physics.soc-ph]8 Nov 2023 some simpler problems of this type have been addressed previously.For instance, the problem of the quickest access between an area and a given point was discussed in [40,41].But our specific question -optimal network topologies as a function of population distribution and network length -is largely an open problem.In [38], some results were obtained in two-dimensional systems by comparing a priori defined optimal network configurations.First, it was shown that, if the goal is reaching a single point in the plane, then the optimal network is necessarily a tree.Second, the paper hinted at the possibility of the existence of transitions between optimal configurations when the length of the network changes.More precisely, it has been shown that as the length of the network increases resources go preferentially to radial branches and that there is a sharp transition at a critical value of the length where a loop appears.
In this paper, we address the problem of the quickest average time to access a central point using a multiplex framework (see Figure 1).We are given the structure of one layer, and we are allowed to build an additional layer that can facilitate quicker access to the central point.The new layer is characterized by a faster speed than the existing, slow layer; however, changes of layers incur a cost.We study the optimization problem of finding the best configuration of the fast layer on systems of arbitrary dimensions.We solve exactly the optimization problem for one-dimensional systems, showing that the optimal fast-layer configuration undergoes a sharp transition between a perfectly symmetric configuration and a fully asymmetric configuration.We numerically show that such symmetry breaking in optimal networks occurs in systems of arbitrary dimensions.We specifically focus on two-dimensional lattices and perform a systematic study of transportation systems within real cities, where we use the slow layer to model the road network and the fast layer to model the subway network.We find that real subways display network topologies compatible with the optimal ones that can be obtained using our computational framework.Differences between real and optimal networks typically arise as the ratio between subway and car speeds increases.

II. MULTIPLEX TRANSPORTATION MODEL
We consider a well-established network model for multimodal transportation systems (see Methods and SM for details) [12,42].The model comprises two network layers, namely a slow and a fast layer, each denoting a different mode of transportation, e.g., cars and subways.Each node n in the slow layer has a mirror image, or replica, in the fast layer F (n).For example, we can think of node n in the slow layer as an intersection between roads, and of F (n) in the fast layer as the subway station corresponding to that intersection.The system is such that edges in the fast layer are a subset of the replica edges of the slow layer; essentially, not all roads are mirrored by subway FIG. 1. Illustration of the multiplex transportation model.In the slow layer (blue), the time required to traverse an edge equals one; in the fast layer (orange), each edge is traversed in a time reduced by a factor 0 ≤ η ≤ 1. Replica nodes across layers are connected by edges whose transit time is c ≥ 0 denoted by the dashed segments.Two possible paths connecting the white node to the center node given in red are shown.The path shown by grey arrows requires a total time equal to 2 as it uses only two edges in the slow layer.The second path, highlighted by black arrows, involves two changes of layer, one edge in the fast layer, and one edge in the slow layer, resulting in a total transit time equal to 1 + η + 2c.The second path is faster than the first one as long as 1 + η + 2c < 2.
segments.We assume that edges in the two layers are traversed at different speeds, and without loss of generality, we parameterize the speed ratio with 0 ≤ η ≤ 1. Agents departing from nodes in the slow layer move along their quickest path towards a specific node o in the slow layer, the so-called center of the network.These quickest paths can involve edges in both layers; however, each change of layer, happening between replica nodes, has a cost equal to c ≥ 0. See Figure 1 for a schematic example.For a given configuration F of the fast layer, we can find the minimum-cost path of each node n in the slow layer to the center o.We then measure the efficiency of F in terms of the average time to reach the center, i.e., τ (F) as defined in Eq. (3).The metric accounts for the fact that each node n of the slow layer has associated a weight p n , representing the demand of node n.
The goal of our modeling framework is finding the best or optimal configuration F * of the fast layer, i.e., the configuration that corresponds to the minimum value of the average time to reach the center starting from the nodes of the slow layer.The optimization problem defined in Eq. ( 4) is constrained by the number of edges L can be used to form the fast layer.Note that L is interpreted as the cost of building the fast layer, hence is measured in the same units as τ and c.We are interested in providing a full characterization of the topology of the optimal fast layer as a function of the parameters η and c of the multiplex transportation model.We study this optimiza-tion problem under different settings determined by the topology of the slow layer.

A. One-dimensional systems
We begin our investigation by studying a onedimensional version of our model (see Methods for details).For simplicity, the model is thought in continuous space.However, the calculations and results can be immediately generalized to a one-dimensional discrete lattice.The slow layer consists of a segment of length 2R extending symmetrically around the origin or center o.In the computation of the continuous version of the objective function of Eq. ( 3), we further assume that all parts of the slow layer have equal weight.An illustration of the system is shown in Figure 2(a).
We remind that the problem is finding the optimal configuration of the fast layer such that the average cost to reach the center from any point of the slow layer is minimum (see Eqs. ( 3) and ( 4)).The optimization problem is constrained by the fact that the fast layer has a fixed cost L, with L ≤ R. We mathematically prove that the topology of the optimal fast layer undergoes a series of phase transitions depending on the values of the model parameters L, η, and c.
A first, trivial critical point is given by r c = 2c/(1 − η) (see also Eq. ( 2) in Methods): there is no advantage in having a fast layer with length L ≤ r c , as a fast layer with such a cost is not used in any minimum-cost paths to the center.The optimization problem is then subject to the constraint that the fast layer should be of length r c ≤ L ≤ R.
As we prove in the SM, solutions to this optimization problem are given by connected segments that include the replica of the center F (o).We can then parameterize the optimal fast layer by a single quantity 0 ≤ α ≤ 1/2, such that the fast layer extends over a length αL to the right of F (o) and over (1−α)L to the left of F (o).We find that only two configurations for the optimal fast layer are possible: (i) a completely asymmetric configuration obtained for α = α * = 0 (when L ≤ L † ); (ii) a completely symmetric configuration obtained for α = α * = 1/2 (when L ≥ L † ).The critical value L † where the transition occurs is Typical phase diagrams are displayed in Figure 2. We clearly see a discontinuous transition between the symmetric and the asymmetric optimal configurations as the parameters of the model vary.This is a rather surprising result as it indicates that, under certain circumstances, the optimal solution is obtained by constructing a fast layer only on one side of the system.In other circumstances instead, a symmetric configuration is more advantageous than the asymmetric one.
The physical intuition behind this curious behavior is as follows.Constructing a fast layer requires an initial waste of resources, as only parts of the slow layer whose minimum cost to reach the center is at least r c take effective advantage of the fast layer.Such an initial investment consists of building a fast layer such that L ≥ r c for the asymmetric case, but L ≥ 2r c for the symmetric configuration.Hence, for r c ≤ L ≤ 2r c , the asymmetric configuration is trivially preferred over the symmetric one; however, the situation is not immediately inverted for L ≥ 2r c .As a matter of fact, any further extension of a branch of the fast layer leads to a reduction of the time to reach the center for all parts of the slow layer that are served by that branch.However, the objective function is subject to a diminishing return as the branch of the fast layer grows towards the boundary of the system.As L increases, the only branch of the asymmetric configuration grows twice as fast towards the boundary than the two branches of the symmetric configuration.Thus, the initial advantage of building an asymmetric fast layer over a symmetric one is still present for L ≥ 2r c , but the gap narrows as the size L of the fast layer increases.The critical value L † of Eq. ( 1) denotes the size of the fast layer when the two configurations generate identical reduction in travel time to the center and, for L ≥ L † the symmetric configuration is preferred over the asymmetric one.The diminishing-return property of the objective function explains also why the optimal configuration for L ≥ L † must be symmetric.If we alter in fact the symmetric configuration by reducing one branch in favor of the other, then the increase of the objective function induced by the reduction of the one branch will be larger than the decrease of the objective function induced by the extension of the other branch.Hence, by altering the symmetric configuration, we will necessarily increase the average time to reach the center for the overall system.
Although we have mathematical support for the above interpretation only in one-dimensional systems, we believe that the general principle of symmetry breaking applies to any network regardless of the dimension of the space where the network is embedded.Indeed, in our numerical experiments we do observe symmetry breaking in the geometry of the optimal configuration of the fast layer also in systems with dimension d > 1.We discuss these findings below.

B. Two-dimensional systems
We first extend our analysis to two-dimensional triangular lattices.The center o of the slow layer is identified by the site corresponding to the geometric center of the lattice and all other nodes in the layer are identified by lattice sites at distance at most R from such a center, see Methods for details.Due to the computational complexity of the optimization problem of Eq. ( 4), optimal configurations of the fast layer cannot be determined exactly in this case.We rely instead on the greedy optimization FIG. 2. Symmetry breaking in one-dimensional systems.(a) We consider a slow layer given by a segment of length 2R that extends symmetrically around its center (red triangle).The fast layer is given by a segment of length L, with a portion of length αL on the right of the center and a portion of length (1 − α)L on the left of the center, where 0 ≤ α ≤ 1/2.As we prove in the Methods section, two optimal configurations are possible for the fast layer: (i) a symmetric one, i.e., α = α * = 1/2, and (ii) an asymmetric one, i.e., α = α * = 0. (b) Optimal configuration of the fast layer as a function of the model parameter L and the switching cost c.Here the ratio of the speeds of the fast and the slow layers is η = 0.1.We distinguish three regions: (i) for L ≤ rc, with rc = 2c/(1 − η) as defined in Eq. ( 2) and represented by the dashed black line, the fast layer is not used; (ii) for rc ≤ L ≤ L † , with L † defined in Eq. ( 1) and represented by the white dashed curve, then α * = 0; (iii) for L ≥ L † , then α * = 1/2.(c) Same as in (b), but the optimal geometry of the fast layer is estimated as a function of c and η for L = 1.The black dashed line is identified by the condition rc = L; the white dashed line is given by Eq. ( 10).
strategy described in the SM.The submodularity of the objective function of Eq. ( 3) that we prove in the SM allows us to use this algorithm to generate, in a time that roughly grows as R 2.7 , approximate solutions to the optimization problem of Eq. ( 4) that are at most a factor (1 − 1/e) ≃ 0.63 above the ground-truth minimum [43].
Typical solutions obtained using greedy optimization are displayed in Figures 3a, b, and c.Here, we assume that the weight associated with each node n of the slow layer is p n =const.As discussed in the Methods section and proved in the SM, optimal configurations of the fast layer are given by trees with at least one edge incident to F (o).However, depending on the choice of the model parameters L, η and c, different optimal configurations may emerge.As for the one-dimensional case, also here optimal configurations of the fast layer appear to be characterized by branches of similar length, so that different optimal configurations can be distinguished by simply counting the number of such branches, namely k * as defined in Eq. ( 5).We do observe k * = 1, 2 and 3 in Figures 3a, b and c, respectively.Please note that this simple characterization of the geometry of the fast layer is valid only in the regime L < R. For larger sizes of the fast layer, the geometry of the optimal configurations becomes much richer and requires additional order parameters to be described; in this paper, we only consider phase transitions concerning fast layers whose size is much smaller than the one of the slow layer.
Typical phase diagrams are shown in Figures 3d, e,  and f.As for the one-dimensional continuous model, we observe that a fully asymmetric configuration emerges for large c values, see Figures 3d, e, and f; as L increases, we observe transitions towards larger number of branches, see Figure 3f.In Figure4d, we validate the goodness of the solutions obtained via greedy opti-mization by comparing them with solutions obtained via simulated-annealing optimization (see SM for details).
In the SM, we consider a continuous-space approximation of the two-dimensional lattice.The results of our analysis are qualitatively similar to those valid for the discrete lattice, with the only caveat that optimal configurations of the fast layer can be characterized by an unbounded number of branches.In the SM, we also consider two-dimensional lattices where the weight p n is an exponentially decreasing function of the lattice distance of node n to the center o. Results are qualitatively similar to those reported in Figure 3 in that setting too.

C. Higher-dimensional systems
Our findings on the breaking of the symmetry of the optimal fast layer generalize also to infinite-dimensional systems.
In the SM, we consider a continuous-space approximation of a star-like system where the slow layer is given by an arbitrary number q of segments intersecting in a single point and extending symmetrically around this central point.We can prove analytically that the only allowed solutions to our optimization problem are given by fast layers consisting of 1 ≤ n * ≤ q branches of identical length L/n * .As in the case of the one-and twodimensional systems, also for the star-like system we observe that n * = 1 for sufficiently large c values, and that n * grows as L increases.
The same qualitative behavior is also observed in numerical simulations on one instance of the Erdős-Rényi model with N = 1, 000 nodes and average degree ⟨k⟩ = 4.For simplicity, in our simulations, we select the node with the largest degree k max = 11 as the center of the slow layer.We then determine the optimal configuration of the fast layer via greedy optimization.We observe transitions between configurations of the optimal fast layer with variable number of branches 0 ≤ k * ≤ k max depending on the choice of the model parameters.Results of these simulations are reported in the SM.

IV. REAL-WORLD CITIES
In this section, we study the properties of the subway systems in Atlanta, Boston, and Toronto under the lens of our framework.We choose monocentric cities, fairly isolated from other major urban centers, with a tree-like subway structure.We identify the intersection point of the real subway lines in all three cases as the city center.We see that this point corresponds to the downtown area in the three cities.
First, we incorporate real population data in our model [44,45].We rely on a two-dimensional triangular lattice multiplex model; we use the population data and the appropriate coordinates reference systems to impose the triangular lattice structure onto the city landscape.Details on the data and modeling of the city population distribution can be found in the Methods section.We denote all quantities relative to the real physical system using the same notation as for the multiplex model, but we add a tilde on top of the corresponding symbol.For example, R indicates the radius of the lattice model, and R denotes the radius of the city.Overlaying a city on top of the triangular lattice allows us to associate a weight pn to each node n in the slow layer that reflects the real population density within the city.We use those weights in the objective function of Eq. ( 6), and then take advantage of the greedy algorithm to obtain approximate solutions to the optimization problem of Eq. ( 4).Similar to the previous sections, we obtain two classes of optimal fast-layer configurations for all the considered parameters.Results for the city of Toronto are displayed in Figure 4, where we see that optimal configurations comprise k * = 1 (Figure 4a) or k * = 2 (Figure 4b) branches.Similar results are valid for Atlanta and Boston, where we observe optimal configurations with k * ≤ 3 branches (see SM).For k * > 1, we note that the branches have no identical length; this is caused by the fact that the weight associated with the various nodes of the system is not constant.A typical phase diagram for Toronto is displayed in Figure 4c, where we fix η = 0.5, but vary L and c.The diagram is qualitatively similar to the one of Figure 3e.For fixed c, k * increases as L grows; however, for fixed L, k * decreases as c grows.The values of the parameters where the transitions between the various phases emerge differ from those of Figure 3e; this is due to the non-homogeneous density of the population used in the model of the city.Similar results for Atlanta and Boston can be found in the SM.Next, we perform a direct comparison between the real subway lines and the optimal fast-layer configurations obtained using our computational framework.To this end, we calibrate the model's parameters L and R such that the number of subway stations in the real system is comparable with the one in the model.Results of this analysis are reported in Figure 5 for Toronto and in the SM for Atlanta and Boston.We first note that the optimal fast-layer networks display additional ramifications.This is due to the fact that L > R in this experimental setting.Second and more important, we note that there is an overall good overlap between the real subways and those obtained under the framework.This is true regardless of the specific choice of the model parameters (Figures 5a and b).We quantify the efficiency of the real subway systems relative to the optimal configurations by measuring the ratio of the corresponding values of the objective function of Eq. ( 6), see Figure 5c.Here, we keep the speed of the fast layer invariant as ṽf = 40 km/h, and vary the speed of the slow layer ṽs .This corresponds to effectively varying the value of the model parameter η.The real subway system appears less efficient than the optimized one in congested situations when ṽs is small.However, it gets close to optimality as the speeds in the slow layer grows towards the value of the speed of the fast layer.

V. DISCUSSION
Location science and network design focused on practical aspects of network optimization.Even if operational research is successful in designing minimal cost solutions, the theoretical question of the optimal network topology is largely open.In addition, and as suspected in previous studies, we showed that these optimal networks experience a transition between different shapes when the total length or the switching cost increase: small variations of the cost can lead to strikingly dissimilar optimal structures.In particular, there is a transition characterized by a symmetry breaking leading to spatial inequalities.Such a phenomenon results from the interplay between switching cost and the absence of the network and shows that an optimal solution is not necessarily isotropic.Although such phenomenon is known to happen in various economic instances [46], it is the first time that it is exhibited in an optimal network context.These results also underscore the importance of considering switching costs and the cost associated with the slow layer (typically car traffic) when studying the optimal subway structures.A better theoretical understanding of these optimal shapes could certainly be helpful for practical applications and the identification of critical parameters.Further studies are however needed in order to explore in more depth these transitions.Also, we focused here on the monocentric case where we minimize the average distance to reach a central node.Large cities are however polycentric and the structure of flows is far more complex.Preliminary results suggest that here also there are transitions between different optimal shapes, but this point certainly deserves further studies.Finally, the more difficult problem of minimizing the average time needed to connect any pair of points is even more open.In this case, the optimal network can have loops and is computationally more demanding.Although we also expect transitions, our understanding of this case is at the beginning.

A. Multiplex transportation model
We consider a multiplex network composed of a slow layer and a fast layer (see Figure 1).We denote with G the set of nodes in the slow layer, and with S the set of its edges; H and F are respectively the set of nodes  6) for the optimized and the real configurations of the fast layer.As we vary ṽs, we change also the value of the parameter c in the multiplex model so that the switching time in the physical system is equal to 3 mins, see Methods for details.
and edges in the fast layer.Both layers contain N nodes; each node n in the slow layer has a one-to-one correspondence with a node F (n) in the fast layer.Each edge (F (n), F (m)) in the fast layer has a replica edge (n, m) in the slow layer, but the vice versa is not necessarily true.The transit time of each edge in the slow layer equals one, whereas time required to traverse edges of the fast layer is reduced by a factor 0 ≤ η ≤ 1. Replica nodes are connected to each other by edges whose transit time is c ≥ 0. Please note that in this mathematical framework entities have no physical meaning, thus we can interchange the notions of the length of an edge with that of the time required to traverse it, and simply refer to them with the generic term cost.
When considered in isolation, the slow layer forms a single connected component, whereas the fast layer is not necessarily connected.The connectedness of the slow layer implies, however, that in the overall system, composed of the interconnected slow and fast layers, there exists at least a path connecting any pair of nodes.The cost of a path in the network is given by the sum of the costs of all edges that compose the path.The minimumcost path between two nodes can either use edges in the slow layer only or take advantage of some of the edges in the fast layer (see Figure 1).In particular, the path We identify a special node o in the slow layer of the network, i.e., the center of the network.We denote with d n the cost of the fastest path of the generic node n to o.Also, we assume that each node n in the slow layer has an associated weight p n ≥ 0. We then define the weighted average cost to the center as We stress that the above function is computed over all nodes in the slow layer only, but eventual minimumcost paths can take advantage of edges in the fast layer.Clearly, τ depends on the various parameters of the model.In Eq. ( 3), we explicit, on purpose, only the dependence of τ on the fast layer F as this is the primary object of our investigation.We consider in fact the optimization problem aimed at finding the best set of edges in the fast layer able to minimize the objective function of Eq. ( 3).The minimization is constrained by the number of edges L that are in the fast layer, with L still measured in the same units of costs as τ and c.Specifically, we aim at solving where we indicated with |F| the number of edges in F.
Finding the exact solution to the optimization problem of Eq. ( 4) is computationally infeasible as it requires a brute-force search over all possible |S| L configurations of the fast layer.In the SM, we prove, however, that: (i) the optimal configuration of the fast layer is a connected tree with at least one edge incident to F (o), i.e., the replica node of the center; (ii) the objective function of Eq. ( 3) is a decreasing and submodular function.The relevance of property (i) is two-fold: first, it allows us to dramatically reduce the number of suitable solutions for the optimization problem of Eq. ( 4); second, it permits us to meaningfully describe the geometry of the optimal fast layer in terms of the number of branches departing from the replica node of the center, i.e., where δ x,y = 1 if x = y and δ x,y = 0 otherwise.Property (ii) allows us to leverage a greedy optimization scheme to generate approximate solutions to the optimization problem of Eq. ( 4) that are at most a fraction (1 − 1/e) above the ground-truth minimum [43].In the construction of greedy solutions, we start from an empty set of edges in the fast layer, and we add one edge at a time.The edge that is added is the one corresponding to the best choice that can be made given the current set of edges in the fast layer.In the SM, we further describe how solutions obtained via greedy optimization can be further refined to get better approximations for the optimization problem of Eq. ( 4); the quasi-optimality of our greedy solutions is validated by comparing them to those obtained via simulated-annealing optimization (see Figure 3 and SM for details).

B. Two-dimensional triangular lattices
The slow layer used in the definition of the multiplex transportation model can be represented by any connected network.In the SM for example, we report on results obtained for a slow layer given by an instance of an Erdős-Rényi model.
The vast majority of the results reported in this paper are obtained for slow layers derived from triangular lattices.The coordinates of all sites of the lattice are in the form (a 2 ) for integer values of a and b such that |a| + |b| ≤ R, where R is the radius of the triangular lattice.The boundary conditions give the system a hexagonal shape.The sites of the lattice are the nodes of the slow layer; each pair of nodes in the slow layer is connected if the corresponding sites are at distance one in the triangular lattice; we identify the center of the network as the site with coordinates (0, 0), i.e., a = b = 0.

C. Real-world cities
In this section, we describe how we model the transportation system of a real city.Our framework relies on the use of a multiplex network formed of two discrete triangular lattices, one used to describe slow transportation (e.g., cars) and the other used to model fast transportation (e.g., subways).As we are referring to a real physical system, all quantities that describe properties of the multiplex transportation model have an associated physical dimension.For simplicity, we still rely on the same notation as in the previous sections, however, we add a tilde on the top of the symbols to make clear that the notation is used to indicate physical quantities.For example, we use R to denote the city radius measured in units of length and distinguish it from R which serves to indicate the radius of the triangular lattice measured in dimensionless lattice units.

Data
We obtain the population density at the census-tract level for Boston and Atlanta from the 2021 American Census Survey [44], and for Toronto from the 2021 Canadian Census of Population [45].The census data contains the number of individuals residing in relatively small geographic regions, i.e., census tracts, used for statistical purposes by national statistical agencies.Census tracts typically consist of 2, 500 to 8, 000 individuals.
The data on the four metro lines and their stations in Boston is obtained from the Massachusetts Bureau of Geographic Information (MassGIS) website [47].Similar data for the four metro lines in Atlanta is obtained from the Atlanta Regional Commission: Open Data website [48].Finally, the data for the three subway lines and their stations in Toronto is made available by the Toronto Transit Commission at the City of Toronto Open Data website [49].The data on the subway lines is available as shapefiles.The arcs for the rail lines are given by sets of points in the coordinate reference systems (CRS) applicable to the geographic location.The CRSs used for Atlanta, Boston, and Toronto are EPSG:2239, EPSG:26986, and, EPSG:2952, respectively.Similarly, the location of the stations is given by points in the appropriate CRS.The Atlanta, Boston, and Toronto subway systems total L = 77 km, L = 109.6km, and L = 69.6 km of rail lines and ñs = 38, ñs = 125, and ñs = 75 stations, respectively.The average distance between the stations is 2.07 km, 0.88 km, and 0.94 km for Atlanta, Boston, and Toronto, respectively.
We choose a city radius R such that all stations are contained in the circle of radius R around the center.We find the appropriate choice for this radius to be R = 25 km in Atlanta and R = 20 km in Boston and Toronto.We assume that everything that lies inside this circular area constitutes the city.

Multiplex transportation model for cities
We create a lattice model of the transportation system in a city by overlaying a triangular lattice of radius R on top of the circle of radius R. Please note that we use R = 100 in all figures except for Figure 5 where we use R = 25.The operation requires matching locations of the city that are given by points in continuous space to lattice sites.To this end, we fix the location of the city center as the one of the center o of the triangular lattice.Since the lattice covers the entire circular area of the city, the physical distance between neighboring sites on the lattice is wn,m = R/R.The choice of R, for given R, determines the granularity of the lattice mesh overlaid on the city landscape.For instance, R = 100 and R = 25 km give us neighboring lattice sites n and m at distance wn,m = 0.25 km, whereas R = 25 and R = 25 km give wn,m = 1 km.Note that the physical distance between nodes n and m in the two layers is the same as the physical distance between their replica nodes F (n) and F (m) in the fast layer, i.e., for all (F (n), F (m)) ∈ F wF (n),F (m) = wn,m .The distance between the replica nodes n and F (n) is a parameter of the model wn,F (n) = l.
The weight pn associated with node n in the slow layer is given by the population density of the associated census tract containing the site.Note that the size of census tracts varies significantly as the population density varies.Consequently, depending on the choice of R and R, we may have no lattice sites in very small census tracts.We find that this issue can be resolved by choosing R = 100 for the values of R indicated above.
We assume that the travel speed ṽs on the slow layer is between 5 km/h and 20 km/h, and the speed on the fast layer is ṽf = 40 km/h.These are both realistic (ranges of) values for the travel speeds of cars and subways, respectively.Clearly, we have η = ṽs /ṽ f .We note that the time required to traverse the edge (n, m) ∈ S is tn,m = wn,m /ṽ s , whereas the time required to traverse the edge (F (n), F (m)) ∈ F is tF (n),F (m) = η tn,m .For example, if R = 25, R = 25 km, ṽs = 20 km/h, and ṽf = 40 km/h, we have tn,m = 1 20 hours or 3 minutes, and wF (n),F (m) = 1 40 hours or 1.5 minutes.Finally, we assume that a change of layers occurs also at speed ṽs , meaning that the time required to switch layers is t = l/ṽ s .
We denote with tn the time required to reach the center o from node n.This is given by the time corresponding to the fastest path connecting the two nodes.We finally rewrite Eq. (3) as τ (F) = n∈G tn pn n∈G pn and use this expression while solving the optimization problem of Eq. ( 4).

Estimating the average time to the center for real cities
Next, we explain the modeling framework used to obtain the results for the real subway systems in Figure 5.The slow layer is modeled identically as described above.We use ñs stations as the nodes of the fast layer.Connections between stations are given by actual railways, with length learned directly from the data.For each station, we identify its replica in the slow layer as the node corresponding to the closest (in terms of geographical distance) site of the triangular lattice.
To properly compare the objective function of Eq. ( 6) for the real city against the one obtained after greedy optimization, we set R = 25 when constructing the triangular lattice.This allows us to obtain comparable numbers of subway stations between the real cities and the synthetic ones.We respectively have L = 77, 132, and 87 for the synthetic versions of Atlanta, Boston, and Toronto.

D. Continuous-space approximation for one-dimensional lattices
To ease analytical calculations, we adopt a continuousspace approximation for a multiplex transportation model where the slow layer is given by a one-dimensional lattice where the weight associated to each node n in the slow layer is p n =const.Under the continuous-space approximation, the slow layer has the center located in the origin, and is formed of two segments of length R extending symmetrically to the left and the right of the center (see Figure 2a).The fast layer extends to the right of the origin with a segment of length αL and to the left with a segment of length (1 − α)L, where 0 ≤ α ≤ 1/2 is a tunable parameter and L ≤ R is the total length of the fast layer.The goal of our calculation is to find the value α * corresponding to the optimal configuration of the fast layer, i.e., the solution of the continuous-space approximation of the optimization problem of Eq. ( 4).
Consider first the right side of the fast layer which is serving the right side of the slow layer.The objective function relative to the right side of the system is If αL < r c , the fast layer does not serve any portion of the slow layer, thus leading to the second case of Eq. ( 7).We note that the term C appearing in Eq.( 8) does not depend on α, but only on r c and R.
We now distinguish two cases: (i) r c /L ≤ 1/2 and (ii) r c /L ≥ 1/2.In case (i), we can write: We see therefore that the function reaches its maximum at α = r c /L, and displays its minimum value either in α = 0 or α = 1/2.To determine where the minimum of the objective function of Eq. ( 9) is obtained, we need to solve the equation τ (α = 0) = τ (α = 1/2).After some simple calculations, we arrive to For r c ≥ r † the optimal configuration is the one obtained for α * = 1/2, whereas for r c ≤ r † the optimal configuration is the one corresponding to α * = 0.
In case (ii), we can repeat a similar derivation.We find that the maximum of the objective function is reached in α = 1 − r c /L, and the function displays its minimum value either in α = 0 or α = 1/2.Also, here we find the critical value of Eq. (10) where the optimal configuration of the fast layer changes from being perfectly symmetric to being asymmetric.Alternatively, we can determine the critical length L † of the fast layer as shown in Eq. (1).

SUPPLEMENTARY MATERIAL S1. MULTIPLEX TRANSPORTATION MODEL
involves two changes of layer (gray arrows), and a path in the fast layer (orange arrows).Its total cost is 3η + 2c, as each of the three edges used in the fast layer has cost equal to η, and 2c is the cost required to switch layer twice.The second path is more convenient than the first one as long as 3η + 2c < 2.
We consider a multiplex network composed of a slow layer and a fast layer.For a schematic example of such a system, see Fig. S1; for a list of variables/metrics used to characterize the model, see Table S1.We denote with G the set of nodes in the slow layer, and with H the set of nodes in the fast layer.Both layers contain N nodes, i.e., |G| = |H| = N .Each node in the slow layer has a one-to-one correspondence with a node in the fast layer; we indicate with F (•) the map between labels of nodes across layers, i.e., if n ∈ G then F (n) ∈ H, and vice versa.We denote with S and F the set of edges of the slow and the fast layer, respectively.Each edge in the fast layer has a replica edge in the slow layer, i.e., (F (n), F (m)) ∈ F implies (n, m) ∈ S, but the vice versa is not necessarily true.The weight of each edge in the slow layer equals one, i.e., w n,m = 1 for all (n, m) ∈ S, whereas the weight associated with the edges of the fast layer is equal to 0 ≤ η ≤ 1, i.e., w F (n),F (m) = η for all (F (n), F (m)) ∈ F. Replica nodes are connected to each other by edges of weight c ≥ 0, i.e., w n,F (n) = c for all n ∈ G.
When considered in isolation, the slow layer form a single connected component, whereas the fast layer is not necessarily connected.The connectedness of the slow layer implies, however, that in the overall system, composed of the interconnected slow and fast layers, there exists at least a path connecting any pair of nodes.The cost of a path, e.g., n → F (n) → . . .→ F (k) → k → . . .→ m, between two nodes n and m in the network is given by the sum of the weights of all edges that compose the path.We denote with d n→m the cost of the minimumcost path between nodes n and m.Naturally, the same definition of cost applies to paths between any pairs of nodes, belonging to either the slow or the fast layer.For example, d F (n)→m denotes the cost of the minimum-cost path between F (n) ∈ H and m ∈ G.The minimumcost path between two nodes can either use edges in the slow layer only, or take advantage of some of the edges in the fast layer (see Figure S1).In particular, the path n → F (n) → F (m) → . . .→ F (r) → F (s) → s composed of ℓ edges in the fast layer only is preferred to its replica path n → m → . . .→ r → s whenever ℓ is larger than the critical value

S2. OPTIMIZATION OF THE FAST LAYER
We identify a special node in the slow layer of the network, i.e., the center of the network.The label of the center is o.We denote with d n := d n→o the cost of the minimum-cost path of the generic node n to o.Also, we assume that each node n in the slow layer has associated a weight p n ≥ 0, representing the demand of node n.We then define the weighted average cost to the center as We stress that the above function is computed over all nodes in the slow layer only, but eventual minimumcost paths can take advantage of edges in the fast layer.Clearly, τ depends on the various parameters of the model.In Eq. (S2), we explicit, on purpose, only the dependence of τ on the fast layer F as this is the primary object of our investigation.We consider in fact the optimization problem aimed at finding the best set of edges in the fast layer able to minimize the objective function of Eq. (S2).The minimization is constrained by the number of edges L that are in the fast layer, where L is measured in the same units of cost as τ and c.Specifically, we aim at solving where we indicated with |F| the number of edges in F, and once more we did not write the explicit dependence of the objective function from the structure of the slow layer S, the center of the network o, the weights p n for optimal configuration of the fast layer, see Eq. ( 4) k * number of main branches of the optimal layer, see Eq. ( 5) TABLE S1.List of variables and metrics used in the description of the multiplex transportation model and its associated optimization problem.For each of them, we report the notation used thorough the document and the quantity that is meant to represent.
all nodes n ∈ G, and the values of the parameters η and c.
Finding the exact solution to the optimization problem of Eq. ( S3) is computationally infeasible as it requires to test all possible |S| L configurations of the fast layer.However, the number of suitable configurations for F * can be restricted by some properties that the optimal layer must satisfy.

S3. PROPERTIES OF THE OPTIMAL FAST LAYER
We show that the optimal fast layer F * , solution of the optimization problem of Eq. (S3), is a tree containing at least one edge incident to node F (o), i.e., the replica node of the center of the network.

Connectedness
Assume that the edges in the fast layer F generate a graph consisting of at least one component that has no edges incident to node F (o). Focus on one of these components, and indicate the set of its edges as C ⊆ F.
Consider now the edge (F (m), F (n)) = arg max (F (i),F (j))∈C max{d F (i) , d F (j) }, i.e., the edge in the component that corresponds to the largest value of the cost of the minimum-cost path to the center o.Without loss of generality, suppose that d F (m) ≥ d F (n) .Based on our premise, the minimum-cost path from node F (m) to o can be written as meaning there is at least one edge (r, q) ∈ S in the minimum-cost path to o such that (F (r), F (q)) / ∈ F. Define the set F ′ = F \ (F (m), F (n)) ∪ (F (r), F (q)), i.e., the same set as F but with the edge (F (m), F (n)) replaced by (F (r), F (q)).We have τ (F ′ ) ≤ τ (F).In fact, the cost of the minimum-cost path to o of every node n ∈ S whose minimum-cost path to o utilizes the edge (F (m), F (n)) when the fast layer is F will no increase when the fast layer is F ′ ; however, some of the other nodes that do not use the edge (F (m), F (n)) to reach o when the fast layer is F can decrease the cost of their minimum-cost path by using the edge (F (r), F (q)) when the fast layer is F ′ .
We note that, when passing from F to F ′ , the deletion of the edge (F (m), F (n)) does lead to any split of the component C, except for the potential removal of node F (m); however, the addition of the edge (F (r), F (q)) can lead to the merger of C with another component, as well as to the inclusion of an edge incident to F (o).
In summary, if the fast layer F is formed by at least one component that does not contain any edge incident to F (o), then we can always find another configuration of the fast layer that is better than F. The procedure can be iterated until its premise is no longer true.As a result, the optimal fast layer F * must contain only one component with at least one edge incident to F (o).

Tree structure
Suppose the fast layer F contains a loop.If the loop is formed by an odd number of edges, then there is an edge (F (n), F (m)) for which d F (n) = d F (m) .This edge is irrelevant for τ (F), as no minimum-cost path passes through it.In fact, suppose node i is such that d i→F (n) ≤ d i→F (m) .Then, we can write Similarly, if the loop has an even number of edges then there exists a node F (n) such that there are two minimum-cost paths from the node F (n) to o.One of the edges incident to F (n) in the loop can be removed without affecting τ (F).In both cases, the removed edge can be replaced by another edge potentially able to decrease τ , thus the optimal fast layer F * should not contain any loops.

S4. OPTIMIZATION TECHNIQUES
In the previous sections, we proved that the optimization problem of Eq. (S3) can be solved by looking only at fast-layer configurations consisting of trees that contain at least one edge incident to the replica of the center of the network.This fact dramatically reduces the number of potential configurations that one should look at, however, it does not address the computational unfeasibility of the optimization problem.In this section, we introduce numerical techniques able to approximate solutions to the problem in an efficient and effective manner.
in the temporary buffer.Then, we find the one with the maximum score, say (n, m).We simply set s max = s n,m (F g ), and repeat the same instructions as above to perform our lazy search.

Modified Dijkstra's algorithm
At step 2 of the greedy optimization algorithm, we mentioned that the components of the vector of the minimum-cost values ⃗ d are updated using a modified Dijkstra's algorithm.This procedure is used also to estimate potential changes to the vector ⃗ d, thus in the estimate of the scores s n,m (F g ) at step 4 of the greedy optimization.Our modified Dijkstra's algorithm is designed to perform local updates, as not all components of the vector ⃗ d will necessarily change after the new edge (F (i), F (j)) is added to F g .Without loss of generality, let's assume that before the addition of the edge, node F (i) has degree equal zero in the fast layer, whereas the degree of node F (j) is larger than zero.We first update d F (i) → min{d F (i) , d i + c, min (i,q)∈Fg d F (q) + η}.The three elements corresponds respectively to: (i) unchanged cost of the minimum-cost path, (ii) cost of the minimum-cost path reduced by taking the slow layer, (iii) cost of the minimum-cost path reduced by taking the fast layer.We then update the components of the other nodes using a Dijkstra-like algorithm.The algorithm is started from node F (i), and exploits edges in both S and F g leading therefore to updates in the vector ⃗ d that regards nodes in both the slow and the fast layer.However, it considers only moves q → p such that d q + w q,p ≤ d p ; in such a case, the component of the node p is updated as d p → d q + w q,p .The Dijkstra-like algorithm does not necessarily visit all nodes in the network, but only those whose component in the vector ⃗ d is decreased by the addition of the edge (F (i), F (j)).

Submodularity of the objective function
If the fast layer is composed of a single connected component, the score s n,m (F g ) associated with each potential edge (F (n), F (m)) that could be added to the set F g is a non-increasing function of the number of edges already added to F g .This fact follows from the simple observation that the only effect that the addition of the edge (F (n), F (m)) can have is adding novel minimum-cost paths in the network, potentially reducing the cost of the minimum-cost path of some nodes.The drop in the cost of the minimum-cost path induced by the addition of the edge (F (n), F (m)) is maximal when F g = ∅.If F g ̸ = ∅, however, the drop is potentially reduced given that the other edges already in the set can provide minimum-cost paths to the center that do not pass through the edge (F (n), F (m)).
As a matter of fact, we can write where F ′ ⊆ F ′′ , both F ′ and F ′′ are trees with at least one edge incident to F (o), and (F (n), F (m)) is an arbitrary edge that is incident to one edge in both F ′ and F ′′ .
In summary, the function τ that is optimized is a nonnegative, non-increasing, submodular function.As such, the proposed greedy algorithm allows us to find solutions that are at maximum a factor (1 − 1/e) ≃ 0.63 above the ground-truth optimum [43].
The only exception to the inequality (S6) is when both F (n) and F (m) are adjacent to edges in F ′′ .This is, however, not possible as the edge (F (n), F (m)) would close a loop, which is in contradiction with the fact that the optimal fast layer must be a tree.

Selecting greedy solutions
In the above formulation of the greedy optimization algorithm, we tacitly assumed that r c < 1, meaning that the addition of every edge to the fast layer can potentially reduce the value of the objective function.This fact allows us to initialize the algorithm with F g = ∅.
A simple way to re-use the previous algorithm for r c ≥ 1 is changing the initialization.Specifically, we can proceed by first identifying a node n ∈ G for which d n = ⌊r c ⌋ and one of the minimum-cost paths connecting n to o, and then adding (F (i), F (j)) to F g for each edge (i, j) ∈ S that is part of such a minimum-cost path.
We adopt a different protocol that can be used for any value of r c .Although the switching cost c is an input of the optimization problem, we treat it as a variable by considering M = 1, 000 different values in the interval [0, (1 − η)/2].For each of them, we find a solution using the greedy optimization algorithm, namely F .We compute the value of the objective function associated to each of these sets by using the input value of the switching cost c, and find the best solution by identifying the one corresponding to the smallest value of the objective function.

Complexity of the greedy algorithm
We show the computational complexity of the greedy algorithm after implementing the speed-ups described above.We use the two-dimensional multiplex transportation model introduced in the main manuscript with the radius of the triangular lattice equal to R. We generate greedy solutions for c = 0.1, η = 0.1, and L = R.The time required to obtain the greedy solution is plotted against the radius R in Figure S2.Note

Empty quarter
This is the simplest case where the fast layer is not used.We simply have Section of the circle spanning an angle ϕ with a fast layer of length ℓ The fast layer is a segment going from (0, 0) to (ℓ, 0).A generic point (r, θ), with 0 ≤ r ≤ R and 0 ≤ θ ≤ ϕ can reach the origin either using or not the fast layer.If it does not use it, then the cost of the minimum-cost path is q l = r.If it does use it, then the cost is q g = r 2 g + r 2 − 2r r g cos θ + ηr g + 2c .
The first term is the distance of (r, θ) to the point (r g , 0) on the fast layer that must be reached using the slow layer.The other two terms account for the cost of the path to the center on the fast layer and the penalty associated with the change of layers.We have that r g = min{ℓ, r × }, with

FIG. 3 .
FIG. 3. Symmetry breaking in two-dimensional systems.(a) Symmetric optimal configuration of the fast layer with k * = 3 branches.Here R = 25 and L = 12.(b) Symmetric optimal configuration of the fast layer with k * = 2 branches obtained for R = 25 and L = 12 (c) Asymmetric optimal configuration of the fast layer with k * = 1 branch valid for R = 25 and L = 12.(d) Average time to the center, i.e., Eq. (3) associated with the optimal fast-layer configuration as a function of c.Here, R = 25, L = 12 and η = 0.1.We compare solutions obtained using greedy and simulated-annealing optimization.The vertical dashed lines correspond to the values of c where we observe a change in the topology of the optimal fast layer.(e) Number of branches characterizing the topology of the optimal fast layer as a function of L and c.Here, R = 100 and η = 0.1.(f) Same as in (e), but as a function of η and c, with L = 50.

FIG. 4 .
FIG. 4. Symmetry breaking in the transportation networks of real cities.(a) We consider the city of Toronto.We construct the slow layer of the system using a triangular lattice radius R = 100; for the fast layer, we impose L = 50.The color map shows the population density associated with the lattice points; the gray lines represent census-tract boundaries.The red circle represents the center.We show the optimal configuration of the fast layer with k * = 1 branch.(b) Same as in (a), but we show the solution with k * = 2. (c) Phase diagram displaying the value of k * as a function of model parameters L and c.Here η = 0.5.The yellow region denotes L ≤ rc.

FIG. 5 .
FIG.5.Assessing the optimality of the transportation networks of real cities.(a) We compare the subway network generated with our optimization framework (white curves) with the real subway system (black curves) in the city of Toronto.The optimized configuration is obtained by setting R = 25, η = 0.5, and c = 1.25 in the lattice model.These choices correspond to setting, in the physical system, the speed of the slow and fast layers respectively to ṽs = 20 km/h and ṽf = 40 km/h, and the switching time between layers to 3 mins.(b) Same as in (a), but for ṽs = 5 km/h.We also set c = 0.3125 in the lattice model so that the switching time in the physical system still equals 3 mins.(c) The efficiency of the real subway systems relative to the optimal configurations as a function of the speed of the slow layer ṽs.Relative efficiency is given by the ratio between the values of the objective function estimated via Eq.(6) for the optimized and the real configurations of the fast layer.As we vary ṽs, we change also the value of the parameter c in the multiplex model so that the switching time in the physical system is equal to 3 mins, see Methods for details.
FIG. S1.Illustration of the multiplex transportation model.In the slow layer, the cost of an edge equals one; in the fast layer, the cost of an edge is reduced by the factor 0 ≤ η ≤ 1. Replica nodes across layers are connected by edges with cost c ≥ 0. Two possible paths connecting node n (blue circle) to node o (red circle) are highlighted.The path n → i → o has cost equal to 2 as it uses only two edges in the slow layer (green arrows).The second path, i.e., n → F (n) → F (j) → F (k) → F (o) → o, involves two changes of layer (gray arrows), and a path in the fast layer (orange arrows).Its total cost is 3η + 2c, as each of the three edges used in the fast layer has cost equal to η, and 2c is the cost required to switch layer twice.The second path is more convenient than the first one as long as 3η + 2c < 2.
in the slow layer S set of edges in the slow layer H set of nodes in the fast layer F set of edges in the fast layer N = |G| = |H| size of the network for all n ∈ G ⇔ F (n) ∈ H one-to-one map of nodes across layers wn,m = 1 for all (n, m) ∈ S weight of the edges in the slow layer w F (n),F (m) = η for all (F (n), F (m)) ∈ F weight of the edges in the fast layer w n,F (n) = c for all n ∈ G weight of interlayer edges or switching cost rc = 2c/(1 − η) critical cost of the model, see Eq. (2) dn→m for all n, m ∈ G ∪ H cost of the minimum-cost from node n to node m o ∈ G center of the network dn := dn→o for all n ∈ G ∪ H cost of the minimum-cost path of node n to o pn for all n ∈ G weight of nodes in the slow layer τ (F) objective function, see Eq. (3) sn,m(F) for all (n, m) ∈ S and (F (n), F (m)) / ∈ F marginal gain in the objective function, see Eq. (S4) F *

1 k * = 2 k * = 3 FIG
FIG. S4.Two-dimensional triangular lattices with exponentially decaying weights.(a) Optimal solution for the system with the fast layer composed of three branches.We use R = 25, c = 0.1, η = 0.1, and L = 12.The weight associated to the individual nodes in the slow layer is represented by the color map.(b) Same as in (a), but for c = 0.7.(c) Same as in (b), but for c = 1.2.

r × = arg min z z 2 +
r 2 − 2r z cos θ + ηz .(S17) If r g = ℓ, the minimum-cost path takes advantage of the entire length of the fast layer; if r g = r × ≤ ℓ, then only part of the fast layer is used by the minimum-cost path to the center.To compute r × , we just find the value for which d dz z 2 + r 2 − 2r z cos θ + ηz = 0 .After some calculations, we arrive to the expression r × = r cos θ − η 1 − η 2 sin θ .(S18)