Network extraction by routing optimization

Routing optimization is a relevant problem in many contexts. Solving directly this type of optimization problem is often computationally intractable. Recent studies suggest that one can instead turn this problem into one of solving a dynamical system of equations, which can instead be solved efficiently using numerical methods. This results in enabling the acquisition of optimal network topologies from a variety of routing problems. However, the actual extraction of the solution in terms of a final network topology relies on numerical details which can prevent an accurate investigation of their topological properties. In fact, in this context, theoretical results are fully accessible only to an expert audience and ready-to-use implementations for non-experts are rarely available or insufficiently documented. In particular, in this framework, final graph acquisition is a challenging problem in-and-of-itself. Here we introduce a method to extract network topologies from dynamical equations related to routing optimization under various parameters’ settings. Our method is made of three steps: first, it extracts an optimal trajectory by solving a dynamical system, then it pre-extracts a network, and finally, it filters out potential redundancies. Remarkably, we propose a principled model to address the filtering in the last step, and give a quantitative interpretation in terms of a transport-related cost function. This principled filtering can be applied to more general problems such as network extraction from images, thus going beyond the scenarios envisioned in the first step. Overall, this novel algorithm allows practitioners to easily extract optimal network topologies by combining basic tools from numerical methods, optimization and network theory. Thus, we provide an alternative to manual graph extraction which allows a grounded extraction from a large variety of optimal topologies. The analysis of these may open up the possibility to gain new insights into the structure and function of optimal networks. We provide an open source implementation of the code online.


Introduction
Investigating optimal network topologies is a relevant problem in several contexts, with applications varying from transportation networks [1][2][3][4] , communication systems [5][6][7] , biology 8,9 and ecology [10][11][12] .Depending on the specified objective function and set of constraints of a routing optimization problem 13 , optimal network topologies can be determined by different processes ranging from energy-minimizing tree-like structures ensuring steeper descent through a landscape as in river basins 10 to the opposite scenario of loopy structures that favor robustness to fluctuations and damage as in leaf venation 12,14 , the retina vascular system 15,16 or noise-cancelling networks 7 .In many applications, optimal networks can arise from an underlying process defined on a continuous space rather than a discrete network as in standard combinatorial optimization routing problems [17][18][19][20] .Optimal routing networks try to move resources by minimizing the transportation cost.This cost may be taken to be a function of the traveled distance, such as in Steiner trees, or proportional to the dissipated energy, such as optimal channel networks or resistance networks.The common denominator of these configurations is that they have a tree-like shape, i.e., optimal routing networks are loopless 1,21 .Recent developments in the mathematical theory of optimal transport 11,13 have proved that this is indeed the case and that complex fractal-like networks arise from branched optimal transport problems 22 .While the theory starts to consolidate, efficient numerical methods are still in a pre-development stage, in particular in the case of branched transport, where only a few results are present 23,24 , reflecting the obstacle that all these problems have an NP-hard genesis.Recent promising results 25,26 map a computationally hard optimization problem into finding the long-time behavior of a system of dynamic partial differential equations, the so-called Dynamic Monge-Kantorovich (DMK) approach, which is instead numerically accessible, computationally efficient, and leads to network shapes that resemble optimal structures 27 .Working in discretized continuous space, and in many network-based discretizations such as lattice-like networks as well, requires the use of threshold values for the identification of active network edges.This has the main consequence that there might be no obvious final resulting network, an output that would be trivial when starting from an underlying search space formed by predefined selected network structures.For example, the output of a numerically discretized (by, e.g., the Finite Element method) routing optimization problem in a 2D space is a real-valued

The routing optimization problem
In this section, we describe the main ideas and establish notation.We start by introducing the dynamical system of equations corresponding to the DMK routing optimization problem as proposed by Facca et al. [25][26][27] In these works, the authors first generalize the discrete dynamics of the slime mold Physarum Polycephalum (PP) to a continuous domain; then they conjecture that, like its discrete counterpart, its solution tends to an equilibrium point which is the solution of the Monge-Kantorovich optimal mass transport 42 as time goes to infinity.
We denote the space where the routing optimization problem is set as Ω ∈ R n , an open bounded domain that compactly contains f (x) = f + (x) − f − (x) ∈ R, the forcing function, describing the flow generating sources f + (x) and sinks f − (x).It is assumed that the system is isolated, i.e., no fluxes are entering or exiting the domain from the boundary.This imposes the constraint Ω f (x)dx = 0 to ensure mass balance.It is assumed that the flow is governed by a transient Fick-Poiseuille type flux q = −µ∇u, where µ(t, x), u(t, x) are called conductivity or transport density and transport potential, respectively.
The continuous set dynamical Monge-Kantorovich (DMK) equations are given by: where ∇ = ∇ x .Equation (1) states the spatial balance of the Fick-Poiseuille flux and is complemented by no-flow Neumann boundary conditions; Eq. (2) enforces the system dynamics in analogy with the discrete PP model and Eq. ( 3) provides the initial configuration of the system.The parameter β captures different routing transportation mechanisms.A value of β < 1 enforces optimal solutions to avoid traffic congestion; β = 1 is shortest path-like; while β > 1 encourages consolidating the flow so to use a smaller amount of network-like infrastructure, and is related to branched transport 11,43 .Within a network-like interpretation, qualitatively, µ(x,t) describes the capacity of the network edges 44 .Thus, its initial distribution µ 0 (x) describes how the initial capacities are distributed.
In this work, solving the routing optimization problem consists of finding the steady state solution (µ * , u * ) : Ω → R ≥0 × R of Eq. ( 1), i.e. (µ * (x), u * (x)) = lim t→+∞ (µ(t, x), u(t, x)).Numerical solution of the above model can be obtained by means of a double discretization in time and space [25][26][27] .The resulting solver (called from now on DMK-Solver) has been shown to be efficient, robust and capable of identifying the typically singular structures that arise from the original problem.In Fig. 1, some visual examples of the numerical µ * obtained for different values of β are shown.The same authors showed that the DMK-Solver is able to emulate the results for the discrete formulation of the PP model proposed by Tero et al. 45 Under appropriate regularity  assumptions, it can be shown 26,27 that the equilibrium solution of the above problem (µ * (x), u * (x)) is a minimizer of the following functional: where P(β ) = (2 − β )/β .In words, this functional is the sum of the total energy dissipated during transport (the first term is the Dirichlet energy corresponding to the solution of the first PDE) plus a nonlinear (sub-additive) function of the total capacity of the system at equilibrium.In terms of costs, this functional can be interpreted as the cost of transport, assumed to be proportional to the total dissipated energy, and the cost of building the transport infrastructure, assumed to be a nonlinear function (with power 2 − β ) of the total transport capacity of the system.We exploit the robustness of this numerical solver to extract the solutions of DMK equations corresponding to various routing optimization problems.We here focus on the case β ≥ 1, where the approximate support of µ * displays a network-like structure.This is the first step of our extraction pipeline, which we denote as DMK-Solver.The numerical solution of these equations does not allow for a straightforward network representation.Indeed, depending on various numerical details related to the spatial discretization and other parameters, one usually obtains a visually well-defined network structure (see Fig. 1a) whose rendering as a graph object is however uncertain and non-unique.This in turns can hinder a proper investigation of the topological properties associated to optimal routing optimization problems, motivating the main contribution of our work: the proposal of a graph extraction pipeline to automatically and robustly extract network topologies from the solutions' output of DMK-Solver.We reinforce that our contribution is not limited to this application, but is also able to extract network-like shapes from any kind of image where a color or greyscale thresholds can be used to identify the sought structure.
Our extraction pipeline then proceeds with two main steps: pre-extraction and graph filtering.The first one tackles the problem of translating a solution from the continuous scenario into a graph structure, while the second one addresses the problem of removing redundant graph structure resulting from the previous step.A pseudo-code of the overall pipeline is provided in Algorithm 1 46 .
Algorithm 1 Generating optimal networks from solutions of dynamical systems: the pipeline

Graph preliminary extraction
In this section, we expand on the graph pre-extraction step: extracting a network representation from the numerical solution output of the DMK-Solver.This involves a combination of numerical methods for discretizing the space and translating the values of µ * , and u * into edge weights of an auxiliary network, which we denote as G = (V, E,W ), where V is the set of nodes, E the set of edges and W the set of weights.
The DMK solver outputs the solution on a triangulation of the domain Ω (here also named grid) and denoted as ∆ Ω = {T i } i , with ∪T i = Ω .The numerical solution, piecewise constant on each triangle T i , is considered assigned to the triangle barycenter (center of gravity) at position b i = (x i , y i ) ∈ Ω 47 .This means that the result is a set of pairs {(µ * (b i ), u * (b i ))} i .We can track any function of these two quantities.For simplicity, we use µ * (see Fig. 1 for various examples), but one could use u * or a function of these two.This choice does not affect the procedure, although the resulting network might be different.We neglect information on the triangles where the solution is smaller than a user-specified threshold δ ∈ R ≥0 , in order to work only with the most relevant information.Formally, we only keep the information on T i such that µ * (b i ) ≥ δ .We observed empirically that in many cases, several triangles contain a value of µ * that is orders of magnitude smaller than others, see for instance the scale of Fig. 1.Since we want to build a network that connects these barycenters, we remark that this procedure depends on the choice of the threshold δ : if δ 1 < δ 2 , then G(δ 2 ) ⊂ G(δ 1 ).On one hand, the smaller δ , the more likely G is to be connected, but at the cost of containing many possibly loop-forming edges and nodes (the extreme case δ = 0 uses the whole grid to build the final network); on the other hand, the higher δ , the smaller the final network is (both in terms of the number of nodes and edges).Thus one needs to tune the parameter δ such that resulting paths from sources to sinks are connected while avoiding the inclusion of redundant information.
The set of relevant triangles does not correspond to a straightforward meaningful network structure, i.e. a set of nodes and edges connecting neighboring nodes.In fact, we want to remove as much as possible the biases introduced by the underlying triangulation and thus we start by connecting the triangle barycenters.For this, we need rules for defining nodes, edges and weights on the edges.Here, we propose three methods for defining the graph nodes and edges and two functions to assign the weights.The overall graph pre-extraction routine is given by choosing one of the former and one of the latter, and it can be applied also to more general inputs beyond solutions of the DMK-Solver.

Rules for selecting nodes and edges
Selecting V and E requires defining the neighborhood σ (T i ) of a triangle in the original triangulation ∆ Ω (for i such that µ * (b i ) ≥ δ ).We consider three different procedures: (I) Edge-or-node sharing: σ (T i ) is the set of triangles that either share a grid edge or a grid node with T i .

4/17
(II) Edge-only sharing: σ (T i ) is the set of triangles that share a grid edge with T i .Note that |σ (T i )| ≤ 3, ∀i.
(III) Original triangulation: let v, w, s be the grid nodes of T i ; then add v, w, s to V and (v, w), (w, s), (s, v) to E.
It is worth mentioning that since the grid ∆ Ω is non uniform and µ * is not constant, we cannot control a priori the degree d i of a node i in the graph G generated for a particular threshold δ .We give examples of networks resulting from these three definitions in Fig. 2 and a pseudo-code for the first two in Algorithm 2.

Rules for selecting weights
The weights w i j to be assigned to edges e i j := (i, j) ∈ E should be a function of µ(b i ) and µ(b j ), the density contained in the original triangles.We consider two possibilities for this function: (ii) Effective reweighing (ER): While using the average as in (i) captures the intuition, it may overestimate the contribution of a triangle when this has more than one neighbor in G with the risk of calculating a total density larger than the original output of the DMK-Solver.To avoid this issue, we consider an effective reweighing as in (ii), where each triangle contribution by the degree d i = |σ i | of a node i ∈ V is reweighted, with σ i the set of neighbors of i.This guarantees the recovery of the density obtained from DMK-Solver, since where in the sum we neglected isolated nodes, i.e. i s.t.d i = 0. Note that in the case of choosing the original triangulation for node and edge selection (case (III) above), the ER rule does not apply; in that case, we use AVG.We monitor the conductivity µ and use parameters µ 0 = 1, β = 1.02, δ = 0.0001.Weights w i j are chosen with AVG (i), f is chosen such that sources and sinks are inside green and red rectangles respectively.

5/17 3 Graph filtering
The output of the graph extraction step is a network closer to our expectation of obtaining an optimal network topology resulting from a routing optimization problem.However, this network may contain redundant structures like dangling nodes or small irrelevant loops (see Fig. 2).These are not related to any intrinsic property of optimality, but rather are a feature of the discretization procedure resulting from the graph pre-extraction step.It is thus important to filter the network by removing these redundant parts.However, how to perform this removal in an automated and principled way is not an obvious task.One has to be careful in removing enough structure, while not compromising the core optimality properties of the network.This removal is then a problem in-and-of-itself, we name it graph filtering step.We now proceed by explaining how we tackle it in a principled way and discuss its quantitative interpretation in terms of minimizing a cost function interpolating between an operating and an infrastructural cost.

The Discrete DMK-Solver
Going beyond heuristics and inspired by the problem presented in Sec. 1, we consider as a solution for the graph filtering step, the implementation of a second routing optimization algorithm to the network G output of the pre-extraction step, i.e. in discrete space.Several choices for this could be drawn, for instance, from routing optimization literature 48 , but we need to make sure that this second optimization step does not modify any of the intrinsic properties related to optimality resulting from the DMK-Solver.We thus propose to use a discrete version of the DMK-Solver (discrete-DMK-Solver).This was proven to be related to the Basis Pursuit (BP) optimization problem 49 .In fact, BP is related 50 to the PP dynamical problem in discrete space and the discrete-DMK-Solver gives a solution to the PP in discrete space 49 .The discretization results in a reduction of the computational costs for solutions of BP problems, compared to standard combinatorial optimization approaches 49 .Being an adaptation to discrete settings of our original optimization problem, it is a natural candidate for a graph filtering step, preserving the solution's properties.
The problem is stated as follows.Consider the signed incidence matrix B ∈ M N×M of a weighted graph G = (V, E,W ), with entries B ie = ±1 if the edge e has node i as start/end point, 0 otherwise; N = |V | and M = |E|.Denote = { e } e the vector of edge lengths, f a N-dim vector of source-sink values with entries satisfying ∑ i∈V f i = 0; this is the discrete analogues of the source-sink function f (x) introduced in Section 1; the functions µ(t) ∈ R M and u(t) ∈ R N correspond to the conductivity and potential respectively, similarly to the continuous case, but this time they are vectors with entries µ e (t) and u i (t) defined on edges and nodes respectively.The PP discrete dynamics corresponding to the original routing optimization problem can be written as: e ∑ j B e j u j (t) , where | • | is the absolute value element-wise.Equation ( 5) corresponds to Kirchoff's law, Eq. ( 6) is the discrete dynamics with β d a parameter controlling for different routing optimization mechanisms (analogously to β in Eq. 2); Eq. ( 7) is the initial condition.The importance of this systems stems in having an interesting theoretical correspondence: its equilibrium point corresponds to the minimizer of a cost function analogous to Eq. ( 4) that, similarly to the continuous case, can be interpreted as global energy functional.This is: e ∑ j B e j u j (µ(t)) where P(β ) = 2 − β /β and u(µ(t)) is a function implicitly defined as the solution of Eq. ( 5).The first term corresponds to the energy dissipated during transport, it can be interpreted as the operating costs, whereas the second is the infrastructural cost.
The equilibrium point of µ e (t) is stationary at the previous energy function, and for β d = 1 it acts also as the global minimizer due to its convexity.For β d > 1 the energy is not convex, thus in general the functional will present several local minima towards which the dynamics will be attracted.The case β d < 1 does not act as a filter because it encourages trajectories to spread through the network, instead of removing edges, and so not interesting to our purposes.Discretization in time of Eq. ( 6) by the implicit Euler scheme combined with Newton method leads to an efficient numerical solver, see Facca et al. 49 for more details.The above scheme gives the solution to the BP problem and represents the discrete-DMK-Solver.Similarly to the graph pre-extraction step, the filtering is also valid beyond networks related to solutions of the DMK-Solver.It applies to more general inputs if defined on a discrete space, for instance, images.Finally, notice that the filter generates a graph with a new set of nodes and edges, both subsets of the corresponding ones in G, result of the pre-extraction.The weights of the final graph can then be assigned with same rules as in 2.1; in addition, one can consider as weights the values of µ e resulting from the BP problem (we named this weighing method "BPW").Alternatively, one can ignore the weights of BP and keep (for the edges remaining after the filter) the weights as in the previous pre-extraction step (labeled as "IBP").Figure 3 shows an example of three filtering settings on the same input.

Selecting sources and sinks
The discrete-DMK-Solver requires in input a set of source and sink nodes (S + and S − ) that identify the support of the forcing vector f introduced in 3.1.However, the graph pre-extraction output G might contain redundant nodes (or edges) as mentioned before.In principle, among the nodes i ∈ V , all of those contained in the support of f (x = b i ), i.e. contained in the supports of sources and sinks of the original routing optimization problem in Eq. ( 1), are eligible to be treated as sources or sink in the resulting network.However, several paths connecting source and sink nodes may be redundant and clearly not compatible with an optimal routing network (see Supplementary Fig. S2 for such an example).Therefore, it is important to select "representatives" for sources and sinks, such that the final network is heuristically closer to optimality.Here we propose a criterion to select source and sink nodes from the eligible ones in each of the connected components {C m } m of G, using a combination of two network properties.Starting from the complete graph formed by all the nodes characterized by a significant (above the threshold) density, source and sink nodes and rates are defined as follows.A node i ∈ S + , i.e. is a source f i > 0, if either i) is in the convex hull of the set of eligible sources or ii) its betweenness centrality is smaller than a given threshold τ BC .Similarly for sink nodes in S − .This is because, on one side, nodes in the convex hull capture the outer shape structure of the source and sink sets defined in the continuous problem; on the other side, nodes with small values of the betweenness centrality capture the end-points of G inside the source and sink sets, analogously to leaves (i.e., degree-one nodes) 51 .We present these ideas in more detail in the Supplementary Fig. S2.Once we have identified the sets of source and sink vertices, we need to assign a proper value f i such that Kirchhoff law is satisfied in each of the different connected components C m .It is reasonable to assume that each connected component is "closed", i.e. ∑ i∈C m f i = 0 , ∀C m .Denoting with |S| the number of elements in a set S and V (C m ) the set of nodes in C m , we then distribute the mass-fluxes uniformly by setting sinks ( f i = 0 otherwise) so that the total original source and sink flux is assigned to the overall source/sink nodes of all C m .Note that this procedure maintains the overall system and each connected component "closed", as stated above.

Model validation
Our extraction pipeline proceeds by compressing routing information in the raw output of the DMK-Solver (although what follows is not restricted to this case) on a lean network structure.This might lead us to lose relevant information in the process.Hence, we need to devise a posteriori estimates that provide quantitative guidance on the "leanness" and information loss of the final network.Here we propose metrics to evaluate the compression performance of the various graph pre-extraction and filtering protocols.The raw information is made of a set of weights w(T i ) representing the values (µ * , u * ) on each of the triangles T i ∈ Ω .We consider as the truth benchmark the distribution of w, or any other quantity of interest, supported on the subgrid ∆ δ Ω ⊂ ∆ Ω formed by all triangles where w is larger than the threshold value δ , i.e., ∆ δ Ω := {T i ∈ ∆ Ω : w(T i ) ≥ δ }.We expect that a good compression scheme should preserve both the total amount of the weights from the original solution in ∆ δ Ω and the information of where these weights are located inside the domain Ω .Also, we want this compression to be parsimonious, i.e. to store the least amount of information as possible.We test against these two requirements by proposing two metrics that measure: i) an information difference between the raw output of the DMK-Solver and the network extracted using our procedure, capturing the information of where the weights are located in space; ii) the amount of information needed to store the network.
Our first proposed metric relies in partitioning Ω in several subsets and then calculating the difference in the extracted network weights and the uncompressed output, locally within each subset.More precisely, we partition Ω into P non intersecting subsets C α ⊂ Ω , with α = 1, . . ., P and ∪ P 1 C α = Ω .For example, we define C α = [x i , x i+1 ] × [y j , y j+1 ], for x i , x i+1 , y j and y j+1 , consecutive elements of N-regular partitions of [0, 1], and P = (N − 1) 2 .Denote with w δ (T i ) the weight on the triangle T i ∈ ∆ δ Ω , resulting from the DMK-Solver (usually a function of µ * and u * ).If we denote the local weight of ∆ δ Ω inside C α as w α = ∑ i:b i ∈C α w δ (T i ), then we propose the following evaluation metric: where I α (e) is an indicator of whether an edge e = (i, j) ∈ E is inside an element C α of the partition, i.e.I α (e) = 1, 0, 1/2 if both b i , b j are in C α , none of them are, or only one of them is, respectively.In words, ŵq (G) is a distance between the weights of the network extracted by our procedure and the original weights output of the DMK-Solver, over each of the local subsets C α .This metric penalizes networks that either place large-weight edges where they were not present in the original triangulation, or low-weight ones where they were instead present originally.In this work we consider the Euclidean distance, i.e. q = 2, but other choices are also possible.Note that ŵq (G) does not say anything about how much information was required to store the processed network.If we want to encourage parsimonious networks, i.e. networks with few redundant structures, then we should include in the evaluation the monitoring of L(G) = ∑ e∈E e , the total path length of the compressed network, where the edge length e can be specified based on the application.Standard choices are uniform e = 1, ∀e or the Euclidean distance between b i and b j .Intuitively, networks with small values of both ŵq (G) and L(G) are both accurate and parsimonious representations of the original DMK solutions defined on the triangulation.We evaluate numerous graph extraction pipelines in terms of these two metrics on various routing optimization problem settings and parameters.In Fig. 4 we show the main results for a distribution of 170 networks obtained with β ∈ {1.1, 1.2, 1.3} and β d = 1.1.Similar results were obtained for other parameter settings.Networks are generated as follows: first, we choose a set of 5 different initial transport densities µ 0 , grouped in parabola-like, delta-like and uniform distributions, and a set of 12 different configurations for sources/sinks (mainly rectangles placed in different positions along the domain, see Supplementary Information for more details).Then, for each of these setup, we run our procedure: i) first the DMK-Solver calculates the solution of the continuous problems; ii) then we apply the graph pre-extraction procedure according to the rules of Sec.2.1 and weights as in Sec.2.2; iii) finally, we run the graph filtering step and consider various weight functions, as described in Fig. 4. We observe that not applying the final filtering step and considering rule I with ER to build the graph (I-ER-None), the values of ŵ2 (G) are smaller than other cases.This is expected as by filtering we remove information and thus achieve better performance with this metric when compared to no filtering.However, we pay a price in terms of total relative length as L(G)/L max is larger for this case.When working with rule II, we notice the appearance of many non-optimal small disconnected components and this effect deteriorates if filtering is activated.Corresponding statistics show low values for both ŵ2 (G) and L(G)/L max .We argue that this is because rule II produces, by construction, fewer redundant objects than rule I in the initial phase.This might have a similar effect as a filter but is done a priori during the pre-extraction, because rule II produces in this phase a limited number of effective neighbors.However, this comes at a price of higher variability with the sampled networks, as the variance of ŵ2 (G) is higher than for the other combinations.Among the possibilities with filtering applied, we observe that rule I performs better than rule III, while all the weighting rules give a similar performance in terms of both metrics.Any combination involving rule I plus filtering has a similar performance as rule II in terms of both metrics but with smaller variability.Finally, these combinations perform differently in terms of the number of disconnected components (not  9); (right) total network length L(G) normalized by L max , the max length over the 170 networks.Each bar denotes a possible combination as follows: roman numbers denote one of the three rules I-III (2.1); first label after the number denotes one of the rules to assign weights i-ii 2.2), which is applied to the output of the first step; the second (and last) label denotes the same rule but applied after the filtering step, "None" means that nothing is done, i.e. no filter applied, "IBP" means filter applied but with no reweighing, i.e. when an edge is removed by the filter we simply lose information without relocating its weight.Bars are color-coded so that rules I-III have three different primary colors and their corresponding routines have different shades of that main color.Here, we keep track of the conductivity µ and show medians and quartiles of a distribution over 170 networks generated with β ∈ {1.1, 1.2, 1.3}, β d = 1.1 and δ = 0.01.
shown here), with rule II producing more spurious splittings, as already mentioned.Depending on the application at hand, a practitioner should select one of these combinations based on their properties as discussed in this section.We give an example of a network generated with I-ER-ER in Fig. 5.

Application: network analysis of a vein network
We demonstrate our protocol on a biological network of fungi foraging for resources in space.The network structure corresponds to the fungi response to food cues while foraging 52 .Edges are veins or venules and connect adjacent nodes.This and those of other types of fungi are well known networks typically studied using image segmentation methods [28][29][30][31] .It is thus interesting to compare results found by these techniques and by our approach, under the conjecture that the underlying dynamic driving the network structure could be the same as the optimality principles guiding our extraction pipeline.In particular, we are interested in analyzing the distribution P( ) of the veins lengths, i.e. the network edges.The benchmark P( ) distribution obtained by Baumgarten and Hauser 28 using image processing techniques is an exponential of the type P( ) = P 0 e −γ .Accordingly, as shown in Fig. 6, we find that an exponential fit (with values P 0 = 234.00,γ = 36.32)well captures the left part of the distribution, i.e. short edges.Differences between fit and observed data can be seen in the right-most tail of the graph, corresponding to longer path lengths, where the data decay faster than the fit.However, we find that the exponential fit is nevertheless better than other distributions, such as the gamma and log-normal proposed in Dirnberger and Mehlhorn 53 for the P. polycephalum.Drawing definite quantitative conclusions is beyond the scope of our work, as this example aims at a qualitative illustration of possible applications that can be addressed with our model.In general, however, it seems not possible to choose a single distribution that well fits both center and tails of the distribution for various datasets of this type 53 .To conclude, we demonstrate the flexibility of our graph extraction method on a more general input than the one extracted from DMK-Solver.Specifically, we consider as example an image of P. polycephalum taken from data publicly available in the Slime Mold Graph Repository (SMGR) repository 54 .We first downsample an image of the SMGR's KIST Europe data set, using OpenCV (left) and a color scale defined on the pixels as an artificial µ * function.We build a graph using the graph pre-extraction and graph filtering steps as shown in Fig. 7. Notice that our protocol in its standard settings with filtering can only generate tree-like structures.Therefore, if we want to obtain a network with loops as we did in Fig. 7, we should consider a modification of our routine, which can be done in a fully automatized way, as explained in more details in the Supplementary S4.In short, after the graph pre-extraction step, where loops are still present, we extract a tree-like structure close to the original loopy graph and give this in input to the filtering.We can then add a posteriori edges that connect terminals that were

Discussion
We propose a graph extraction method for processing raw solutions of routing optimization problems in continuous space into interpretable network topologies.The goal is to provide a valuable tool to help practitioners bridging the gap between abstract mathematical principles behind optimal transport theory and more interpretable and concrete principles of network theory.While the underlying routing optimization scheme behind the first step of our routine uses recent advances of optimal transport theory, our tool enables automatic graph extraction without requiring expert knowledge.We purposely provide a flexible routine We take an image of the P. Polycephalum from the SMGR repository 54 .The picture used is a 1200x1200-pixel section of an original image of size 5184x3456 pixels (see Supplementary S4 for details) and extract a network with step 2 and 3 of our protocol.As a pre-processing step, we downsample the image using OpenCV (left) and use the color scale defined on the pixels as an artificial µ * function.Using this information and the grid structure associated to the image's pixels, we first build a graph G with the graph pre-extraction step described in Sec.2; then, we obtain a graph G f (right) using the graph filtering step of Sec. 3, for an appropriate selection of sources and sinks, and adding a correction to retrieve loops.Notice that our protocol in its standard settings with filtering can only generate tree-like structures.Therefore, if we want to obtain a network with loops, we should consider a minor modification of our routine, which can be done in a fully automatized way, as explained in more details in the Supplementary S4.
for graph extraction so that it can be easily adapted to serve the specific needs of practitioners from a wider interdisciplinary audience.We thus encourage users to choose the parameters and details of the subroutines to suitably customize the protocol based on the application of interest.To help guiding this choice, we provide several examples here and in the Supplementary Information.We anticipate that this work will find applications beyond that of automating graph extraction from routing optimization problems.We remark that two of the three steps of our protocol apply to inputs that might not necessarily come from solutions of routing optimization.Indeed, the pipeline can be applied to any image setting where an underlying network needs to be extracted.The advantage of our setting with respect to more conventional machine learning methods is that the final structure extracted with our approach minimizes a clearly defined energy functional, that can be interpreted as the combination of the total dissipated energy during transport and the cost of building the transport infrastructure.We foresee that this minimizing interpretation together with the simplification of the pipeline from abstract modeling to final concrete network outputs will foster cross-breeding between fields as our tool will inform network science with optimal transport principles and vice-versa.In addition, we expect to advance the field of network science by promoting the creation of new network databases related to routing optimization problems.

Figure 1 .
Figure 1.Different values of β in Eq. (2) lead to different settings of a routing optimization problem.Colors denote different intensities of conductivity µ as described by the color bar on the left.(a) β < 1 enforces avoiding mass congestion; (b) β = 1 is shortest path-like, the mass goes straight from source to sink; (c) β > 1 encourages traffic consolidation.Red rectangle denotes the sink, green ones the four sources.

Figure 3 .
Figure 3. Graph filtering rules.Left): β d = 1.0; center): β d = 1.4; right): β d = 1.8.The number on top denote the contributions of operating and infrastructural cost to the energy.Green and red dots represent sources and sinks respectively (τ BC = 10 −1 ); blue edges are those e with µ * e ≥ 10 −3 .The filtering input is generated from DMK-Solver with β = 1.05.The apparent lack of symmetry of the network's branches is due to numerical discretization of the domain, solver and threshold δ .As the relative size of the terminal set decreases compared to the size of remaining part of the domain, this lack of symmetry becomes negligible.

Figure 4 .
Figure 4. Graph extraction performance evaluation.We plot results for the different combinations of the graph extraction rule in terms of: (left) the metric ŵq (G) of Eq. (9); (right) total network length L(G) normalized by L max , the max length over the 170 networks.Each bar denotes a possible combination as follows: roman numbers denote one of the three rules I-III (2.1); first label after the number denotes one of the rules to assign weights i-ii 2.2), which is applied to the output of the first step; the second (and last) label denotes the same rule but applied after the filtering step, "None" means that nothing is done, i.e. no filter applied, "IBP" means filter applied but with no reweighing, i.e. when an edge is removed by the filter we simply lose information without relocating its weight.Bars are color-coded so that rules I-III have three different primary colors and their corresponding routines have different shades of that main color.Here, we keep track of the conductivity µ and show medians and quartiles of a distribution over 170 networks generated with β ∈ {1.1, 1.2, 1.3}, β d = 1.1 and δ = 0.01.

Figure 5 .
Figure 5. Network extraction example.We show a network generated from a routing optimization problem with parameters β c = 1.4,β d = 1.3 and δ = 0.001; (left) raw output of the DMK-Solver; (right) final network extracted using the routine I-ER-ER.

Figure 7 .
Figure 7. Application to images.We take an image of the P. Polycephalum from the SMGR repository54 .The picture used is a 1200x1200-pixel section of an original image of size 5184x3456 pixels (see Supplementary S4 for details) and extract a network with step 2 and 3 of our protocol.As a pre-processing step, we downsample the image using OpenCV (left) and use the color scale defined on the pixels as an artificial µ * function.Using this information and the grid structure associated to the image's pixels, we first build a graph G with the graph pre-extraction step described in Sec.2; then, we obtain a graph G f (right) using the graph filtering step of Sec. 3, for an appropriate selection of sources and sinks, and adding a correction to retrieve loops.Notice that our protocol in its standard settings with filtering can only generate tree-like structures.Therefore, if we want to obtain a network with loops, we should consider a minor modification of our routine, which can be done in a fully automatized way, as explained in more details in the Supplementary S4.
: parameters to set up network extraction problem: i) Set the space Ω : {T i } i grid triangulation and mesh-related parameters.ii)Set up routing optimization problem: β ≥ 1, µ 0 , f + , f − and threshold δ .iii)Set up the discrete routing optimization (for graph filtering): β d , δ d and τ BC .Our final goal is to translate the solution pair (µ * , u * ) into a proper network structure using several techniques from graph theory.With these networks at hand, a practitioner is then able to investigate topologies associated with this novel representation of routing optimization solutions.