DeFiNe: an optimisation-based method for robust disentangling of filamentous networks

Thread-like structures are pervasive across scales, from polymeric proteins to root systems to galaxy filaments, and their characteristics can be readily investigated in the network formalism. Yet, network links usually represent only parts of filaments, which, when neglected, may lead to erroneous conclusions from network-based analyses. The existing alternatives to detect filaments in network representations require tuning of parameters over a large range of values and treat all filaments equally, thus, precluding automated analysis of diverse filamentous systems. Here, we propose a fully automated and robust optimisation-based approach to detect filaments of consistent intensities and angles in a given network. We test and demonstrate the accuracy of our solution with contrived, biological, and cosmic filamentous structures. In particular, we show that the proposed approach provides powerful automated means to study properties of individual actin filaments in their network context. Our solution is made publicly available as an open-source tool, “DeFiNe”, facilitating decomposition of any given network into individual filaments.

To decompose the graph G into individual laments it is natural to decompose it into paths, i.e., to solve a path cover problem (PCP). The PCP has been intensively studied on dierent types of graphs and with various restrictions (e.g. [95100]). There are several potential routes (cf. [96] for an overview of the PCP for testing printed circuits): (1) We may either use node-or edge-paths, where a path p = (a p,1 , . . . , a p,P ) is an ordered sequence of P = |p| pairwise adjacent nodes (a ∈ N ) or edges (a ∈ E), respectively, and a p,i denotes the i-th node or edge of lament p. (2) The paths may be either node-disjoint, edge-disjoint, or unrestricted. (3) The objective of the PCP may be either to obtain a cover of minimum cardinality or minimum weight.
For our purpose, the decomposition of a lamentous network into individual smooth laments, it seems reasonable to look for an edge-path cover where each edge is covered by S1 (at least) one path and the total (or average) roughness is minimised. Edges that are covered by more than one path naturally correspond to lament overlaps. The minimisation of the average instead of the total roughness favours shorter paths which may be appropriate for some networks.
To dene our lament cover problem (FCP) more rigorously, we introduce the roughness r p of path p and the set P of all paths in G: Given a set E of edges and a set P of paths with roughnesses r p , p ∈ P: Find a subset P fil ⊆ P with minimal total (or average) roughness R such that each element in E is covered (at least) once.
The roughness measure r p of a path p can be chosen arbitrarily and may involve, e.g., the edge weights or the edge alignments. An intuitive choice is the pairwise lament roughness of p (cf. Eq. 1), where w e p,i denotes the weight of the i-th edge in lament p. The pairwise lament roughness is the average absolute value of the dierence between weights of adjacent edges. It reects the consistency of the edge weights along a lament which is typically smaller within than across laments (but cf. Discussion). Moreover, if the path consists of a single edge we take its weight as a roughness measure. This choice increases the exibility of the obtainable lament covers and is necessary to avoid a cover by only individual edges which contribute zero weight when weighted only according the rst line in Eq. S1. Another measures for the quality of a lament is the all-to-all lament roughness (cf. Eq which is the average maximal dierence between any edge weights in a path p, and again the original weight of the edge is used for a path of length one. Taking into account that most laments are only moderately bent, we may further wish to minimise the maximal lament S2 deection angle between adjacent edges of a path p (cf. Eq. 3), r p,angle = max i∈{1,...,P −1} angle v e p,i+1,1 − v e p,i+1,0 , v e p,i,1 − v e p,i,0 where v e p,i,0 and v e p,i,1 denote the positions of the start and end nodes of the i-th edge of lament p, respectively. Moreover, angle v, v := arccos  (b) Corresponding (node-weighted) line graph with equivalent path cover and the same roughness results as for the (edge-weighted) graph in (a). (c) Extension of an arbitrary graph with node weights 1 by a node of weight E. Here, nding a (node-weighted) path cover of roughness R = (0 + 0 + · · · + (E − 1)) / (E − 1) = 1 or less is equivalent to nding a Hamiltonian path. This equivalence holds for covers minimising the total or average roughness of the cover and using the pairwise or all-to-all lament roughness measure (Eqs. S1 and S2), see table.
The FCP is dicult to solve. This is intuitively clear as the number of paths (let alone the number of path covers) increases rapidly with the number of nodes N . Even in planar S3 graphs, the number of closed paths visiting each node once was shown to increase at least exponentially with N [101,102]. We show now that the FCP is NP-hard, even for planar, cubic graphs. Planar graphs can be drawn on a plane without crossing edges. They are of particular relevance since graphs that are generated from two-dimensional image data are planar by construction [40? ]. Cubic graphs have only nodes of degree three. A proof of NP-hardness of a problem for planar, cubic graphs directly implies its NP-hardness on general graphs. The basic idea of a typical proof of computational complexity is as follows [103]: A problem of known complexity is selected. By providing a constructive transformation or reduction, a bijection between the known problem and the problem in question is established, i.e., any yes-instance of the decision-version of the known problem is mapped to a yes-instance of the decision-version of the problem of interest and analogously for the no-instances. This reduction proves that the two problems fall into the same class of computational complexity. Our proof is by reduction from the Hamiltonian path problem (HPP) on planar, cubic graphs which is known to be NP-complete [42]. The HPP asks, for a given graph, whether there is a node-path which visits each node exactly once.
First, we note that nding a lament cover on an edge-weighted graph G is equivalent to nding a node-path cover on its node-weighted line graph L (G) (Fig. S1A and B). The line graph L (G) of a graph G has a node of weight w e for each edge e in G and edges connecting two nodes if the corresponding edges share a node in G.
Second, for a given line graph L (G), we construct a graph such that nding a node-path cover of weight 1 or less is equivalent to solving the HPP. To that end, we add one edge with a terminal node to the line graph and set all original node-weights to 1 and the new node-weight to E (Fig. S1C). Then, only a Hamiltonian path ensures a minimal weight of = 1, for both pairwise and all-to-all lament roughness r p = {r p,pair , r p,all } (cf. Eqs. S1 and S2) and both minimisation of total and average lament roughness, i.e., A ∈ {0, 1}.
Finally, we show that nding a Hamiltonian path on a line graph of a planar, cubic graph is NP-complete. It was shown that the HPP is NP-complete on general line graphs via a reduction from the HPP in cubic graphs [104]. This reduction remains valid when planar, cubic graphs are used instead of cubic graphs, for which NP-completeness of HPP is known [42]. Therefore, the decision version of the FCP is NP-complete and the FCP is NP-hard, as claimed. Since the FCP is NP-hard on planar, cubic graphs, it is (at least) NP-hard on S4 general graphs.

SUPPLEMENTAL MATERIAL S3: THE FILAMENT COVER PROBLEM ON TREES IS SOLVABLE IN POLYNOMIAL TIME
While we showed that the FCP is NP-hard on general and even planar, cubic graphs, it is solvable in polynomial time on trees. The polynomial algorithm outlined here is similar to those proposed to nd an unrestricted node-path cover where each vertex may be included in multiple paths of minimum cardinality or minimum weight [99].
The basic idea is to assume that a certain path covering a certain edge is in the cover (in a Since the FCP is NP-hard even on planar, cubic graphs, we need suitable approximation algorithms. In particular, the approximation algorithms should allow overlapping laments as well as looped laments. A natural choice seems to be the formulation of the FCP as a set cover problem (SCP) [53]:

S5
Given an object set U, called universe, and a set S of sets with costs c s , s ∈ S: Find a subset S set ⊆ S with minimal total (or average) cost such that each element in U is covered (at least) once.
In our case, the universe corresponds to the set of edges of the given graph (U=E ), a set corresponds to a path (s=p), the cost of a set corresponds to the roughness of a path (c s= r p ), and the set cover corresponds to the desired lament cover (S set= P fil ). We note, that this formulation of the SCP allows overlapping sets, s∩s = ∅, s, s ∈ S, which directly translates into overlapping laments in our FCP. By requiring that each element in U is contained in S set exactly once, we may exclude lament overlaps.
An open task is then the generation of a suitable set of paths (S=P ). Since for a general graph it is not feasible to nd all paths P (cf. the motivation of the NP-hardness proof of the FCP above), we need to nd a representative subset ,P , of paths. We propose two approaches: (1) We sample paths from T = 100 random minimal spanning trees (RMST) of G. To obtain a RMST, each edge is assigned a uniformly distributed random weight and the minimum spanning tree with respect to these weights is computed. Each tree has N (N − 1) /2 non-trivial, undirected paths that we add to our set P . However, the paths in a tree cannot contain loops. (2) We perform a modied breadth-rst search (BFS) on the nodes, store the generated paths, and stop the search for a path p when it violates a straightness criterion, e.g., r p,angle < 60 • (cf. Eq. S3) which is used throughout the paper.
We add all permitted paths to P . We note that for all real-world lamentous graphs, due to lament thickness, there are spatial constraints on the number of nodes of a graph as well as on the node degrees. Moreover, for the lamentous networks considered here, the radius of curvature of a lament is typically not much smaller than the region of interest. The number of loops is further reduced by the straightness criterion which eliminates paths with a small radius of curvature. Hence, the number of loops in the network is restricted and our heuristically modied BFS allows for loops and yields a representative set P in reasonable time.

S6
The SCP may be expressed as a binary fractional linear program [105], and we analogously write the FCP as where in the rst line A ∈ {0, 1} determines whether the total or the average roughness is minimised. In the second line, equality holds for an exact cover. For A = 0, Eq. S4 is a binary linear program that may be solved using well-established and ecient algorithms [43,44].
For A = 1, the fractional problem may be rewritten as a binary linear program as well [106,107]. To that end, we introduce new variables y = p∈P x p −1 The latter expression is non-linear but may be replaced by a set of binary linear equations, Here, M is a suciently large constant that needs to exceed any y (cf. the Big M method [108]). Since y = p∈P x p −1 ≤ 1 for the cover of any non-empty graph, we choose M = 2.
Thus, there are a number of options in our FCP: The input set of paths may be obtained by using a modied BFS or from sampling RMSTs or (denoted by either BFS or RMST ).
The laments may overlap or not (over /exact ). The objective of the FCP may be the S7 minimisation of the total or the average roughness (total /avg ). The roughness of a lament may be measured by the pairwise or the all-to-all lament roughness (pair /all ). Solutions of the FCP with dierent options are compared in the Results.
An implementation of the presented approximation schemes to the FCP with the described options is supplied as an open-source tool, DeFiNe (Decomposing Filamentous Networks), under GLP3 at http://mathbiol.mpimp-golm.mpg.de/DeFiNe/. DeFiNe is programmed in Python [109] and employs the packages SciPy [110], NetworkX [111], and cvxopt [112] and PyGTK [113] for a simple and user-friendly graphical user interface. DeFiNe takes as input a weighted graph in the standard .gml le format [54] and outputs a standard .gml graph with lament identities stored as edge colours. Node coordinates may be included in the input le to enable the modied BFS that takes into account edge alignments. Furthermore, manual lament assignments may be included in the input le and the similarity with the automatically obtained lament cover is assessed as described below. In addition, DeFiNe generates a standard, human-readable .csv-table of various individual lament measures for custom analyses. The lamentous structure as well as the manual lament assignments shown in Fig. 1 are available as a .gml le under the above internet address for demonstration purposes.

SUPPLEMENTAL MATERIAL S5: EXTRACTION OF WEIGHTED NETWORKS FROM IMAGES
The procedure used to extract weighted networks from image data is similar to those proposed in [40,41]: (1) The original grey-scale image are pre-processed to enhance the lamentous structures. Here, a vesselness lter with kernel width of 2 pixels was used for simplicity [114]. (2) In the ltered image, the lamentous structures are separated from the background by applying an adaptive median threshold with a block size of 49 pixels, , and g i,· = C j=1 g i,j . The contingency tables h ×,× , ×, × ∈ {=, =}, provide the numbers of edge pairs which are in the same or dierent sets in the two partitions, respectively, and is related to g i,j as shown in [57]. All measures are restricted to the unit interval with larger values reecting higher similarity [58].
While these measures of partition similarity are widely used [46,48], they pose some difculties. The variation of information, VI, is only well-dened for disjoint partitions, which occur for non-overlapping laments. While the Jaccard index, JI, and the Rand index, RI, cover intersecting partitions they may generally yield opposing results. We demonstrate this inconsistency by investigating two types of random partitionings: First, for 100 repetitions, we randomly partitioned 2 sets of 100 numbers and up to 10 duplicates (to simulate overlapping laments) into 5 − 10 random partitions (Fig. S2a). While VI and JI were correlated ( Fig. S2b; cf. Kendall rank correlation coecient τ > 0), the other two combinations showed a strong negative correlation (cf. τ < 0). Second, to study lament covers that resemble the decomposition of real lamentous networks more closely, we constructed a relative neighbourhood graph [116,117] with 100 nodes uniformly distributed in the unit square and computed a random minimum spanning tree (Fig. S2c). For 100 repetitions of this procedure, we partitioned the resultant tree into laments by choosing a path at random and adding it to the decomposition if the total overlap of any two paths already in the decomposition is below 10 edges (cf. = for overlaps). Again, the correlation among the classical similarity measures was poor or negative ( Fig. S2d; except for the correlation between RI and JI; |τ | < 0.6). Although other measure for the similarity of intersecting partitions have been proposed [118120], we adhere to RI and JI for simplicity.

S11
More severely, however, the above similarity measures do not take into account the structure of the graph G underlying the (edge-)partitions induced by the obtained lament covers.
To date, we are only aware of structure-aware similarity measures for the comparison of partitions whose items are distributed in Euclidean space [121123]. Yet, these approaches do not take into account the explicit graph structure of the partitions. To remedy this shortcoming, we introduce a suite of measures, the structure-aware Rand and Jaccard index, RI d and JI d , respectively. To that end, the contingency tables h ×,× in Eqs. S7 and S8 are replace To investigate the performance of our extended, structure-aware partition similarity measures, RI d and JI d , we apply them to the articial graph-based random partitions described above (cf. Fig. S2c). Indeed, when considering the partition membership of neighbouring edges only, i.e., RI 1 and JI 1 , the similarity measures yield very consistent results ( Fig. S2d; cf. τ = 0.999) in contrast to the lower correlation with the classical similarity measures (cf. τ < 0.9). Investigating the dependency of RI d and JI d on the distance d for the tree lament covers shown in Fig. S2d, we nd that RI and JI ( Fig. S2e; dotted blue and yellow) over-and underestimate the partition similarity with respect to RI 1 and JI 1 (Fig. S2e; solid S12 blue and yellow). Furthermore, we nd that RI d is non-monotonic in d ( Fig. S2e; cf. the black triangle). These errors in estimation are explained by the large fraction of false negatives (h =, = ) and the small fraction of true positives (h =,= ), respectively, which dominate for large distances d, i.e., the limit in which the graph structure is ignored ( Fig. S2f; dotted black and solid black). Due to the dierential increase/decrease of h =, = /h =,= , their combination and therefore RI 1 is non-monotonic ( Fig. S2f; dashed blue). Finally, we observe opposing results of the classical partition similarity measures also for the lament covers of articial and biological lamentous networks investigated in the main text, while our extended, structure-aware measures RI 1 and JI 1 provide consistent similarity results (Fig. S2g).

SUPPLEMENTAL MATERIAL S7: ROBUSTNESS OF FILAMENT COVERS AGAINST INCOMPLETE KNOWLEDGE OF UNDERLYING NETWORK STRUCTURE AND IMAGE NOISE
S13 Figure S3. Analyses of robustness of lament covers against incomplete knowledge of network and image noise. A cytoskeletal and a contrived network are decomposed automatically by solving the FCP with options given in Fig. 3c and Fig. 1c Our approach enables accurate decomposition of a given lamentous network into its constitutive laments (cf. Results). However, the preceding extraction of the network from image data is often non-trivial (cf. Methods). Therefore, to assess the robustness of our approach, we test how the accuracy of our lament decomposition is aected (1) by incomplete S14 knowledge of the true underlying network structure and (2) by image noise which aects the edge weights of the extracted network. We perform these analyses for the actin cytoskeleton shown in Fig. 3 (Fig. S3a, left panel) and the contrived network shown in Fig. 1 (Fig. S3e).
(1) First, we start from the original, weighted network and randomly remove one of the E edges to model erroneous segmentation. For the disrupted network, we recompute the optimal lament cover (with the same options as in Figs. 1c and 3c, respectively) and calculate its agreement with the original manual segmentation (measured by the structureaware Jaccard index JI 1 ; the removed edge is assigned a dummy label). We repeat the procedure for E networks from which a single, randomly chosen edge has been removed.
Next, we repeat the procedure for E networks from which a randomly chosen double of edges has been removed. We then proceed with triplets, quartets, and so on up to subsets of 50 randomly chosen edges.
As expected, the removal of increasing numbers of edges typically decreases the agreement of the automated lament cover with the manual assignment for the cytoskeletal as well as the contrived network ( Fig. S3b and f ). For both networks, however, the decrease is slow and JI 1 increases only by around 0.002 per removed edge (cf. Fig. S3b and f, solid grey line indicates linear t). Interestingly, for the actin cytoskeleton, the removal of certain edges may even increase the accuracy of the lament cover (Fig. S3a, right panels show manual lament assignment and automated lament cover the original network, and an exemplary lament cover obtained after the removal of two edges, coloured white here, which improves the agreement with the manual assignment; cf. Fig. S3b, dotted grey line and triangle).
(2) Second, we simulate image noise by adding centred Gaussian noise ∆w to the edge weights of the original network with (S13) We normalise the standard deviation of the added noise by the original edge weights to avoid extreme uctuations, and f is referred to as noise factor. For each noise factor, we construct 100 networks, recompute the optimal lament covers, and measure their agreement with the manual lament assignment, as in the rst scenario above.
For both the contrived and the cytoskeletal network, the accuracy of the lament cover decreases with increasing noise, as expected ( Fig. S3d and g). However, this decrease in S15 accuracy is slow and JI 1 decreases by less than 0.001 when increasing the standard deviation of the noise by 1% of the original edge weights, i.e., when increasing the noise factor by one (cf. Fig. S3d and g, solid grey lines indicate linear ts). We note that with increasing edge noise the accuracy of the lament cover approaches a constant, non-zero JI 1 which reects that some information about the lament structure maybe obtained from the topology of the network alone, irrespective of the edge weights (cf. Fig. S3d and g,  To further strengthen our statistical analyses of cytoskeletal actin laments (cf. Fig. 3), we investigate a complete movie of a plant cytoskeleton of 100 frames over 200 s (cf. Methods for details). For each frame, we extract a weighted network representation of the cytoskeleton as described above (cf. Methods for details) and solve the FCP with options described in Fig. 3, i.e., we solve the exact FCP (exact ) for paths from a modied breadth-rst search (BFS ) and by minimising the total (total ) pairwise lament roughness (pair ). Analysis of various properties of the automatically obtained laments conrms our ndings in Fig. 3: The laments show a preferential alignment parallel to the cell axis throughout the movie (Fig. S4a). The distribution of lament lengths, pooled across the duration of the movie, conrms the reported gamma distribution ( Fig. S4b; maximal likelihood ts of normal, Weibull, and Rayleigh distributions yield higher values for the Akaike information criterion [60]). Filament length is correlated with lament weight, i.e., longer laments are typically thicker ( Fig. S4c; Pearson correlation p-value p P < 0.05). Moreover, the correlation between dierent measures of lament curvedness, i.e., the lament bending and the maximal lament S17 angle, are consistently negatively correlated with the lament length ( Fig. S4d; p P < 0.05).
In addition to these previously analysed features of lamental organisation, we study the course of dierent lament properties over time: The average lament weight shows large uctuations and is non-stationary over the recording period ( Fig. S4e; cf. augmented Dickey-Fuller test p-value p ADF ≥ 0.05). This non-stationarity suggests substantial changes in the prevalence of ne actin lament and thick bundles, respectively, and prompts further investigations. However, we found that the average lament length as well as the average lament bending remain stationary over the course of 200 s (Fig. S4e; cf. p ADF,length < 0.05 and p ADF,conv < 0.05). Since the length distribution of laments tunes the mechanical properties of lamentous networks [124,125], this stationarity of the average lament length may be of immediate biological relevance. The stationarity of the average lament bending may be a direct consequence of the roughly constant lament length distribution (cf.  S5, 1st column), we pre-process the images to obtain a binary representation of the lament centre lines (Fig. S5, 2nd column), and extract a weighted network representation as described in the Methods (Fig. S5, 3rd column). For the contrived and biological and the cosmic networks, we manually assign lament identities and compute the connected components, respectively (Fig. S5, 4th column). Finally, we decompose the networks into laments by solving the FCP with dierent options (Fig. S5, 5th column). For the rst contrived network, we allow overlapping laments (Fig. S5a) while for the second, grid-like contrived network (Fig. S5b), the neural networks ( Fig. S5c and d), the cytoskeletal networks S18 ( Fig. S5e and f ), and the cosmic webs ( Fig. S5g and h), we obtain exact lament covers with options described in Fig. 1e. The agreement of manual assignments and automated lament decompositions of the studied networks is measured by the classical and the structure-aware Jaccard indices JI and JI 1 and shows good agreement (JI 1 close to 1, and cf. discussion of (g) Cosmic web of galaxies (cf. Fig. 4). (h) The dense web of simulated galaxies consists of many connected components that are further decomposed into laments (cf. Fig. 4  we apply SOAX to the pre-processed and segmented image data (cf. Methods and Fig. S6a, second panel) to which we further apply a Gaussian lter of unit standard deviation to obtain smooth intensity gradients required by the algorithm. SOAX is run using the default parameters and the resulting lament identities are manually assigned to match those of the manual solution (Fig. S6b). To quantify the quality of the decomposition, we manually assign lament identities in our original network representation (cf. Fig. S6a, third panel) according to the open contour-based result (cf. Fig. S6b) and compare the result to the manual assignment (cf. Fig. S6a, fourth panel). The structure-aware Jaccard index JI 1 = 0.938 is close to 1 and indicates good agreement between open-contour based decomposition and manual lament assignment. We note that some junctions/nodes obtained from SOAX are split in two in comparison to our extracted networks (cf. intersecting ).
Moreover severely, some laments are over-segmented and thus fragmented (cf. ⊕), especially overlapping laments which are not captured in the open contour-based approach (cf. ). To remedy this shortcoming, we apply our lament cover-based approach to postprocess the open contour-based decomposition and merge over-segmented lament fragments. To this end, we convert the open contour-based lament representation into a weighted network, where edge weights represent average lament segment intensities as before (cf. Methods and Fig. S6c). As before, a collection of paths P is sampled using a breadth-rst search (BFS ) and their pairwise roughness values r p , p ∈ P , are computed according to Eq. S1 (pair ). Then, to take into account the initial open contour-based lament decomposition F as a starting point, in which certain edges have already been assigned to the a given lament, we modify the roughness values of the sampled paths: For each initial lament or fragment that is fully contained within a sampled path, the roughness of that path is decreased by a large value, R filament = 10 4 , which is larger than any r p to favour the inclusion of these laments or fragments in the optimal lament cover. Since the subtraction of R filament yields negative roughness values which would lead to the inclusion of all these paths, we add another, even larger constant R offset = 10 8 > R filament to all roughness values, S22 i.e., (S14) For these modied roughness values r p , we solve the FCP by minimising the total roughness (total ) and allowing for overlaps (over ; Fig. S6d). The resulting post-processed lament decomposition merges several lament fragments which were over-segmented by the opencontour based approach and shows very good agreement of JI = 0.776 and JI 1 = 1.000 with the manual lament assignment. Interestingly, in this decomposition, parts of two laments are interchanged (cf. ⊕) as in Fig. 1f for dierent FCP options. In conclusion, for any approach that detects laments from image data and yields a weighted network representation, our lament cover-based approach may provide a helpful means to further post-process and enhance the accuracy of the obtained lament decomposition.
Open contour-based lament detection and lament cover-based postprocessing. (a) Dierent stages of our lament cover problem (FCP)-based analysis for a contrived lament structure, from original image to segmented lament centre lines and weighted network representation, manual lament assignment and automated solution (cf. Fig. S5a for further explanations). (b) Filaments and junctions (cf. circles) identied from the segmented lament centre line image using SOAX, an stretching open active contour-based approach [23]. Colour-coded lament identities were manually assigned to match those of the manual solution in (a) and excess lament fragments were coloured black. While the agreement with the manual solution is good (JI 1 close to 1), some laments are over-segmented (cf. ⊕) and thus fragmented, especially at locations of lament overlaps (cf. ). (c) Weighted network representation of the contrived lamentous structure obtained from SOAX. (d) Using the lament assignments from SOAX in (b) as a starting point, our lament cover-based approach is used to post-process the lament decomposition, which merges broken lament fragments and improves the agreement with the manual solution (JI 1 = 1).