Homological Scaffold via Minimal Homology Bases

The homological scaffold leverages persistent homology to construct a topologically sound summary of a weighted network. However, its crucial dependency on the choice of representative cycles hinders the ability to trace back global features onto individual network components, unless one provides a principled way to make such a choice. In this paper, we apply recent advances in the computation of minimal homology bases to introduce a quasi-canonical version of the scaffold, called minimal, and employ it to analyze data both real and in silico. At the same time, we verify that, statistically, the standard scaffold is a good proxy of the minimal one for sufficiently complex networks.

In [40], the generators of persistent homology are used to build an instance of network skeletonization called the homological scaffold. However, the method has a serious drawback, consisting in the large degree of arbitrariness in the choice of one representative cycle from the many equivalent generating cycles of the same homology class. This is a direct consequence of homology classes being equivalence classes, and it unfortunately affects all attempts to localize cycles ( [58,42]). In this work, we set out to address this issue by searching for a form of canonicity in the choice of generators, namely by computing minimal representatives of homology bases.
Minimal homology bases have long been investigated ( [59,60]), with a breakthrough coming only with the introduction of the first efficient algorithm for the computation of such bases in dimension one ( [61]). Here, we leverage said minimal bases to propose a new approach to network skeletonization, the minimal scaffold, which overcomes the limitation of the previous one. While the minimal scaffold is not unique in the most general case, we provide strong guarantees and caveats on when and to what degree it is well-defined. We then show a few applications of the novel method, concluding the paper with a comparison between our construction and the previous one.

Outline of the Paper
The paper is organized as follows. Section 2 provides a brief overview of the main concepts in Topological Data Analysis. Section 3 describes the original approach to network skeletonization by means of persistent homology, and highlights the deficiencies which we wish to address. In Section 4, the topic of computing minimal representatives of a homology basis is worked out. Section 5 introduces the main concept of this work, the minimal scaffold. In Section 6, the issue of uniqueness is discussed, with some results stated, leading to a more refined version of the minimal scaffold. Section 7 showcases some applications for the minimal scaffold.
In the light of its computational complexity, we further carry out in Section 8 a statistical comparison between the minimal and original scaffolds, providing some heuristic guarantees and caveats. Section 9 concludes the discussion.

Glossary
List of symbols and their common usage throughout the paper.

Symbol Meaning
F: A filtration of simplicial complexes (K_ε), ε = 1, …, M
W: A non-negatively weighted finite graph
V: The set of vertices of a graph
E: The set of edges of a graph
VR(W): The Vietoris-Rips complex of graph W
C_k(K): The vector space over Z_2 of chains of k-simplices of the complex K
∂_k: The boundary operator between C_k(K) and C_{k−1}(K)
H_1(K): The 1st homology group of complex K
β_1(K): The dimension of H_1(K)
PH_1(F): The 1-dimensional persistent homology of filtration F
µ: A function assigning non-negative weights to edges and cycles
B: A minimal homology cycle basis
B̃: A minimal homology cycle basis with draws
B*: The disjoint union of minimal cycle bases across a filtration
B̃*: The disjoint union of minimal cycle bases with draws across a filtration
V_i: A set of homologous, equally minimal variants of a basis cycle
H(W): The homological scaffold of weighted graph W
H_min(W): The minimal homological scaffold of weighted graph W
H̃_min(W): The minimal homological scaffold with draws of weighted graph W

Background
In this section we introduce the minimum amount of mathematics necessary for understanding the rest of the paper. We refer to classical textbooks on the subject for further reading ( [18,19,53,16]).

Simplicial complexes
Thanks to their proven flexibility in a plethora of application contexts, simplicial complexes are the most widely adopted mathematical structure for encoding unorganized, large-size and high-dimensional data. In purely combinatorial terms, a (finite) simplicial complex K on a finite set V is a collection of non-empty subsets of V, called simplices, with the property of being closed under inclusion, i.e., every non-empty subset of a simplex of K is itself a simplex of K. Given a simplicial complex K, the elements of V are called vertices of K, and a simplex σ ∈ K is called a k-simplex (equivalently, a simplex of dimension k) if it consists of k + 1 vertices. The dimension of a simplicial complex K is the largest dimension of the simplices in K. Even if the abstract definition of a simplicial complex just given is able to capture a variety of datasets not necessarily endowed with a geometrical realization, it is worth mentioning that, intuitively, a simplicial complex is nothing but a collection of well-glued bricks, its simplices. From this perspective, a k-simplex can be seen as the convex hull of k + 1 geometrically independent points. For instance, a 1-simplex is an edge, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, and so on.

Homology
Homology is a topological tool which provides invariants for shape description and characterization. Given a simplicial complex K, it is possible to associate to it a collection of vector spaces C_k(K) over a field, in our case Z_2, whose bases are indexed by the k-simplices, so that, loosely speaking, we say that these spaces are generated by the k-simplices of K. These spaces are connected by boundary operators ∂_k : C_k(K) → C_{k−1}(K), mapping each k-simplex σ to the sum of the (k − 1)-simplices of K strictly contained in σ. We denote by Z_k(K) := ker ∂_k the space of the k-cycles of K and by B_k(K) := Im ∂_{k+1} the space of the k-boundaries of K. Then, since ∂_k ∂_{k+1} = 0, the quotient H_k(K) := Z_k(K)/B_k(K) is a well-defined vector space, called the k-th homology group of K. We call two k-cycles homologous if they belong to the same homology class. Roughly speaking, homology reveals the presence of "holes" in a shape. A non-null element of H_k(K) is an equivalence class of cycles that are not the boundary of any collection of (k + 1)-simplices of K. Such classes represent, in dimension 0, the connected components of the complex K; in dimension 1, its tunnels and loops; in dimension 2, the shells surrounding voids or cavities; and so on.
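As a toy illustration of these definitions, the following Python sketch computes β_1 for a hollow triangle by ranking hand-built boundary matrices over Z_2 (the complex and the matrices are chosen for the example, not produced by any library):

```python
def z2_rank(rows):
    """Rank over Z_2 of a matrix given as a list of equal-length 0/1 rows."""
    rows = [list(r) for r in rows]
    rank, ncols = 0, len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][c]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

# Hollow triangle: vertices {0, 1, 2}, edges {01, 02, 12}, no 2-simplices.
# Boundary matrix d1: rows indexed by vertices, columns by edges (mod 2).
d1 = [[1, 1, 0],
      [1, 0, 1],
      [0, 1, 1]]

n_edges = 3
dim_Z1 = n_edges - z2_rank(d1)   # dim ker d1 = #edges - rank d1
dim_B1 = 0                       # no 2-simplices, so Im d2 = 0
beta_1 = dim_Z1 - dim_B1
print(beta_1)  # -> 1: the hollow triangle encloses a single 1-dimensional hole
```

Filling the triangle with a 2-simplex would make ∂_2 non-trivial and kill the hole, bringing β_1 back to 0.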

Persistent homology
An intrinsic limitation of homology concerns the need to work with a single simplicial complex representing the dataset under investigation. However, in real-world applications, the presence of noise and of measurement errors makes the choice and construction of a single, steady representation very hard in practice. Persistent homology ( [22,53]), currently one of the main tools in Topological Data Analysis, aims to solve this issue through a multi-scale study of a dataset and of its homological features, by associating to it a sequence of simplicial complexes. The concept of filtration captures exactly the idea of analyzing a dataset at different thresholds of a parameter on which it depends. More formally, given a simplicial complex K, a filtration F of K is a sequence of its subcomplexes such that K_1 ⊆ K_2 ⊆ · · · ⊆ K_M = K. Given a filtration of a simplicial complex K, persistent homology keeps track of the evolution of the non-null, non-homologous cycles of K and, by associating a lifespan to each of them, is able to discriminate the relevant information from the noise. Formally, for p, q = 1, …, M with p < q, the k-th persistent homology group H_k^{p,q}(F) of a filtration F on the pair (p, q) consists of the image of the linear map between H_k(K_p) and H_k(K_q) induced by the inclusion of complexes between K_p and K_q. More intuitively, the elements in H_k^{p,q}(F) represent the cycles of K which survive from step p to step q. Given a filtration of finite simplicial complexes F, we define its k-dimensional persistent homology classes as the homology classes of ⊕_ε H_k(K_ε) modulo the maps induced by the inclusions of simplicial complexes. More properly, h_1 ∈ H_k(K_p) and h_2 ∈ H_k(K_q) with p ≤ q are equivalent if and only if ι*_k^{p,q}(h_1) = h_2, where ι*_k^{p,q} denotes the linear map between H_k(K_p) and H_k(K_q) induced by the inclusion of complexes between K_p and K_q. We call k-dimensional persistent homology PH_k(F) the space spanned by the k-dimensional persistent homology classes.
As proven in [23], a basis of PH_k(F) is in bijective correspondence with a finite set of intervals of the form {(p, q) : p < q, p, q ∈ Z ∪ {∞}}, referred to as persistence pairs. We define a set of k-dimensional generator cycles of the persistent homology as a set of k-cycles of K_M whose persistent homology classes form a basis of PH_k(F). The information about the "life" of each homology class can be collected in a visual, informative representation of the topological structure of the input, the persistence barcode: a plot consisting of a bar for each homological feature appearing throughout the filtration, stretching from its birth to its death value. An equivalent way to depict the same information is through the persistence diagram: the multi-set (i.e., multiple instances of the same element are allowed) of points in R^2 consisting of all the (birth, death) pairs, i.e., pairs of values p < q such that a k-dimensional homology class arises at filtration step p and becomes zero at step q. Persistent homology owes its popularity as a descriptor to the immediacy and power of these visual representations of the homological information but, even more, to the fact that the retrieved features are provably stable. In fact, by defining a notion of distance among persistence diagrams or barcodes, it can be shown that similar datasets necessarily have similar homological features ( [24]).
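The bookkeeping behind the barcode can be illustrated in dimension 0, where persistence pairs follow from a simple union-find pass over the edge-sorted filtration (an illustrative sketch, not the persistence algorithm of [23]):

```python
def h0_barcode(num_vertices, weighted_edges):
    """0-dimensional persistence barcode of a graph filtration.

    Vertices enter at filtration value 0; an edge (u, v, w) enters at w.
    Each merge of two components kills one bar; components surviving the
    whole filtration give bars dying at infinity.
    """
    parent = list(range(num_vertices))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv          # merge: one component dies at value w
            bars.append((0.0, w))
    roots = {find(x) for x in range(num_vertices)}
    bars += [(0.0, float("inf"))] * len(roots)
    return sorted(bars, key=lambda b: b[1])

# Path graph on 4 vertices with increasing edge weights:
print(h0_barcode(4, [(0, 1, 0.3), (1, 2, 0.5), (2, 3, 0.9)]))
# -> [(0.0, 0.3), (0.0, 0.5), (0.0, 0.9), (0.0, inf)]
```

Each finite bar records a component merging into an older one; the single infinite bar witnesses the final connected component.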

Building (filtered) complexes
In many applications, one is not directly called to deal with a simplicial complex, but has instead access to data in the form of point clouds in a metric space or of weighted graphs. For example, data may be obtained as a sample of some (unknown) ground truth, i.e., an undisclosed manifold of dimension usually much lower than that of the space it is embedded in ( [16]). Another typical subject of application is network science ( [49,52]): in this setting, the input is in the form of a weighted graph. Notice that in this case it is not mandatory that the graph can be embedded in some metric space, i.e., that the edge weighting respects a triangle inequality. Networks are not necessarily representations of geometrical entities, and still the topological approach extends naturally to this context. In both cases, one needs to provide a suitable simplicial complex resting on the given structure. The subject has been addressed extensively (see, for example, [53]); here, we simply review the most typical scheme, the Vietoris-Rips complex. Given a graph G = (V, E), its flag or clique complex is the simplicial complex Flag(G) whose simplices coincide with the cliques of G. Given a point cloud V ⊂ R^n and a fixed value ε > 0, one can build a graph G_ε with a vertex for every point in V, and an edge between two vertices whenever the distance between the corresponding points is less than or equal to ε. Analogously, given a weighted graph G = (V, E), one can build a subgraph G_ε on the same vertex set, with only those edges that have weight less than or equal to ε. In either case, one can define the Vietoris-Rips complex VR_ε of parameter ε as the flag complex Flag(G_ε) of the graph G_ε. Furthermore, since on varying ε the Vietoris-Rips complexes VR_ε form an increasing sequence of simplicial complexes, the family (VR_ε) gives rise to a filtration, denoted the filtered Vietoris-Rips complex (see Fig. 1).
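As a sketch of the construction just described, the following snippet assembles the edges and triangles of a Vietoris-Rips complex at a given scale from a point cloud (hand-rolled for illustration; real pipelines rely on dedicated libraries):

```python
from itertools import combinations
from math import dist

def vietoris_rips(points, eps):
    """Vietoris-Rips complex of a point cloud, up to dimension 2.

    Simplices are flag/clique-determined: a triangle is included as soon
    as its three edges are, so only pairwise distances matter.
    """
    n = len(points)
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= eps}
    triangles = {(i, j, k) for i, j, k in combinations(range(n), 3)
                 if {(i, j), (i, k), (j, k)} <= edges}
    return edges, triangles

# Unit square: at eps = 1 only the four sides appear (diagonals are ~1.414),
# leaving a 1-dimensional hole; at eps = 1.5 the diagonals fill it in.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
e1, t1 = vietoris_rips(square, 1.0)
e2, t2 = vietoris_rips(square, 1.5)
print(len(e1), len(t1))  # -> 4 0
print(len(e2), len(t2))  # -> 6 4
```

Sweeping eps from 0 upward yields exactly the increasing family (VR_ε) of the filtered Vietoris-Rips complex.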
As already mentioned, Vietoris-Rips complexes are employed in a wide variety of application domains. The reason is that their definition only depends on the pairwise distances between points, making them efficient to compute and to store with respect to more refined alternatives. It is worth noticing, however, that the cost of this simplicity is that the dimension of a Vietoris-Rips complex can explode even when the points are sampled from a low-dimensional subspace of R^n.

Homological Scaffold
The homological scaffold originated from the intuition that traditional, graphtheoretical tools in network analysis were naturally able to capture significant properties ( [62]), but proved not as effective in detecting multi-agent and large-scale interactions. Interest in searching for alternative descriptors of network relations arose, and soon works were published which leveraged invariants offered by computational topology ( [63,14,13]).
In proposing the scaffold ( [40]), the authors pointed out that persistent homology might be able to summarize well network mesoscale structures, i.e., features living between the purely local connections and the global statistics, to which previous methodologies were blind. Furthermore, this structure could be analyzed over the continuous, full range of interaction intensities, without the need for ad-hoc, domain-specific thresholds.
Homological cycles intuitively describe obstruction patterns. The presence of nontrivial homology within a given region of a network highlights its structure as non-contractible, binding signals to flow over constrained channels, which in turn play the role of bridges.
To test the method, the homological scaffold was computed from resting-state fMRI data for 15 healthy volunteers who were either infused with placebo or psilocybin: the scaffold discriminated the two groups, as well as providing meaningful insight as to the impact of the psychoactive substance onto the pattern of information flow in the brain [40].
Given a non-negatively weighted finite graph W = (V, E, w : E → R + ), let F be a filtration of simplicial complexes as above.
Let {b_i} be a set of 1-dimensional generator cycles of the persistent homology. Since we are working over Z_2, each of the b_i's is completely identified by its support, which is a set of edges of E. In particular, we can depict the set {b_i} as a matrix whose rows are indexed by E and having the b_i's as columns. The row sums, as natural numbers, form a new weighting function on the edges of W, the new weights counting precisely in how many persistent cycles an edge appears along the filtration.
Formally, define h_W : E → N as h_W(e) := Σ_i 1_{e ∈ b_i}, where by 1_{e ∈ b_i} we denote the indicator that equals 1 if e appears in b_i, and 0 otherwise. Then the homological scaffold of W is the weighted graph H(W) such that:
- its vertex set coincides with the vertex set of W
- its edge set is a subset of the edge set of W, consisting of edges with nonzero value for h_W
- its weight function is the restriction of h_W to E.
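The weighting h_W can be sketched in a few lines of Python, with each generator cycle given as its edge support (the cycles b1 and b2 below are hypothetical, chosen only to show a shared edge):

```python
from collections import Counter

def scaffold_weights(generator_cycles):
    """Edge weighting of the homological scaffold: each edge is weighted by
    the number of generator cycles whose support contains it."""
    h = Counter()
    for cycle_edges in generator_cycles:
        h.update(cycle_edges)
    return dict(h)

# Two hypothetical generators sharing the edge (1, 2):
b1 = {(0, 1), (1, 2), (0, 2)}
b2 = {(1, 2), (2, 3), (1, 3)}
w = scaffold_weights([b1, b2])
print(w[(1, 2)])  # -> 2: the shared edge; every other edge gets weight 1
```

Edges with weight zero are simply absent from the Counter, matching the scaffold's restricted edge set.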
In accordance with the above definition, building the homological scaffold of a weighted network W is a method of network compression or skeletonization. The definition also implies that edge weights are assigned by the number of basis cycles the edge belongs to. In the example of Fig. 2(a), a filtration of simplicial complexes arising from a point cloud is depicted, together with generators of the persistent homology group, each at the scale at which it is born. In Fig. 2(b), the corresponding homological scaffold is represented: one can see that the scaffolding procedure amounts to stacking generators of P H 1 , i.e., cycles in the network, each yielding unitary weight. In the following, we shall sometimes refer to the homological scaffold as the loose, or original scaffold, to contrast it with the new definition of scaffold to follow.
As anticipated in the introduction, it is apparent that there is a substantial source of arbitrariness in this definition. Several different representative cycles exist which form a basis of the persistent homology (as a consequence of several different cycles belonging to the same homology class), and hence one must make a choice. For example, Fig. 3(a) depicts one specific cycle whose homology class generates (part of) the persistent homology group of the point cloud. At the same time, any other choice of edges forming a cycle around the hole is homologically equivalent and, in principle, legitimate.
In the original paper, the authors resorted to using the cycles as output by the JavaPlex implementation ( [64]) of the persistent homology algorithm (based on the original implementation of [21]), and a posteriori checked the selected cycles for consistency. However, in principle, this means that the same simplicial complex written with two different orderings of the simplices could lead to different choices of generators, and therefore, to different scaffolds. As such, we must be careful in the choice of nodes and edges output by the algorithm; while the presence of a generator denotes undeniably that an obstruction pattern exists, we cannot be as confident about its precise location in the network or the constituents that provide bridges around it. The homological scaffold defined in this way introduces noise in the localization of mesoscale patterns onto individual nodes and edges, a process which, if accurate, could provide valuable insight as to the functional role of single players in a network.
In this work, we try to work around the problem of cycle choice and give a stricter definition, by requiring that, among all possible representatives, those of minimal total length be chosen (e.g., Fig. 3(b)). The original algorithm reported a computational complexity of the order of O(n^3) to obtain representatives of basis cycles.

Minimal Bases
The search for minimality in the computation of the scaffold was made feasible by the introduction of efficient algorithms to compute minimal representatives of a homology basis in dimension one. In dimensions higher than one, minimal representatives of a homology basis remain out of reach. Indeed, Chen and Freedman ( [65]) proved that the problem of obtaining these minimal representatives is computationally intractable, being at least as hard as the notoriously NP-hard Nearest Codeword Problem. Furthermore, it is even NP-hard to approximate within any constant factor, meaning that, unless P = NP, no polynomial-time algorithm exists to obtain an approximate minimal basis that differs from the exact one by at most a multiplicative constant. In the light of this, we must necessarily restrict our attention to the 1-dimensional case, i.e., computing minimal representatives of a basis of H_1.

Minimal Bases and Dey's Algorithm
Given a simplicial complex K, let us consider C_1, the vector space generated by the 1-simplices of K, and Z_1, the vector space of 1-cycles, i.e., Z_1 = ker ∂_1. Given a 1-cycle b ∈ Z_1, let µ(b) be its length, i.e., the sum of the weights of the 1-simplices that form it, and denote by [b] the homology class b belongs to. Finally, let β_1 := dim H_1(K). We want to obtain a set of β_1 1-cycles in Z_1 of minimal total length whose homology classes span H_1(K). In accordance with the literature, we call this set a minimal homology basis, with a slight abuse of terminology, as it would be more appropriate to call it a minimally-represented homology basis.

Algorithm: MinBasis(K)
• A basis of the cycle group Z_1 is found via a spanning tree. Each edge in the complement of the spanning tree identifies a candidate cycle ( [66]).
• An annotation of the edges is computed via matrix reduction ( [69]). This yields the dimension β_1 of H_1, as well as an efficient tool to determine whether two cycles b_1 and b_2 are linearly dependent in H_1.
• A set of support vectors is generated which maintains a basis of the orthogonal complement in H_1 of the minimal basis cycles.
• Iteratively, for each dimension of H_1, the candidate set of cycles is parsed in search of cycles b that are linearly independent in homology from the previous ones (exploiting the support vectors). Among these, the µ-shortest one is added to the minimal basis.
• The set of support vectors is updated for the remaining dimensions, so that it remains a basis of the orthogonal complement of the basis.
• The last two steps are repeated until the minimal basis is complete.
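The following sketch mirrors the greedy structure of MinBasis in a simplified setting: a plain weighted graph with no 2-simplices, where H_1 coincides with the cycle space Z_1 and annotations reduce to plain Z_2 Gaussian elimination on edge-incidence vectors. Candidates are drawn from Horton's set (a shortest-path tree through each vertex plus one extra edge) instead of a single spanning tree; this is a classical substitute, not Dey's algorithm itself:

```python
import heapq

def minimum_cycle_basis(n, weighted_edges):
    """Greedy minimum-weight cycle basis of a connected weighted graph:
    a simplified analogue of MinBasis when H_1 = Z_1 (no 2-simplices)."""
    adj = {v: [] for v in range(n)}
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    def sp_tree(src):  # Dijkstra: distances and parent pointers
        d, par, pq = {src: 0.0}, {src: None}, [(0.0, src)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:
                continue
            for v, w in adj[u]:
                if du + w < d.get(v, float("inf")):
                    d[v], par[v] = du + w, u
                    heapq.heappush(pq, (d[v], v))
        return d, par

    def path_edges(par, x):  # edges from x up to the tree root
        es = set()
        while par[x] is not None:
            es ^= {tuple(sorted((x, par[x])))}
            x = par[x]
        return es

    candidates = {}
    for s in range(n):  # Horton candidates: SP-tree paths plus one edge
        d, par = sp_tree(s)
        for u, v, w in weighted_edges:
            if u in d and v in d:
                cyc = path_edges(par, u) ^ path_edges(par, v) ^ {tuple(sorted((u, v)))}
                if cyc:
                    candidates[frozenset(cyc)] = None
    wts = {tuple(sorted((u, v))): w for u, v, w in weighted_edges}
    ranked = sorted(candidates, key=lambda c: sum(wts[e] for e in c))

    basis, pivots = [], {}  # pivots: leading edge -> reduced Z_2 vector
    target = len(weighted_edges) - (n - 1)  # dim Z_1 for a connected graph
    for cyc in ranked:  # greedy: shortest candidate independent of the rest
        v = set(cyc)
        while v and max(v) in pivots:
            v ^= pivots[max(v)]
        if v:
            pivots[max(v)] = v
            basis.append(cyc)
            if len(basis) == target:
                break
    return basis

# Square with a diagonal: the two triangles (weight 3.5 each) beat the
# outer square (weight 4), so the greedy loop selects both triangles.
basis = minimum_cycle_basis(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 1), (0, 2, 1.5)])
print(len(basis))  # -> 2
```

In the full algorithm, the Gaussian elimination on raw edge vectors is replaced by the annotation machinery, which tests independence in H_1 rather than in Z_1.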
Notice that a minimal homology basis is guaranteed to exist, as we only work with finite simplicial complexes, which admit only a finite number of bases. However, it need not, in general, be unique. Several different cycles of the same minimal length may all belong to the same homology class as a basis cycle. Heuristically, this is especially common when the input complex is unweighted (equivalently, has equal weights on every edge), in which case the length of a cycle is the number of edges that form it. Furthermore, there exist cases in which different sets of cycles of minimal length generate the same homology space without even being pairwise homologous. We will treat the problem of the uniqueness of the minimal basis in more detail in the following, and account for it explicitly in the construction of the minimal scaffold.
The computational complexity of the above procedure is evaluated ( [61]) as O(n^2 β_1 + n^ω), where n is the number of simplices in K and ω is the fast matrix multiplication exponent, which as of 2014 is bounded by 2.37 ( [61,70,71]). This yields a worst-case complexity of O(n^3) in the number of simplices for general complexes, a number which, we recall, is itself of order O(N^3) in the number N of points in the worst case.

Minimal Scaffold
In this section, we introduce an alternative definition of the homological scaffold, which we call minimal, based on the minimal representatives obtained above, and which aims at overcoming the arbitrariness in the cycle choice of the previous definition. After addressing the simplest case, we analyze its uniqueness properties and introduce a second, more refined definition.
Let F be the filtration of simplicial complexes induced by a non-negatively weighted finite graph W. For each filtration step ε, let B_ε be a minimal homology basis of H_1(K_ε) and define, analogously to (2), h_{W,min}(e) := Σ_ε Σ_{b ∈ B_ε} 1_{e ∈ b}. Then, we define the minimal scaffold of W as the weighted graph H_min(W) whose:
- vertex set coincides with the vertex set of W
- edge set is a subset of the edge set of W, consisting of edges with nonzero value for h_{W,min}
- weight function is the restriction of h_{W,min} to E.
The minimal scaffold amounts, again, to the stacking of generator cycles across a filtration. However, two differences are to be noted with respect to the loose definition. First, we require the representative cycles to be minimal. Second, we point out that while the loose scaffold is built by aggregating the generator cycles of PH_1(F), the minimal scaffold is built by independently computing a minimal basis for each H_1(K_ε), for all ε. Notice that, since cycles are modified throughout a filtration, it would be meaningless to talk about a minimal representative over a certain persistence interval. This also means that its computation can be effectively parallelized by assigning different filtration steps to different jobs, and later recombining the outputs. An interesting phenomenon that descends directly from the above peculiarity is that the minimal scaffold of random point clouds tends to display a more pronounced triangular structure (clustering) around cycles. Indeed, as longer (or, in non-metrical filtrations, later) edges are introduced, a cycle can be shortened (by the triangle inequality) by a longer edge which cuts a corner. Since at each step the algorithm records the minimal representative, upon aggregating the minimal scaffold one finds each cycle in its progressively shorter versions, and the history of the shortening is visible as a padding of triangles around it (see for example Fig. 4(a)).
We remark that, if there is no ambiguity in the construction of a filtration of simplicial complexes from a point cloud, or from a weighted graph, we will indifferently speak of the scaffold as a function of either of them (H min (C), or H min (W ), or H min (F)).
We have mentioned that the scaffold amounts to a change in weighting of the input graph. Since Dey's algorithm runs in O(n^3) in the number n of simplices, n is itself of order O(N^3) in the number N of vertices, and a minimal basis must be computed anew at each of the O(N^2) filtration steps, this yields a theoretical worst-case complexity of order O(N^9 · N^2) = O(N^11). Therefore, while the minimal scaffold is undeniably a polynomial-time algorithm, its practical computation is often hindered by its dire lack of scalability, especially when compared against the loose version, which has a far more favourable complexity. A comparison of running times is carried out in Fig. 5, which clearly shows that computing the minimal scaffold on an ordinary machine can quickly become troublesome.

Figure 5
The running times of computing the minimal and loose scaffolds for Watts-Strogatz weighted random graphs. For all instances, the number of nodes N is indicated on the x-axis; the number of stubs k is N/2, and the rewiring probability is p = 0.025.

Implementation
We have written a Python implementation of Dey's algorithm, together with a library for the computation of the minimal scaffold. The code is available on GitHub at [72], with some usage examples. It allows for shared-memory multi-threaded parallelism across filtration steps to improve computation times, while still being suitable for ordinary desktop workstations.
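The parallel layout across filtration steps can be sketched with the standard library alone; here the per-step job is stubbed out (counting the edges alive at each threshold) in place of the actual minimal-basis computation, so the snippet shows only the job structure, not the real library code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-step job: the real library computes a minimal homology
# basis of K_eps here; the stub just returns the edges present at eps.
def per_step_job(eps, weighted_edges):
    return eps, [e for e in weighted_edges if e[2] <= eps]

def parallel_over_filtration(weighted_edges, thresholds, workers=4):
    """Shared-memory parallelism across filtration steps: each threshold is
    an independent job, and the outputs are recombined afterwards."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda e: per_step_job(e, weighted_edges), thresholds)
    return dict(results)

edges = [(0, 1, 0.2), (1, 2, 0.4), (0, 2, 0.6)]
out = parallel_over_filtration(edges, [0.3, 0.5, 0.7])
print({eps: len(es) for eps, es in out.items()})  # -> {0.3: 1, 0.5: 2, 0.7: 3}
```

Because the steps are independent, the recombination is a plain dictionary merge; this is what makes the scheme suitable for desktop workstations.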

Uniqueness of the minimal scaffold
The uniqueness of the minimal scaffold depends on the uniqueness of the minimal basis. Indeed, if there exists only one possible set B * of cycles forming a minimal basis, then the scaffold is uniquely determined. Two issues affect the uniqueness of set B * .

Draws
The first one arises when two or more distinct, homologous basis cycles share the same minimal length. This case is relatively simple to work around: we modify the definition of the minimal scaffold to keep track of all variants of minimal basis cycles, dividing the weight equally among them. Specifically, to account for this issue we have slightly modified Dey's algorithm. In its last step described above, one is concerned with finding all cycles whose annotation is not orthogonal to the given support vector: among these, the one with minimal length is chosen as a basis cycle. Instead, we keep track of all such cycles sharing the same minimal length. This does not alter the complexity, as one needs to check all candidate cycles anyway. We call this case a draw.
Therefore, we modify the set B to become a set of sets of cycles. Given a complex K, we define a minimal basis with draws as B̃ = {V_1, …, V_{β_1}}, where each V_i is the set of all homologous, equally minimal variants of the i-th basis cycle. The meaning of this definition is that all variants of all minimal basis cycles are taken into account when building the scaffold, and the weights are assigned by dividing each variant's contribution by the cardinality of its class, for each filtration step. In the example of Fig. 6(c), the two cycles forming the variants of the only generator are multiplied by a factor of 1/2 and then summed: therefore, common edges outside the diamond are assigned weight 1, consistently with the minimal scaffold in definition (3), whereas the four edges forming the perimeter of the diamond each get assigned weight 1/2.
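The weighting with draws can be sketched as follows, with each class V_i given as a list of equally minimal variant cycles (the diamond configuration below is hypothetical, mimicking the situation of Fig. 6(c)):

```python
from collections import defaultdict

def scaffold_weights_with_draws(variant_classes):
    """Scaffold weighting in the presence of draws: each class V_i holds the
    equally minimal variants of one basis cycle, and each variant contributes
    1/|V_i| to the edges in its support."""
    h = defaultdict(float)
    for variants in variant_classes:
        share = 1.0 / len(variants)
        for cycle_edges in variants:
            for e in cycle_edges:
                h[e] += share
    return dict(h)

# One generator with two equally short variants around a "diamond":
# the variants agree outside the diamond and differ on its two sides.
common = [("a", "b"), ("b", "c")]
v1 = set(common) | {("c", "d1"), ("d1", "a")}   # upper side of the diamond
v2 = set(common) | {("c", "d2"), ("d2", "a")}   # lower side of the diamond
w = scaffold_weights_with_draws([[v1, v2]])
print(w[("a", "b")], w[("c", "d1")])  # -> 1.0 0.5
```

Shared edges accumulate the full weight 1, while the diamond's sides split it evenly, exactly as in the figure's example.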
With the introduction of draws, we settle the case when ambiguity arises among individual cycles, without interactions. As an example, we can state the following result.
Proposition If F is such that, for all ε in the filtration, each basis cycle belongs to a different connected component of K ε , then the minimal scaffold with draws H min (F) is unique.

Pathological cases
The other issue arises when there exist sets of minimal cycles that are not linearly independent. Suppose that three different cycles generate a homology group of dimension two, i.e., three minimal cycles are pairwise independent in homology, but jointly dependent. In this case, two generators are sufficient to span H_1 and, if their lengths are arranged pathologically, there is no principled way to choose two out of the three. Suppose for example that the three cycles b_1, b_2 and b_3 satisfy [b_1] + [b_2] + [b_3] = 0 and µ(b_2) = µ(b_3). In this case, both bases {b_1, b_2} and {b_1, b_3} span the same homology space and are of equal minimal total length: the minimality criterion fails. One could believe that such a configuration can only happen in the most general spaces, and that by imposing some mild hypotheses on the input data one could rule the pathology out. In fact, the opposite is true, this degeneracy being possible even after enforcing very strong conditions on the data.
Counterexample. Even if W is planar and an isometric embedding W → R^2 exists (i.e., the input planar weighted graph can be accurately drawn onto the plane), the minimal scaffold H̃_min(W) need not be unique.
In fact, consider complex K arising from the geometric, planar graph in Fig. 6(d).
Its homology H_1(K) is generated by two cycles; since the outer cycle b_1 is the shortest, and the two inner ones b_2 and b_3 are of equal length, the minimality criterion cannot decide between {b_1, b_2} and {b_1, b_3}, as both are acceptable minimal bases. The minimal scaffold (with or without draws) is not unique in this case.
Clearly, the same could happen with more than three cycles, with a larger number of possibly ambiguous configurations. Therefore, if we allow for a high degree of symmetry in the input, this pathology can arise even in the rather tame context of planar graphs in R^2. This issue is rather delicate, in the sense that not only is the algorithm unable to make a principled choice; it is not even capable of detecting when such a configuration takes place. In fact, this is more a feature of homology than a flaw in the skeletonization framework: what our eyes see as different cycles are in fact homologically equivalent, and it is impossible to use homology to tell them apart.
We however remark that, for complexes arising from real-world data, this type of configuration is actually pathological. Indeed, the following genericity result holds.
Proposition. Let W arise from a point cloud sampled from a probability distribution on R^d that is absolutely continuous with respect to the Lebesgue measure. Then, almost surely, the minimal scaffold H_min(W) (with or without draws) is unique.
If the input point cloud is sampled uniformly at random in some R^d, then edge lengths are distributed according to an absolutely continuous probability law. Therefore, given two edges e_1 and e_2, P[µ(e_1) = µ(e_2)] = 0. The same holds for any two non-identical cycles, and for any two homology bases (being but finite sets of edges): the probability of them sharing the exact same length is zero. By finiteness of the input, at least one minimal homology basis exists and, by the above reasoning, almost surely this basis is unique at each filtration step. Then, with probability 1, the minimal scaffold is unique. This result is actually quite general: whenever we can assume our input data to be subject to noise, then we are in principle allowed to rule out pathological same-length cycles. In these cases, the minimal scaffold is unique.
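The genericity argument can be checked numerically: sampling points from an absolutely continuous law, the pairwise distances (and hence cycle lengths, which are finite sums of them) come out distinct.

```python
import random
from itertools import combinations
from math import dist

# Sanity check of the genericity argument: for points sampled from an
# absolutely continuous distribution, pairwise distances are almost
# surely distinct, so no two cycles can tie in length.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(50)]
distances = [dist(p, q) for p, q in combinations(points, 2)]
print(len(distances), len(set(distances)))  # 1225 distances; generically all distinct
```

Of course, distances can still cluster arbitrarily close together, which is exactly the near-tie caveat discussed below.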
We remark that this uniqueness result is compatible with the phenomenon of concentration of measure: while for a very high-dimensional space or a very large number of points we know from theory that the distribution of edge lengths concentrates towards its mean value, the probability of two edges (and hence two cycles) having exactly the same length is still zero. One needs to be careful, however, that the probability of two cycles differing in length by less than some ε > 0 could grow very rapidly with ε. In summary, the minimal scaffold with draws H̃_min is well-defined up to some pathological circumstances, where it may depend on the ordering of the input.

Results
As illustrative examples, we show here a few applications of the minimal scaffold. Through it, we obtain meaningful subsets of known networks in neuroscience, and rank their constituents by their "topological importance". The C. elegans dataset is a correlation network of neural activations of the nematode worm Caenorhabditis elegans, which has become a model organism due to the unique characteristic that every individual shares the exact same nervous system structure.
The input consists of a symmetric weighted adjacency matrix over 297 nodes, each representing a neuron. Edge weights represent (quantized) time correlations between the firing of neurons, ranging from 1 to 70. The minimal homological scaffold of its brain map highlights the geometry of the obstruction patterns, i.e., the precise areas where nervous stimuli are less likely to flow. We stress the improvement of the minimal scaffold over the loose one: it not only identifies the presence of a "grey area" in the network, but also provides a reliable boundary for it, identifying which neurons and inter-neuron links are responsible for information flowing around the obstruction.
As an interesting example, Fig. 7 shows the top 25 neurons ranked in descending order of relative node strength (sum of weights of incident edges) with respect to the average node strength. We can identify four nodes, labeled 81, 260, 36, and 37, which hold a significantly higher relative strength than the rest. This indicates their presence in many minimal cycles across several scales, suggesting that they play a crucial role in the fabric of information flow within the nematode's brain.
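The ranking procedure just described can be sketched in a few lines of numpy. The adjacency matrix below is a small random stand-in for the actual scaffold (the real C. Elegans scaffold has 297 nodes); only the strength computation mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the scaffold's symmetric weighted adjacency
# matrix; the real one comes from the C. Elegans minimal scaffold.
n = 10
W = rng.integers(0, 5, size=(n, n)).astype(float)
W = np.triu(W, k=1)
W = W + W.T  # symmetric, zero diagonal

# Node strength: sum of weights of incident edges.
strength = W.sum(axis=1)

# Relative node strength with respect to the average node strength.
relative_strength = strength / strength.mean()

# Rank nodes in descending order of relative strength, as in Fig. 7.
ranking = np.argsort(relative_strength)[::-1]
print("top nodes:", ranking[:4])
```

By construction the relative strengths average to 1, so values well above 1 single out nodes that, like neurons 81, 260, 36, and 37 in the text, sit on many minimal cycles.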
The same type of analysis was repeated on the correlation network of brain activities in an 88-parcel atlas of the human brain, obtained through resting-state fMRI. The data is courtesy of the Human Connectome Project ( [73]). Again, the minimal scaffold identifies which regions and links in the human brain are key bridges for the flow of information. Two parcels stand out (Fig. 8(a)) as particularly relevant for network topology.
For a relatively small network such as this, we can visualize the scaffold as a proper subnetwork via a chord diagram (Fig. 8(b)), with edge weight represented by color intensity and node strength by the size and color of the vertex. We stress that, starting from a virtually complete graph over 88 nodes, we reduce the size from 3828 edges to just 191, while preserving the topological structure. We can also leverage libraries in computational neuroscience ( [74]) to embed the scaffold in the actual human brain, with regions correctly located and projected on the three coordinate planes. In Fig. 8(c), color intensities represent log-weight in the scaffold for visualization purposes.
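The compression figure quoted above follows from simple counting: a complete graph over 88 parcels has C(88, 2) edges, of which the scaffold keeps 191.

```python
from math import comb

# A complete graph over the 88 parcels has C(88, 2) = 3828 possible edges.
n_nodes = 88
full_edges = comb(n_nodes, 2)
scaffold_edges = 191  # edges retained by the minimal scaffold (from the text)

fraction_removed = 1 - scaffold_edges / full_edges
print(f"{full_edges} -> {scaffold_edges} edges "
      f"({fraction_removed:.1%} of edges removed)")
```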

Comparison of Scaffolds
As the last contribution for this work, we consider a comparison between the minimal and loose scaffolds.
We have already pointed out that the minimal scaffold in general offers superior guarantees as a tool, both for network analysis and for network skeletonization. On the other hand, the loose scaffold has a clear advantage in terms of computational complexity: while it is in principle viable for most applications where persistent homology has been employed, the minimal scaffold, even with filtration-wise parallelization, requires a vastly larger amount of computational power. This effectively limits its range of application, unless it is run on dedicated, high-performance infrastructure.
A reasonable question to ask is the following: if one is interested not in the exact structure of the scaffold, but only in its statistical behaviour, could the loose scaffold provide a sufficient approximation of the minimal one? More concretely, if instead of asking exactly which nodes in a network are the most topologically important one is interested in, say, the distribution of the degree sequence of the minimal scaffold, could the loose one suffice?
To answer this question, we have compared several graph metrics in the two scaffolds of C. Elegans. Further, to gain insight into the general case, we have sampled two families of random graphs at different parameter values: one of geometric graphs (Random Geometric Graph) and one of non-geometric graphs (Weighted Watts-Strogatz).

C. Elegans
For the C. Elegans dataset, we have compared several graph metrics of the minimal and loose scaffolds. The resulting correlations (Fig. 9(c)) indicate that, for metrics 1 to 5, the two scaffolds are very well correlated: the cheap, loose scaffold is, for example, a reliable proxy for the distribution of the "true" degree sequence (scatterplot in Fig. 9(d)). We instead observe poor correlation of edge weights and clustering coefficients. The first is not unexpected, since the edge weighting procedure is conceptually different in the two scaffolds: while in the minimal one we consider a different basis for each filtration step, the loose scaffold considers bases of the persistent homology space, drastically reducing the number of cycles considered. In general, the set B* has cardinality much larger than the dimension of PH_1; it is therefore understandable that the distributions of edge weights do not agree. Clustering coefficients, on the other hand, measure how "triangular" a graph is around a given node. As remarked in Section 5, a consequence of assembling the scaffold from the minimal bases of the H_1's is that a large number of artificial triangles appear around cycles. In this case too, the poor correlation is easily explained.
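The metric-by-metric comparison amounts to computing a Pearson correlation between the per-node values of each statistic in the two scaffolds. The sketch below uses synthetic degree sequences as placeholders for the real ones (297 nodes, as in C. Elegans); only the correlation machinery is the point.

```python
import numpy as np

rng = np.random.default_rng(2)

def pearson(x, y):
    """Pearson correlation between two metric vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical per-node degree sequences of the minimal and loose scaffolds
# on a common vertex set (the real ones come from the C. Elegans network).
deg_minimal = rng.poisson(6.0, size=297).astype(float)
deg_loose = deg_minimal + rng.normal(0.0, 1.0, size=297)  # a noisy proxy

r = pearson(deg_minimal, deg_loose)
print(f"degree-sequence correlation: {r:.2f}")
```

A high correlation here is what licenses using the loose scaffold as a cheap statistical proxy; a low one, as with edge weights or clustering coefficients, signals that the two constructions measure different things.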

Random Graphs
Drawing inspiration from [52], we repeat the analysis on random graph samples. [52] divides random networks into two categories: those created from edge weighting schemes and those created from points in Euclidean space. We have chosen to analyze the weighted Watts-Strogatz (WS) model as representative of the first class, and the random geometric model as representative of the second. We remark that weighting needs to be introduced in order to compute persistence; while for geometric graphs this simply requires computing the Euclidean distance, for the Watts-Strogatz model it requires an ad-hoc procedure, described in detail in the supplemental material of [52]. We briefly recall that a WS graph is parametrized by the number of nodes, the number of stubs to rewire, and the rewiring probability. A random geometric graph is instead parametrized by the number of points to sample (uniformly) in [0, 1]^d, and by a cutoff value that acts as a distance threshold beyond which no edge is introduced.
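Both families can be generated with the networkx library, as in this sketch. The parameter values are arbitrary, and the WS weighting scheme of [52] (which lives in that paper's supplement) is replaced here by a placeholder of uniform random weights; geometric edges are weighted by Euclidean distance, as in the text.

```python
import math
import random
import networkx as nx

seed = 3
random.seed(seed)

# Weighted Watts-Strogatz: n nodes, each attached to its k nearest ring
# neighbours, with rewiring probability p.
ws = nx.watts_strogatz_graph(n=100, k=6, p=0.1, seed=seed)
for u, v in ws.edges:
    # Placeholder weights; NOT the ad-hoc scheme of [52].
    ws[u][v]["weight"] = random.random()

# Random geometric graph: points sampled uniformly in [0, 1]^d, edges kept
# only below the cutoff radius and weighted by Euclidean distance.
rgg = nx.random_geometric_graph(n=100, radius=0.2, dim=2, seed=seed)
pos = nx.get_node_attributes(rgg, "pos")
for u, v in rgg.edges:
    rgg[u][v]["weight"] = math.dist(pos[u], pos[v])

print(ws.number_of_edges(), rgg.number_of_edges())
```

Note that rewiring in the WS model preserves the edge count (n·k/2 edges), while the geometric model's edge count depends on the cutoff radius.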

Watts-Strogatz Model
In both cases, we observe good agreement on key statistics, as reported in Fig. 9(a) and (b). Each bar is obtained by computing the correlation of the reported statistic on a sample of 30 random graphs of the reported model, with parameters as indicated on the x-axis. For comparison, two null models are built for each instance of the minimal and loose scaffolds in the sample, by constructing Erdős-Rényi random graphs on the same vertex set: one with the same number of edges as the minimal scaffold, and one with the same number as the loose one. Each statistic is then correlated between the minimal scaffold and the loose null model, and between the loose scaffold and the minimal null model. The average of these correlations is reported on the plots as a baseline value, highlighting that the two scaffolding procedures agree with each other by more than just statistical noise.
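The null-model construction is a standard edge-count-matched Erdős-Rényi (G(n, m)) baseline, sketched below with networkx. The two "scaffolds" here are random stand-ins on 88 nodes; only the matching of vertex set and edge count reflects the procedure in the text.

```python
import networkx as nx

def null_model(scaffold, seed=None):
    """Erdős-Rényi G(n, m) null model: same vertex set, same edge count."""
    n = scaffold.number_of_nodes()
    m = scaffold.number_of_edges()
    return nx.gnm_random_graph(n, m, seed=seed)

# Hypothetical stand-in scaffolds on a common vertex set.
minimal = nx.gnm_random_graph(88, 191, seed=4)
loose = nx.gnm_random_graph(88, 240, seed=5)

null_min = null_model(minimal, seed=6)
null_loose = null_model(loose, seed=7)

# Baseline of the text: correlate each scaffold's statistics against the
# OTHER scaffold's null model, then average the two correlations.
print(null_min.number_of_edges(), null_loose.number_of_edges())
```

Matching the edge count removes density as a confound, so any residual correlation between the two scaffolds above this baseline reflects shared structure rather than size.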

Conclusions
We provided a new method of network analysis and skeletonization, based on the computation of minimal homology bases. This new construction fills a significant gap in the previous literature, in that it yields, in all but some pathological cases, a well-defined and unique subgraph, acting as a reasonable ground truth for comparison with the previous construction. It can be employed in a range of applications, both to identify crucial and weak links in a network, and to obtain compressed and topologically sound representations of the input. It also allows one to evaluate the reliability of other scaffolding procedures against said ground truth: we have observed that, for some applications, the loose scaffold can be deemed a sufficiently accurate tool, while incurring a far less cumbersome computational load. The subject of homological skeletonization is not yet concluded: other approaches to finding canonical generators of homology are possible (for example in [54] and [75]), and we plan to investigate them in subsequent works.