Hidden geometries in networks arising from cooperative self-assembly

Multilevel self-assembly involving small structured groups of nano-particles provides new routes to development of functional materials with a sophisticated architecture. Apart from the inter-particle forces, the geometrical shapes and compatibility of the building blocks are decisive factors. Therefore, a comprehensive understanding of these processes is essential for the design of assemblies of desired properties. Here, we introduce a computational model for cooperative self-assembly with the simultaneous attachment of structured groups of particles, which can be described by simplexes (connected pairs, triangles, tetrahedrons and higher order cliques) to a growing network. The model incorporates geometric rules that provide suitable nesting spaces for the new group and the chemical affinity of the system to accept excess particles. For varying chemical affinity, we grow different classes of assemblies by binding the cliques of distributed sizes. Furthermore, we characterize the emergent structures by metrics of graph theory and algebraic topology of graphs, and 4-point test for the intrinsic hyperbolicity of the networks. Our results show that higher Q-connectedness of the appearing simplicial complexes can arise due to only geometric factors and that it can be efficiently modulated by changing the chemical potential and the polydispersity of the binding simplexes.

Self-assembly of nanoscale objects has been recognised as a powerful method enabling the design of advanced materials with new optical, magnetic, conducting and other properties [1][2][3][4] . Complex materials with a new functionality often exhibit hierarchical architecture [5][6][7][8] , suggesting that the self-assembly occurs at different scales from individual nanoparticles to groups and clusters to macroscale materials. In this regard, a cooperative binding of small formatted nanoparticle structures is crucial for the developing large-scale aggregates. They can be prefabricated nanocrystals, self-replicated information-bearing patterns [8][9][10] , or spontaneously formed groups of nanoparticles 1,3,5,11 . The affinity of nanoparticles to merge into a small formation, which then appears as a building block on a larger scale, depends on the particle density and constraints applied in the manufacturing process, and other factors that influence the interactions between them 6,7,12 . In addition to binding energy, this process is regulated by pertinent geometric rules [13][14][15] . Therefore, the control of the impact of self-assembly at various levels on the emerging hierarchical structure is essential for the new functionality of macroscopic materials. Here, we use numerical modeling to deepen the understanding of cooperative processes of self-assembly and geometric properties of structures that can arise. In the transition from clusters to the solid state, the clusters with a variable size and shape may appear, depending on the materials in question. For example, various atomic clusters represent the energy-minimum configurations of the electronic structure of interacting atoms. Our study is primarily motivated by the nanoparticle self-assembly where nanoparticles of different chemical composition, dimension and morphology can be manufactured. In this case, the energy favourable equilibrium states result through a combination of different inter-particle forces. For instance, an attractive van der Waals component can be rationally balanced by the repulsive electrostatic interaction, additionally modified by pH conditions of the solvent, which results in a variety of precursor clusters 16 . Instead of considering a particular type of interaction, our objectives are to develop a more general mathematical framework that takes into account the impact of the geometric compatibilities of building blocks on the self-assembly process and provides a language to describe the hierarchical structure of the assemblies. The emphasis is on their hidden geometries, which can offer a hint to understand the synergistic effects of the components. In this context, a suitable presentation by graphs or nanonetworks 17 enables the use of advanced graph theory methods to elucidate the structure and abstract essential geometrical descriptors of nano-structured materials 15 . For instance, the network model and topology analysis have proved useful in revealing the structural elements that are responsible for the improved tunneling conduction in self-assembled nanoparticle films [18][19][20] , and to identify the hidden order in amorphous materials 21 . Some recent investigations show how the use of topology can  open new ways for designing materials inspired by mathematics 15,22 . On the other hand, the research of growing complex networks has recently been extended to explore the attachment of objects (loops, simplexes) under geometric rules and control parameters [23][24][25][26] . In this regard, the self-assembly can be understood as a language that can describe the complex architecture of these networks. Varying the assembly rules and parameters enables us to explore a broad range of structures, compared to the laboratory experiments and the potential limits of the aggregation process, and understand the emergence of new features [24][25][26][27] . A particular anisotropy of the interaction and spatial constraints can lead to some interesting low-dimensional assemblies, for example, chains 28 and patterns obtained by tiling or recognition-binding on a two-dimensional lattice 24 , and self-assembly of loops under the planar graph rules 23 . By contrast, self-assembly of geometric objects without spatial embedding can lead to complex, hierarchically organized networks. Beyond the standard graph-theory metric 29,30 , the advanced techniques of algebraic topology of graphs 31,32 are used to explore the hidden topology of these networks; the primary goal is to find out how different geometric elements (simplexes) are mutually combining to make simplicial complexes. Analysis based on algebraic topology of graphs has been used in some recent studies, for example, to describe the hierarchical organization of social graphs 33 and the structure of the phase-space manifolds near the jamming transition 34 , as well as to adequately quantify the patterns of inter-brain coordination 35 and logically structured knowledge networks 36 . Moreover, in the hidden geometry metric of many complex networks, the closeness of the nodes is expressed by the graph's generalization of negative curvature or hyperbolicity 25,26,37 . It plays a significant role in the network's function. For example, a direct survey of the related graphs revealed the impact of negative curvature on metabolic processes 38 and traffic on the Internet 39 .
Here we introduce a model for the cooperative self-assembly, in which small, ordered structures of particles can be recognized as simplexes or full graphs (cliques) of different size that attach by nesting in a growing network. The process depends on the size of the group that is formed by the attachment, and it is directed by two ingredients. These are geometric factor, which refers to the availability of the geometrically appropriate sites where the clique can nest along one of its lower-dimensional faces, and the chemical factor associated with the affinity of the system for simultaneously binding an excess number of particles. We notice that for a simplex of a given size the geometric constraints change systematically when the network grows, whereas the chemical affinity affects the actual binding. By exploiting the interplay of these elements, we develop various classes of assemblies represented by graphs, and we investigate their structure using graph-theoretic metrics. We show that these structures possess higher combinatorial connectivity, which can be quantified by algebraic topology measures. With a large number of examples, we demonstrate how the geometrical element that plays a vital role in the appearance of the higher Q-connectedness can be enhanced or reduced by changing the chemical affinity of the assembly. We also show that these new structures exhibit a global negative curvature or δ-hyperbolicity. Our model concerns poly-dispersive cliques, whose size varies according to a given distribution in the range from a connected pair of nodes to 12-clique. As a particular case, we consider the aggregation of mono-disperse cliques of a given order. Below are the details of the model explained in the formal language of topology; to present the model at work, we provide the Web applet 40 .

Results and Discussion
Computational model. A clique of order q max is fully connected graph of s = q max + 1 nodes; some examples are shown in Fig. 1. Faces of the clique are cliques of the lower orders which are contained in the original clique σ σ ∈ q q max , where q = 0, 1, 2, 3 ··· q max − 1 correspond to a single node, a pair of connected nodes, a triangle, tetrahedron, etc., up to the largest subgraph of the order q max − 1, respectively. The number of equivalent faces is given . Hence, C 0 = q max + 1 = s is the dimension or the number of nodes involved in the considered clique. We assume that by docking (or nesting), a clique shares a face of order q with an already existing clique in the networks. In this way, the number of simultaneously added particles (nodes) is given by the difference of the dimension of the clique to be formed by docking and the size of the docking site, i.e., n a = (q max + 1) − (q + 1). Furthermore, we assume that the system's affinity ν towards adding new particle is finite. Therefore, the probability of docking along a particular face of the clique is weighted by the factor ν − e n a , considering the complementary n a particles. Therefore, the normalized probability for docking of a clique of order q max along its face of order q is given by where c q (t) is the number of geometrically similar docking sites of the order q at the evolution time t. In our model, a clique is formed in each time step t; the size of the clique can vary in a given range. In particular, here we consider cliques of the dimension s ∈ [1,12] taken from a power-law distribution g(s) = As −2 , where A is the corresponding normalisation factor. The empirical fact motivates this form of the distribution, namely, that larger cliques appear less often in modular networks. The network growth by addition of mono-disperse cliques is a particular case of our model. For instance, by fixing s min = s max = 3 (triangles) and s min = s max = 4 (tetrahedra), we obtain two types of networks with mono-disperse cliques. The first clique taken from the distribution g (s) is assembled and considered as the seed structure. Then, at each step, the size of a new clique s ∈ g (s) is taken and the clique is formed by attaching the number s − q − 1 of new nodes with the selected q + 1 nodes on the existing structure. Then the docking condition requires that these q + 1 nodes match a q-face of the new clique. According to Eq. (1), the selection of the simplex of the order q on the current structure depends on the number of geometrically suitable locations and the corresponding weighting factors. Figure 1b illustrates the effects of the geometrical factor in the example of forming a tetrahedron by attachment of n a red nodes to the small structure shown by the blue nodes. Considering Eq. (1), the case ν = 0 describes the probability of attachment by geometrical factor alone. In this case, the population of docking sites of the order q determines the likelihood that a new clique will attach by its q-face. On the other hand, the number of docking sites of a given size depends on the actual structure of the network. Note that, by adding a particular clique of the order q max to the system, all its unshared faces also appear as new cliques of lower orders. Thus, the number of simplexes fluctuates in time depending not only on the dimension of the clique which is formed in the docking event but also on the size of the actual docking site. It should be noted that while the simplicial complexes grow through the attachment of new cliques via shared faces, the process can not generate holes and cliques of the order larger than the cut-off size s max of the original distribution g (s). In the simulations, we keep track of details constituting each event. For example, the small segment of the output file shown below indicates the time step, current network size, the number of simplexes, order of the added clique, the number of new nodes, and list of all nodes which belong to that clique.  2 1 14 42  25 45 785 6 3 28 29 32 43 44 45  26 46 789 3 1 16 28 46  27 47 791 2 1 12 47  28 48 795 3 1 17 36 48  29 49 797 2 1 36 49  30 55 1805 10 6 17 28 30 31 50 51 52 53 54 55 For varied chemical potential ν, despite the statistically similar population of cliques appearing in the process (taken from the same distribution), the network growth speed and the average rate of the addition of simplexes R Σ ≡ 〈dΣ (t)/dt〉 are different being dependent on the docking probability. Figure 1c displays the evolution of the total number of simplexes Σ (t) and the number n σ (t) of the added simplexes per step for different networks until they exceed the target size N = 1000 nodes for the first time.
Specifically, a fast growth of the network is observed for the negative values of the parameter ν while much slower growth rates characterize the assembly process for ν ≥ 0. In fact, for ν < 0 the system "likes" addition of new particles, which represent the non-nested parts of the new clique. Hence, the cliques effectively repel each other resulting in a sparse structure and fast growth of the network size and also the addition of new simplexes. In contrast, when ν > 0 the cliques are preferably nested along their larger faces, thus reducing the number of the newly added nodes. This powerful attraction among cliques leads to dense network structure and a small number of added nodes and unshared faces per time step. This situation results in a slower growth of the network and reduced simplex addition rate, as shown in Fig. 1c. In contrast, the case with strictly geometrical assembly, ν = 0, has no preference for any size of a docking site; the probability is strictly determined by the number of locations of a given size. Accordingly, these details of the process have an impact onto the topology of the evolving assemblies, which we study in the following. For illustration, three examples of the networks containing the number N ≥ 1000 nodes for varied parameter ν and the same distribution of the incident cliques are shown in bottom panels in Fig. 1. Fig. 1 (bottom) demonstrate, the structure that emerges in the assembly of cliques depends strongly on the affinity for the simultaneous attachment of many nodes, apart from the geometrical constraints. Specifically, for large negative values of the parameter ν, an active 'repulsion' between the cliques results in the sparse structure, nearly representing a tree of cliques of different orders. This kind of structures possesses a significant average distance, the modularity, and clustering coefficient, which can be related to the original population of cliques. On the other hand, for the positive values of ν, the cliques firmly attach to each other, resulting in a gradually smaller number of the simultaneously added particles. The appearing structure possesses a large core of densely packed higher-order cliques while low-order structures remain at the periphery. An impressive network architecture with well-separated communities appears for ν = 0, assembled under geometrical constraints alone. As described below, the graph properties are tunned between these extremes by varying the parameter ν.

Combinatorial topology of aggregates with poly-disperse cliques. As the network examples in
Here, our focus is on the appearance of higher combinatorial topologies of these graphs, which is directly related to the ways that the assembled cliques share their faces of different orders. In the simulations, we keep track of each added simplex and nodes that participate in it, as explained above. In this way, for a clique of the order q max we can distinguish the number of its shared faces of order q < q max . Intuitively, when the groups repel each other, i.e., for ν < 0, their common faces will be the lowest orders, such as single nodes and links and, less often, triangles or higher structures. The opposite situation typically occurs for ν > 0 where the simplexes have a high affinity towards sharing nodes; cf. structure in Fig. 1. Due to shared faces, for instance, of the order q, the number of distinguishable simplexes of that order is smaller than the number of faces C q of a free added clique. Therefore, the topological response function f q of the network 34 can be determined as the number of different simplexes at the topology level q; it provides a good measure of the combinatorial complexity of the assembly in response to the varying external parameters ν, s max . In Fig. 2, we show how the function f q varies along the topology levels q depending on the parameter ν and the range of the distribution of the attaching cliques. The peak of the distribution shifts towards higher values when larger dimension cliques appear, whereas the height depends on the way that they interconnect at each the topology level. Further, Q-analysis based on the algebraic topology of graphs 32,41-43 is here applied for characterization of the graph architecture by determination of q-connected components for each topology level q. Specifically, for the topology levels q = 0,1,2, ··· q max of each studied network we determine the components of three structure vectors, {Q q }, {n q } and Q q , defined in Methods. These vectors allow a direct comparison of the hierarchical structure of various emergent networks. In Fig. 3, we plot the components of these structure vectors as a function of q for several assemblies of poly-disperse cliques with different chemical potential ν.
The similarity in the number of q-connected components (FSV) reflects the statistically similar population of cliques of all dimensions (taken from the same distribution) in all studied networks. However, their inherent structure is significantly different, which is expressed by the components of SSV and TSV for various q (see Methods). Notably, the third structure vector in networks for ν < 0 has non-zero components only at lowest topology levels; this implies that different higher-order cliques present in the graph will be separated from each other by removing the structures of the order q = 1 (link) between them. The situation is much different in the assemblies grown when ν > 0 where the simplicial complexes containing the higher-order cliques remain strongly interconnected until the before-last level q max − 1 = 8. These findings agree with the impact of the chemical potential favoring the cliques attraction for ν > 0 and repulsion for ν < 0. In this context, it is interesting to note that structure that was grown solely under the geometrical rules (ν = 0) already possesses a sizable hierarchical organization of simplicial complexes; although the degree of connectivity is systematically lower than in the case ν =+1, the structure holds together until the level q = 7. (See Table 1 for the exact values). As the Fig. 3 shows, this hierarchical architecture of the assembled networks gradually builds with increasing values of the parameter ν. To illustrate the differences in the hierarchical organization of the systems for ν =−1, 0, + 1, in Fig. 4 we display those parts of their structure that are still visible at the topology level q = 5. Precisely, the nodes participating in the simplexes of order q ≤ 5 which are not faces of the cliques of the order q > 5, are removed. The connections among the remaining nodes are shown according to the network's adjacency matrix.   Table 1. The components of the three structure vectors for the networks generated at different chemical potential ν.
The node's participation in building various simplexes also manifests in the global statistical features of the network. The cumulative degree distribution for several studied aggregates is given in Fig. 5. It is averaged over several realizations of the systems containing over 5000 nodes, where s max ∈ [2,12]. Although a broad distribution of the node's degree occurs in each case, it strongly varies with the parameter ν. It is interesting to note that, in the networks grown by geometrical constraints with ν = 0, we obtain the distribution with a power-law decay τ + 1 ≈ 3 (within the numerical error bars); its cut-off appears to depend on the size of the largest clique. In contrast, the exponential decay is observed for ν < 0 while a structure containing many nodes of a large degree is present in the case of clique attraction for ν > 0, which is separated from the low-degree nodes. Other graph theoretic measures also vary accordingly. δ-Hyperbolicity of the emergent networks. For network structures, δ-hyperbolicity is a generalization of negative curvature in the large 37 . Here, we consider the aggregates of cliques, which are known 0-hyperbolic graphs; therefore, these structures are expected to exhibit this intrinsic property at a larger scale. Following the procedure described in 37 , we investigate the 4-point Gromov hyperbolicity of different emergent networks. Specifically, we determine the average hyperbolicity 〈δ〉 in comparison to the graph's diameter for ν =−5, −1, 0, +1, and +5, by a sampling of 10 9 sets of four nodes, as described in Methods. Considering three different realizations of the network for each ν, we find numerically that δ can take the values {0,1/2,1}; hence, the maximum value δ max = 1 suggests that these assemblies are 1-hyperbolic. In Fig. 6 (bottom panel) we plot the average hyperbolicity 〈δ〉 against the minimal distance d min of the involved pair in the smallest sum  , see Methods. Notably, for all network types 〈δ〉 remains bounded at small values. In particular, we find that 〈δ〉 = 0 for the tree graph of cliques corresponding to ν = −5. Whereas, the hyperbolicity parameter is close to zero in the sparse network of cliques for ν =−1, and slightly increases in the more compact structures corresponding to ν = 0 and ν > 0. Note  Aggregation of monodisperse cliques. In this section, we briefly consider the structures grown with the same aggregation rules but with mono-disperse building blocks. Some compelling examples are the aggregates of tetrahedra and triangles. Tetrahedral forms are ubiquitous minimum-energy clusters of covalently bonded materials 12 . We also study the impact of the chemical potential in the event of aggregation of triangles. The importance of triangular geometry was recently pointed in the context of quantum networks 44 . Some examples of these structures grown by the aggregation rules of our model are shown in Fig. 7.
Since the aggregation process does not alter the size of the largest clique, these networks have only few topology levels. Specifically, in the aggregates of tetrahedra q max = 3, and they can share nodes, links, and triangles as faces of lower orders; for triangles, q max = 2 and shared faces are links and nodes. Therefore, their structure vectors are rather short. However, they possess a captivating structure of simplicial complexes, depending on the chemical potential and geometry constraints. Consequently, the degree distributions are altered by changing ν, as shown in Fig. 8. Notably, the appearance of some scale-invariant structures is favored by the mutual attraction of cliques for ν > 0. The aggregation of tetrahedra more efficiently builds such structures as compared with triangles. Whereas, the scale-free range is limited with the exponential cut-offs in the case of triangles unless ν is sufficiently large. Further analysis of these and other networks of mono-disperse simplexes is left for future work.

Discussion
We have introduced a computational model for cooperative self-assembly where small, formed groups of particles appear as building blocks for a large-scale structure. In this context, in addition to the binding forces, the geometric constraints exerted by the rising architecture play an important role on the proper nesting of the added block.  Different geometrically suitable options for nesting a given block structure are further altered due to the chemical affinity ν of the system for receiving the excess number of particles. Formal rules of the model are motivated by situations that usually occur in self-assembly of nanoparticles, where the possibilities for creating different clusters are tremendous. Nevertheless, the rules can be easily adapted to describe different other cases where, for example, due to interactions, only clusters of a certain type can appear and then combine into a hierarchical network.
It should be noted that the model explicitly does not take into account the effects of temperature and diffusion, which are experimentally controllable parameters. In the assembly, simplexes are added one by one, and every added object is attached (with probability one) to the structure when a geometrically suitable nesting site is found and remains in place. Therefore, in the limits described below, these aggregation characteristics resemble the well-known processes of diffusion limited aggregation (DLA), where the random particle tree grows in low-density conditions by attaching a particle that diffuses in the solvent when it approaches the tree 45 . Indeed, within the limits of the significant negative values of the ν parameter that promotes repulsive interactions between simplexes, the structure resembles a DLA tree, but here it is made from expanded objects (simplexes) and not individual particles. Note that in this case, ν refers to the number of excess particles of the coming simplex, while simplex joins the structure along the nest, containing the remaining particles. Hence, effectively, the chemical potential for the nested particles of the simplex is positive, in analogy to DLA binding. More specifically, when q max = 1, only one particle can be added with its link, and the growing structure is a random tree, independent of the ν value (see the web demo 40 ). In this case, only one type of binding process occurs with a probability one in the equation (1), regardless of the value and the sign of the parameter ν. For q max > 1, however, there are several types of bindings that are differentiated by exponential factors in the formula (1) as described above. Consequently, the emerging structure builds non-random features that are different from DLA clusters, as described in the Results section. We have demonstrated how different assemblies with a complex architecture can be formed in the interplay of these geometric and chemical factors. Moreover, the systematic mapping of the developing structure to the graph not only helps us formally implement the self-assembly process but also provides ways to adequately investigate the new structure employing advanced graph theory and algebraic topology methods.
It is interesting that the complex structure of the assembly that possesses combinatorial topology of higher order can arise due to only geometric factors. These topology features are further enhanced in chemically enforced compaction, and, on the contrary, are gradually reduced in sparse networks resulting from chemically favored repulsion between building blocks. Moreover, depending on the dispersion of the components and the chemical factor, the new assemblies may possess scale invariance and an intrinsic global negative curvature; these features are essential for their practical use and functionality. Our model with graph-based representation provides a better insight into the mechanisms that drive the assembly of hierarchically organized networks with higher topological complexity, which is a growing demand for technological applications.

Methods
Program flow for clique aggregation. Q-analysis: definition of structure vectors. To describe the global graph's connectivity 40 at different topology levels q = 0, 1, 2 ··· max , Q-analysis uses notation from algebraic topology of graphs 32,[41][42][43] . Specifically, the first structure vector Q q represents the number of q-connected components and the second structure vector n q is defined as the number of simplexes of the order greater than or equal to q. In this context, two simplexes are q-connected if they share a face of the order q, i.e., they have at least q + 1 shared nodes. Then the third structure vector determined as ≡ − Q Q n 1 / q q q measures the degree of connectivity at the topology level q among the higher-order simplexes. From the adjacency matrix of a considered graph, we construct incidence matrix by Bron-Kerbosch algorithm 46 , where simplexes are identified as maximal complete subgraphs (cliques). Then the dimension of the considered simplicial complex equals the dimension of the largest clique q max + 1 belonging to that complex.
Measure of curvature: δ-hyperbolicity definition. Following the studies in 37 and references there, we implement an algorithm which uses the Gromov's hyperbolicity criterion. Specifically, for an arbitrary set of four nodes A, B, C, and D, the distances (shortest path lengths) between distinct pairs of these nodes are combined in three ways and ordered. For instance, We denote the largest value = ) /2 against d min we can investigate the worst case growth of the function. For a given graph, we first compute the matrix of distances between all pairs of nodes; then, by sampling a large number of sets of nodes for the 4-point condition (2) we determine and plot the average 〈δ〉 against the corresponding distance d min .
Graphs visualisation We used gephi.org for graph presentation and community structure detection by maximum modularity method 47 .