A nature inspired modularity function for unsupervised learning involving spatially embedded networks

The quality of network clustering is often measured in terms of a commonly used metric known as “modularity”. Modularity compares the clusters found in a network to those present in a random graph (a “null model”). Unfortunately, modularity is somewhat ill suited for studying spatially embedded networks, since a random graph contains no basic geometrical notions. Regardless of their distance, the null model assigns a nonzero probability for an edge to appear between any pair of nodes. Here, we propose a variant of modularity that does not rely on the use of a null model. To demonstrate the essentials of our method, we analyze networks generated from granular ensemble. We show that our method performs better than the most commonly used Newman-Girvan (NG) modularity in detecting the best (physically transparent) partitions in those systems. Our measure further properly detects hierarchical structures, whenever these are present.


Packing algorithm
In the present study soft particle DEM model is used where, particles motions are tracked by integration of the Newtonian equations of motion.
Where, F  is the total force acting on the particle, m is the mass, v  is velocity vector,   is the angular velocity vector, T  is the net torque and I is the moment of inertia of the particle.

Modelling of contact forces
In soft particle DEM model, the non-zero overlap (ij, eq. S1.3) is given as Where, i R , j R is the radius and i r  , j r  is the position vector of the two interacting particles.
The normal component of the contact force is modelled using Hertz model 2 which is further modified for viscoelastic spheres 3 and is given by the eq. S1.4.
The tangential force component has been modelled by using Haff and Werner model 4 .
Where, t  is tangential damping constant and  is the coefficient of friction.

Numerical scheme
Here we have used Gear integration scheme 5 for numerical integration. The flow chart of the DEM algorithm is given in figure S1.1. At the start of the simulation, all the required parameters are initialised (time t =0). In the time increment loop (timestep of 1 µsec), the scheme performs three steps. First it predicts the values of all the parameters for the next time step by using Taylor series expansion. Second step is total force calculation on each particle. Total force on the particle is the sum of all the external forces and the interaction forces. Using the total force acting on each particle, one can calculate linear corr r    and angular accelerations by using Newton's law of motion (eq. S1.1 & eq. S1.2). Third step is the correction in the predicted values of first step. In this step, the difference in the two acceleration values, (eq. S4.6) is used to correct the predicted coordinates, predicted velocity and other higher order derivatives by using (eq. S1.7).
The coefficient ci depends on the order of algorithm used and also on the type differential equation used. In the current simulation algorithm of fifth order is used for which the values of ci are as given below- The data is extracted and stored periodically at the interval of some predefined time steps. Simulation is terminated after a predefined time steps in all cases.

Boundary conditions
The static and dynamic properties of granular materials are greatly affected by the interaction of particle with the wall of the system. To eliminate this effect, the present simulation uses a hypothetical central force for particle movement. Here packing is obtained without using any container thus eliminating the use of any wall interaction with the particle. So the boundary conditions are not required for the present simulation.

SI-2: Algorithm (RN method)
RN method is based on spin-glass-type Potts model. It detects community by following two criterions: First, it favours the edges inside the community and second, it penalizes for the missing links inside the communities. Based on this consideration the Hamiltonian used in RN method is given as Here, the meaning of each term is same as used in eq. 2 of main article. The Hamiltonian of equation S2.1 is formulated based on the concept of formation of magnetic domains in magnetic material. It models a system, where all the connected spins of same magnetic domain interact ferromagnetically and if not connected then antiferromagnetically. The equation S2.1 incorporates the contribution of these two interactions: "attractive" (ferromagnetic) and "repulsive" (anti-ferromagnetic) by using two different weight matrices: aijAij and bijJij respectively. The resolution parameter  provides flexibility to adjust the energy exchange between these two interactions.
The algorithm of RN method considers each node separately. It iteratively decides the best community membership for a node by calculating the energy of the system and assigns the membership in which the system energy is lowest. A pictorial representation of RN algorithm is shown in figure S2.1. The steps of the algorithm 6 is discussed below- (1) System initialization: The connection matrix Aij and the edge weight aij and bij is initialised. This method starts with "symmetric" state, where each node is assigned as separate community membership. So at the start of the simulation, the number of communities is same as the number of node. (2) Finding the best node membership: Neighbor list of each node is generated. Calculate the energy change after moving the node into one of its connected cluster. Finally put this node in the cluster in which energy change is most negative (as it decreases the Hamiltonian) (3) Iterate until convergence: Step (2) is repeated until there is no further new community membership assignment of any node is possible (minimum energy reached). (4) Test for the local energy minimum:It might possible that the community structure obtained until step (3) is not the most modular structure. So in this step, we manually merge all the pair of connected communities. If this merging lowers the overall energy of the system consider this new merged community structure as best partition.

SI-3: Different scenarios possible in network
In a partitioned network, there are four different scenarios are possible. A pictorial representation is shown in figure S3.1. These scenarios are based on the connected and missing edges, within or outside of the community. These are discussed below with their effect on the overall energy of the system: 1. An edge between nodes of same community. This decreases the energy of the system as it strenthen the community. 2. A missing edge inside the community. This increases the energy of the system as ideally the entites of same community should interact more frequently. 3. A missing edge between two different communities. This decreases the energy because entities of two different communities should not interact with each other. This will preserve the identity of each individual communities. 4. An edge between two different communities. This increases the energy of the system as this edge will reduce the clarity of boundaries between the communities.

SI-4: Response of new modularity function for all possible scenarios
The modified modularity function is given as Here, the meaning of each term is same as used in eq. 3 of main article. It has four independent variables namely, aij, Aij, (Δxij) and (i,j). Each variable has two different possible values, so there will be 2 4 =16 possible combinations of their values. In this section we have discussed the effect of these different combinations on Q() i.e whether it increase or decrease Q(). A sample granular assembly has taken for pictorial representation of different cases if possible. Particles with different coordination numbers are differentiated by colors as reflected by color bar used in main article. The granular network obtained from the given ensemble will have same degree distribution as its coordination number distribution. The average degree for this network is 5.6. Maximum and minimum degree is 6 and 2 respectively. The cutoff distance is 2.1 times the radius of particles. Since this cutoff distance is nearly equal to the first neighbourhood of the particles, all those cases that considered an edge between nodes outside the cut-off distance (case 2, 4, 6, and 8) are physically not possible in the considered network. Case 9 and 11 is not possible because it considers two highly linked nodes (aij>0) within cut-off distance to be unconnected. This case is similar to the first scenario that we have discussed in S3. As this edge is between two highly linked nodes (aij>0) will strengthen the community, it should decrease the overall modularity of the system. Thus our function output is matching with the expected output.
Case 2 (not possible in considered network): Node i and j connected, outside cut-off range and in same community aij>0, Aij=1, (i,j)=1, (xij)=0, Jij=0, bij>0 aij=6-5.6>0, (xij)=0, since xij =2.1r-4r<0 (r is radius of particle) The output of our function for this case favors an edge within the same community. Our function value is positive because aij is positive, which indicates that the two connected nodes are high degree nodes. So putting highly linked nodes in same community should increase the modularity.

Case 3:
Node i and j connected, within cut-off range and in different community aij>0, Aij=1, (i,j)=0, (xij)=1, Jij=0, bij>0 aij=6-5.6>0, (xij)=1, since xij =2.1r-1.97r>0 (r is radius of particle) This case is similar to the fourth scenario discussed in S3. Links between two different communities should always be restricted because the connected nodes are highly linked nodes (aij>0). Thus this arrangement should decrease the modularity of the system. Our function is giving the same output as expected.
Case 4 (not possible in considered network): Node i and j connected, outside cut-off range and in different community aij>0, Aij=1, (i,j)=0, (xij)=0, Jij=0, bij>0 aij=6-5.6>0, (xij)=0, since xij =2.1r-3.7r<0 (r is radius of particle) In this case also there is an inter-community link which is in between two highly linked nodes (aij>0), which should not be favoured. Thus this arrangement should decrease the modularity of the system and our function is doing the same.
Case 5: Node i and j connected, within cut-off range and in same community aij<0, Aij=1, (i,j)=1, (xij)=1, Jij=0, bij>0 aij=5-5.6<0, (xij)=1, since xij =2.1r-1.97r>0 (r is radius of particle) In this case though edge is in between nodes of same community, it should not be favored as these two nodes are low degree nodes (boundary nodes, aij<0) and should ideally be treated as a separate community that can be thought as "boundary community". The strength of our method is its capability to distinguish such "boundary communities". Case 6 (not possible in considered network): Node i and j connected, outside cut-off range and in same community aij<0, Aij=1, (i,j)=1, (xij)=0, Jij=0, bij>0 aij=5-5.6<0, (xij)=0, since xij =2.1r-4r<0 (r is radius of particle) This case is similar to case 1 which is discussed above, only the difference is that the connected nodes are low degree nodes (aij<0). Thus putting them in same community might