A random interacting network model for complex networks

We propose a RAndom Interacting Network (RAIN) model to study the interactions between a pair of complex networks. The model involves two major steps: (i) the selection of a pair of nodes, one from each network, based on intra-network node-based characteristics, and (ii) the placement of a link between selected nodes based on the similarity of their relative importance in their respective networks. Node selection is based on a selection fitness function and node linkage is based on a linkage probability defined on the linkage scores of nodes. The model allows us to relate within-network characteristics to between-network structure. We apply the model to the interaction between the USA and Schengen airline transportation networks (ATNs). Our results indicate that two mechanisms: degree-based preferential node selection and degree-assortative link placement are necessary to replicate the observed inter-network degree distributions as well as the observed inter-network assortativity. The RAIN model offers the possibility to test multiple hypotheses regarding the mechanisms underlying network interactions. It can also incorporate complex interaction topologies. Furthermore, the framework of the RAIN model is general and can be potentially adapted to various real-world complex systems.

intra-network topology, (iii) it should be easily adaptable to different scenarios corresponding to different classes of real-world systems, and (iv) lastly, it should help us to understand the mechanisms that govern the formation of observed interaction structures in the systems being studied with this model. The last point is crucial as a model is useful only in as much as it extends our understanding of the systems under consideration. This is seen, e.g., in the seminal studies that revealed the small-world 18 and scale-free 19 behavior of single networks in real-world systems.
In this study, we first formulate the general characteristics of the RAIN model and thereafter use it to understand the interactions between the regional airline transportation networks (ATNs) from USA and the Schengen area with the idea that this not only helps to demonstrate the utility of the model framework but also sheds light on the USA-Schengen inter-network structure. ATNs have been studied considerably because of their economic and social impacts [20][21][22] , as well as their potential role in the spreading of contagious diseases 23,24 . Based on the single network perspective (regional as well as global), earlier studies have shed light on several ATN characteristics, such as scale-free structure 20,21 , "rich club" phenomena of airport hubs 25,26 , and existence of 'free' and 'congested' phases in flight traffic 27 . Motivated primarily from the observed multi-community structure of the global ATN 21 , we put forward the idea that, in order to understand the interactions between the ATNs of different regions, it is more useful to consider each region as a distinct subnetwork with inter-network connections among them -hence an interacting network model in lieu of a single network model. Critical structural differences in the ATNs 28 , and the fact that transatlantic flights mark a different regime (≈ 4000 km) in the distribution of geographical link lengths of the global ATN 29 also lend further support to an interacting network picture in place of a single network.
The RAIN model statistically relates intra-network features to inter-network structure. Its fundamental framework is presented in a general fashion such that with appropriate formulation of its components, the larger framework is applicable to a wider spectrum of interacting real-world complex networks other than the particular example considered here.

RAIN model
We assume that intra-network characteristics influence the interactions between different networks, and model this as follows: (i) a selection step which gives us a pair of nodes, one from each subnetwork, which might have an inter-network link, and (ii) a linkage step which determines whether the chosen nodes are connected. Both these steps are formulated in general terms and can be adapted to different situations. Node selection is based on the selection fitness ϕ of a node in its own network estimated from the data, which gives high (low) fitness to nodes of high (low) degree or any other node-based characteristic. Linkage is achieved by a linkage probability p, which may connect nodes with high (low) linkage scores ρ in one network to nodes with high (low) linkage scores in the other network, or vice versa, depending on the application. The linkage probability gives the probability that a pair of selected nodes are linked, given their linkage scores. A simple schematic outline of the steps involved in the RAIN model is given in Fig. 1 which shows how a pair of nodes are chosen and a link between them is assigned in a single model run. The particular form of selection fitness, linkage scores, and linkage probability used depends on the specific system being studied and the research questions being asked of it. Given the internal link distribution of each subnetwork, the RAIN model used with a suitable choice of fitness and linkage probabilities generates inter-network links and allows us to understand better the mechanisms underlying observed inter-network structure by using contesting, multiple scenarios.
Node selection. Let X A and X B be the adjacency matrices of two networks A and B using which we can estimate a set of node-based characteristics ξ for a node i in network A and similarly another set of node-based characteristics ξ ξ ξ ξ = , , …, for a node j in network B, where k ≠ l in general. The node-based measures ξ i A k and ξ j B l considered here could be typical measures such as degree, local clustering coefficient, local betweenness coefficient, page rank, eigenvector centrality, etc. Thus, the elements of the sets ξ i A and ξ j B are functions defined on the nodes of the subnetworks. The indices i and j refer to nodes in networks A and B respectively from here on.
The RAIN model provides a statistical description of inter-network interactions as given by the inter-network adjacency matrix X AB based on the node-based characteristics ξ i A k and ξ i B l . To do this, we first define a selection fitness ϕ for each node in network A such that, and similarly a selection fitness ϕ j B for each node in network B. We use ϕ i and ϕ j to preferentially select pairs of nodes from the given networks for a potential inter-network link. Nodes which are more "important" in a network receive a higher chance of having an inter-network link. The particular definition of importance based on ξ i A k and ξ i B l depends on beliefs/evidence about the factors that may influence inter-network interactions. For the example of the USA-Schengen airport networks, we choose to model the inter-network links as a power law function of the intra-network degrees s, based on an observed power law like dependence between the intra-network degrees s and the inter-network degrees x (cf. Fig. 2). Thus, we have where the parameters γ and α are estimated from the data for each of the regional ATNs in the USA and the Schengen. We would like to caution here that the estimation of a statistical relation like the power-law should not be conflated with an explanatory mechanism between the intra-and inter-network degrees of the considered international airports. Clearly, the power-law is a simplified fit or a first approximation to the scatter of data points in Fig. 2. For the purpose of illustrating the functions of the RAIN model, however, this is helpful in showing how an overly simple choice of F sel impacts the final results (see section Results below). In other applications, prior understanding of the systems under study might motivate a different F sel , e.g. in case of the human brain, if we model each cortical region as an individual network, ϕ might depend not only on the intra-network degree but presumably also on other factors such as local clustering. The sets ξ i A and ξ j B would then contain two elements (degree and clustering of nodes i and j) and not just the degree as in the current example. Moreover, F sel might also likely have a different functional form, such as exponential, polynomial, or linear. Node linkage. The next step in the RAIN model is to determine whether or not to place an inter-network link between a pair of selected nodes based on a comparison of node-based features of the chosen nodes. For this, we define a linkage score ρ based on two sets of node-based characteristics χ i A and χ j B , for a node i in network A, and a similar score ρ j B for a node j in network B. In Eq. 3, the set χ i A has m node-based measures whereas the set ξ i A from Eq. 1 has k node-based measures. This highlights the fact that in general, the sets ξ i A and χ i A are not necessarily the same. However, in most cases we recommend having ξ χ = where s A and s B denote the intra-network degrees of nodes in networks A and B respectively. In particular, the linkage function F link places the relative importances of nodes from both subnetworks on the same scale to make them comparable, for which we use a min-max transform where for a node i in network A and similarly we define ρ j B for a node j in network B. In general, one can use other functional forms of the linkage function such as a normalization to mean zero and unit standard deviation where the similarity of a pair of selected nodes (one from each network) is decided by how far (in units of standard deviation) they are situated from their respective means. The final ingredient of the RAIN model is a linkage probability p ij AB which operates on the linkage scores ρ i A and ρ j B and returns the probability of placing a link between the selected nodes i and j, i.e.
For the USA-Schengen ATNs, we choose a Gaussian probability such that airports with similar linkage scores (which, in turn, are proportional to the intra-degree s of the airport) have a higher chance of getting connected. This is motivated by the fact that hubs tend to be connected to other hubs in the global ATN. Thus, where σ is a free parameter of the model which influences the extent to which hubs tend to be connected to other hubs. Note that this choice of F prob AB is assortative, but in general it is possible to define a disassortative linkage probability as well. For instance, in trying to model interactions between networks where the interaction topology is disassortative, we can potentially use which gives a higher chance to have inter-network links between those nodes i and j whose linkage scores are more different. Note, however, that the function F prob AB should be defined such that it maps into the interval [0, 1] and is well-defined on the entire range of linkage scores ρ i A and ρ j B .

Inter-network assortativity
We consider two main characteristics of the interactions in order to assess the validity of the model outputs: (i) the inter-network degree distributions, and (ii) the inter-network assortativity between the linkage scores of nodes from both networks. We define the inter-network assortativity coefficient r inter for two interacting networks as the Pearson correlation coefficient of linkage scores ρ i A and ρ j B appearing at the opposite ends of inter-network links as given below: Figure 2. Intra-and inter-network degrees of the regional ATNs in USA (blue circles) and the Schengen (red triangles). The solid lines indicate the power law like relations estimated using linear regression between log(x) and log(s). Open markers denote outlying data points that are excluded from the regression analysis.
Scientific RepoRts | 5:18183 | DOI: 10.1038/srep18183 where X ij AB is (i, j) th element of the inter-network adjacency matrix X AB . In this matrix, the first index always corresponds to the network A and the second to the network B, and μ A and μ B are the mean values of ρ over inter-network links: If on average the fitness values at the two ends of inter-network edges are similar, then r inter > 0, if they tend to vary in opposite directions then it is negative and if no such correlation exists, it is zero. This measure, ranging between − 1 and 1, reflects the tendency for high (low) degree nodes from one subnetwork to be connected to high (low) degree nodes in the other subnetwork, or vice versa. Note that the inter-network assortativity defined here is similar to the inter degree-degree correlation (IDDC) defined in an earlier study 30 . However, a major difference between the two is that the IDDC is defined specifically on the degrees of the nodes, whereas the inter-network assortativity can be used to estimate the assortative nature of any node-based characteristic. In our current example, we estimate r inter for the linkage scores ρ i A and ρ j B which are, in effect, normalized degrees. Multiple flights between two airports are retained as multiple edges. The total number of regional (transatlantic) flights starting or ending at a given airport gives us the intra-network (inter-network) degree.

Intra-and inter-network degrees.
The inter-network degree x of an airport follows a power law like relation based on its intra-network degree s (Fig. 2). We note that this is only a first approximation of the various factors that could possibly influence the inter-network degree. This is particularly true for the case of the USA ATN, where the power law curve does not fully explain the relation between x and s. This discrepancy between the fitted curve and the unknown true relation between the intra-and inter-network degrees has further consequences in the results of the RAIN model (discussed in the following sections). We estimate the power law relation as stated in Eq. 2 using linear regression between log(x) and log(s), and find that γ = 1.14 for USA but γ = 2.48 for the Schengen. We use this to define the selection fitnesses ϕ USA and ϕ SCH for all airports in both networks such that ϕ ∝ (  Fig. 2 includes the intra-and inter-network degrees of only those nodes which have at least one transatlantic link, which results in 28 airports from the USA and 31 airports from the Schengen. Two airports from the Schengen region with s < 50 are considered as outlying points and are excluded from the regression analysis because they have much higher inter-network degrees while only relatively low intra-network degrees.

RAIN model for USA-Schengen interactions. To implement the RAIN model for USA-Schengen
inter-network interactions, we consider three different scenarios: (i) We select nodes from each network based on ϕ i USA and ϕ j SCH , but we do not consider linkage scores for link placement. Instead, we put a link between all selected nodes. We call this only preferential picking, i.e. onlyPP. (ii) We select transatlantic nodes with a uniform probability disregarding their selection fitness values, but place a link only when allowed by the probability p ij based on their linkage scores ρ i USA and ρ j SCH . We call this only assortative matching, i.e. onlyAM. (iii) We select nodes based on selection fitness values and place a link based on linkage scores and the linkage probability. We call this the full model, i.e. PPAM.
We use S = {onlyPP, onlyAM, PPAM} to denote the set of the three possible scenarios considered in our analysis. In each model run, we generate 919 transatlantic links which is equal to the observed number in the data. Moreover, we consider 5000 such model runs for each scenario (cf. Methods). In Fig. 3, we show the inter-network degree distributions generated by the RAIN model for the various scenarios. We mainly find that S = onlyAM fails to replicate the observed inter-network degree distributions, whereas the onlyPP and PPA scenarios have almost overlapping distributions -both of which somewhat approximate the observed distributions from the data, barring the tail regions. The distributions match better for the Schengen region than that for the USA, indicating that the s SCH has a better explanatory power for the inter-network degree x SCH in the Schengen and that there are additional factors influencing the inter-network degree for airports in USA other than s USA .
We estimate the median r inter values along with its inter-quartile range (IQR) from 5000 model runs for various values of σ (Fig. 4a). There are two cases for which the median r inter value from the RAIN model scenarios are close to the observed = .  (Fig. 4b), which we interpret as the likelihood   Fig. 4a indicated the onlyAM scenario at σ = 0.1 as a potential candidate that gives a value of r inter close to the observed value, the likelihood of the onlyAM scenario at that value σ is nearly three times as less than the likelihood of getting the observed assortativity with the PPAM scenario at σ = 0.8.

Discussion
In this study, we have proposed a random interacting network (RAIN) model and used it to model the transatlantic interactions between the USA and Schengen flight networks. A crucial feature of the model is that it enables us to relate intra-network characteristics to inter-network structure (as seen in Eqs. 1 and 3). Being a model of network interactions, it inherently allows us to define regional (intra-network) hubs and formulate measureable, well-defined inter-network characteristics such as the inter-network degree distributions and inter-network assortativity. The RAIN model is essentially a prescription to obtain inter-network links between any given pair of individual networks such that they are determined by intra-network features. It requires two consecutive steps: node selection and node linkage. Node selection is defined using the selection fitness ϕ of a node, based on any number of node-based characteristics such as degree, page rank, local betweenness, and local clustering. The appropriate set of node-based characteristics required to explain the inter-network structure could be, in principle, decided a priori based on observations of the complex networks under study. Node linkage in the RAIN model is achieved with the help of linkage scores ρ defined for each node on the basis of a function F link . The primary purpose of F link is to transform the relative importances of nodes form both subnetworks to a similar scale such that they become comparable. The precise notion of what constitues the 'importance' of a node has to be decided a priori. In most cases, this is equal to the node-based characteristics that determine ϕ. However F sel could have a different functional form than F link as is the case of our chosen example with the USA-Schengen ATNs. Also necessary for the linkage between any pair of nodes is the linkage probability p ij which returns a probability of how likely it is that selected nodes i and j with linkage scores ρ i and ρ j are linked. Different choices of p ij could lead to different inter-network topologies.
In applying the RAIN model to the USA-Schengen ATN interactions, we have started from the fundamental assumption that the number of transatlantic flights of any given airport (i.e. the inter-network degree x) is partly determined by the number of flights that the airport has in its own regional network (i.e. the intra-network degree s). Estimating the selection fitness for each airport from the data, we have shown that the inter-network degree of an airport, on average, depends on its intra-network degree in a power law like fashion, with exponents of 1.14 and 2.48 for airports in USA and Schengen respectively. However, this relation between x and s is only a rough approximation of the various factors that might influence the inter-network degree of an airport. This is seen from the fact that the selection fitness used in our study failed to reproduce the tail portions of the inter-network degree . The likelihood P(r inter = 0.095|S, σ) at each σ value is estimated as shown in Fig. 4b.
distributions as obtained from the data, especially for USA (Fig. 3). In general, x can depend on any combination of various node-based characteristics such as geographical location, proximity to other airports, intra-network betweenness, intra-network clustering, etc. A more thorough application of the RAIN model might aim to unravel the correct combination of such influencing factors on x and define a meaningful F sel (as in Eq. 1) based on a larger set of ξ i A 's and ξ j B 's. Such an analysis, however, is beyond the scope of our current study. We have also considered the inter-network assortativity r inter as an additional critical characteristic of the interaction topology. In the model implementation, we have chosen three model scenarios: S = {onlyPP, onlyAM, PPAM} and have shown that the likelihood of obtaining the observed inter-network assortativity r obs inter was the largest for S = PPAM, which included both preferential node selection and assortative node linkage (Fig. 5). Even though S = onlyPP was able to reproduce the inter-network degree distributions as well as S = PPAM, the latter had much higher likelihoods of producing assortativity values close to r obs inter . On the other hand, even though S = onlyAM was able to produce assortativity values close to r obs inter for low values of σ at around 0.1, it failed to reproduce the inter-network degree distributions of the real-world data (cf. Fig. 3). According to the principle of Occam's razor, the miminal model with the power to relatively well explain both the inter-network degree distributions as well as the inter-network assortativity is the full model scenario PPAM. Thus, the existence of a transatlantic flight between a pair of airports, chosen one each from USA and Schengen, depends on two major factors, (a) the number of regional flights to-and-from the airports, and (b) the similarity between the relative importance of the airports in their respective regional ATNs. Both these factors are necessary to reproduce the observed interacting network characteristics.
The RAIN model can incorporate arbitrarily complex inter-network structures and with appropriate choices of the selection fitness, linkage scores, and linkage (filter) function, it has the potential to be applied to a wide variety of different real-world systems that are modeled as complex networks. Other systems where this framework is applicable include the neuronal network of the brain, protein interaction networks, and power grid networks. However, there is no universal prescription to define F sel , F link , and F prob , as well as to decide which intra-network characteristics should be chosen to model the inter-network topology. These details have to be worked out and estimated with respect to each individual analysis. The application of this framework to other complex systems with the hope of uncovering newer insights is the focus of future studies.

RAIN model implementation for the USA-Schengen ATN example.
Node selection based on preferential picking. We implement node selection, or preferential picking (PP), using a standard method of generating network links on the basis of fitness scores that is free from rejections. Given that the selection fitness scores ϕ i USA and ϕ j SCH of all nodes i in USA and j in the Schengen have been estimated according to Eq. 2, the algorithm for PP implemented in our analysis is outlined below. Node linkage based on assortative matching. We implement the node linkage based on assortative matching (AM) based on the Gaussian linkage probability p ij given a particular linkage probability width σ as follows. AM 1. Generate a random number r in the interval (0, 1). AM 2. Evaluate the Gaussian linkage probability p ij based on the linkage scores ρ i USA and ρ j SCH of nodes i and j according to Eq. 4. AM 3. If r < p ij then place a link between nodes i and j, else do nothing (i.e., reject the choice of i and j for a transatlantic edge). Model scenarios. In our analysis we consider, three model scenarios: S = {onlyPP, onlyAM, PPAM}. The details of the algorithmic implementation of these scenarios for a single model run is outlined below.
(i) S = onlyPP. First, choose nodes i from USA and j from the Schengen using the steps PP 1-5 outlined above. Next, place a transatlantic link between i and j irrespective of their linkage scores ρ i USA and ρ j SCH . (ii) S = onlyAM. First choose nodes i from USA and j from the Schengen randomly, i.e., irrespective of their selection fitnesses ϕ i USA and ϕ j SCH . Next, place a transatlantic link between i and j according to their linkage scores ρ i USA and ρ j SCH and the linkage probability p ij using steps AM 1-3 outlined above. (iii) S = PPAM. First, choose nodes i from USA and j from the Schengen using the steps PP 1-5 outlined above.
Next, place a transatlantic link between i and j according to their linkage scores ρ i USA and ρ j SCH and the linkage probability p ij using steps AM 1-3 outlined above.
For each of the model scenarios given above, the above steps are repeated until the number of transatlantic edges in the model are equal to 919, which is the number of transatlantic flights observed in the data.

Likelihood of obtaining r obs
inter . We run the RAIN model for each of the three scenarios for 5000 runs for σ varying from 0.1 to 1.5 in steps of 0.1. For each model scenario, given a particular value of σ, we pool together all the degree sequences to get an overall degree distribution from the RAIN model for that scenario. This is shown in Fig. 3 for σ = 0.8. Furthermore, we estimate r inter for each model run, for each scenario, and for each σ. This is used to estimate the probability density P(r inter |S, σ) (cf. Fig. 4) based on a kernel density estimation using an optimal bandwidth for exponential kernels with the Python toolkit Scikit-learn 31  where P(S) is our prior belief on how likely it is to have the chosen scenario (from the set S) in the first place. Note that, taking into account the fact that the only AM scenario fails to reproduce the inter-network degree distributions, we could set the priors for each scenario such that 0 < P(S = onlyAM) ≪ P(S = onlyPP) < P(S = PPAM) < 1, based on which we can determine the most likely model given the data. For such choices of priors, S = PPAM is the most likely model given the data.