Motif analysis in directed ordered networks and applications to food webs

The analysis of small recurrent substructures, so called network motifs, has become a standard tool of complex network science to unveil the design principles underlying the structure of empirical networks. In many natural systems network nodes are associated with an intrinsic property according to which they can be ordered and compared against each other. Here, we expand standard motif analysis to be able to capture the hierarchical structure in such ordered networks. Our new approach is based on the identification of all ordered 3-node substructures and the visualization of their significance profile. We present a technique to calculate the fine grained motif spectrum by resolving the individual members of isomorphism classes (sets of substructures formed by permuting node-order). We apply this technique to computer generated ensembles of ordered networks and to empirical food web data, demonstrating the importance of considering node order for food-web analysis. Our approach may not only be helpful to identify hierarchical patterns in empirical food webs and other natural networks, it may also provide the base for extending motif analysis to other types of multi-layered networks.

Scientific RepoRts | 5:11926 | DOi: 10.1038/srep11926 Standard network characteristics, such as motif spectra, however cannot per se capture the structure of networks that have multiple layers of complexity and therefore cannot be represented as a traditional graph. Here, we propose an extension of standard motif analysis for a specific network type, which we denote as directed ordered networks. Ordered networks are networks in which the nodes can be compared or related to each other by a binary ordering relation, < , that is independent of the graph topology. That is, we assume that for every two nodes in the network it is known which node is 'smaller' or 'larger' than the other. Ordered networks might reflect, for example, systems where the nodes are associated with an additional intrinsic property, such as size, fitness, importance, or geographic location 17 . In these systems the nodes can be naturally ordered according to the value of their intrinsic state variable. For example, individuals in a social network may be sorted according to their social status or financial income, (air) ports in a large-scale transportation network may be sorted according to their size or geographic location, and proteins in a regulatory network may be sorted according to their molecular mass or ubiquity among biota. In fact, such intrinsic states arise naturally in many natural and technical systems and therefore most empirical networks can be regarded as being ordered.
Ordered relationships are also of fundamental importance for describing trophic species interactions in ecological food webs. Here, species are sorted according to their body-size, accounting for the fact that it is more likely to find large predators feeding on small prey, than vice versa, i.e. large prey having small predators [18][19][20] . Empirical food webs have been constructed for diverse ecosystems [21][22][23][24][25] and statistical 10,[18][19][20] as well as evolutionary 26,27 models have been developed to understand mechanisms underlying them. Food webs have been characterized by a variety of network theoretic measures, such as typical food chain length, clustering coefficient, or degree distribution 10 . In particular, motif analysis of empirical food webs has shown to be helpful to obtain a quantitative analysis of their local structure 12,16,28,29 . However, none of these measures is able to incorporate the body-size related hierarchy pattern, i.e., the directed ordered structure, of food webs.
In this report, we develop a general framework for motif analysis in directed ordered networks. We first propose the notion of directed ordered networks as directed networks where the nodes constitute a totally ordered set. Next, we expand standard motif analysis to be able to capture the hierarchical structure in such systems. Our new approach is based on the identification of all ordered 3-node substructures and the visualization of their significance profile. Finally, we apply this technique to computer generated ensembles of ordered networks and to empirical food web data. Thereby, we demonstrate that the extended motif analysis is a promising technique to analyze hierarchical patterns and the role of body-size order in natural food webs and other complex networks.

Directed Ordered Networks
Definition of Ordered Networks. We define an ordered network as a graph, consisting of a collection of nodes that are connected by links, where the nodes constitute a totally ordered set. This means that nodes can be linearly ordered, i.e., for any pair of non-identical nodes i ≠ j either the binary relation i < j or j < i is true. This binary relation is transitive, that is if i < j and j < k then also i < k. When the nodes are ordered we denote the index of a node as its rank. If all network edges have an orientation (i.e., the underlying network is directed) we speak of a directed ordered network.
Ordered networks arise naturally, but are not restricted to, the important situation where every node i is associated with an intrinsic state ∈ n i  17 . In this case, the nodes can be embedded along a one dimensional niche axis and, accordingly, we denote this intrinsic state as the node's niche variable. Node sets with niche variables can be naturally ordered such that n i < n j is equivalent to i < j. That is, nodes with smaller niche variables have smaller rank. Depending on the specific context, nodes can be ordered according to any continuous niche variable that characterizes or is associated to the node, such as the size, fitness, importance, a dynamical state variable, a physical (e.g., temperature) or biological (e.g., population density) condition, or the geographic location. In this sense, ordered networks are ubiquitous in nature and, depending on the question, the same network may be ordered differently according to different niche variables.
The existence of a total ordering breaks the exchange symmetry between the network nodes and thus has important consequences for the network structure. In particular, in a directed ordered graph, any two nodes i and j are related in one of four possible ways ( Fig. 1): Two nodes may not be connected at all, there may be a unidirectional link downwards, a unidirectional link upwards (note that we always place nodes with smaller rank vertically below nodes with higher rank), or there may be a bidirectional link. This distinction between downward connections (i.e., links pointing from nodes with a higher rank to a lower rank, Fig. 1, case 2) and upward connections (Fig. 1, case 3) increases the combinatorial complexity of directed ordered networks compared to non-ordered networks and will be the base of our subsequent motif analysis.

Motif Analysis of Directed Ordered Networks.
Network motifs are small subgraphs, commonly corresponding to triplets of nodes, that are significantly overrepresented in a network 12 . To identify the motifs in a given network, the frequency of subgraphs in the network is compared to the expected number in an ensemble of randomized networks. If the frequency of a given subgraph in the network is significantly larger than the mean frequency in a randomized ensemble, the subgraph is considered a network motif.
In the following we develop an approach to extend the motif analysis to directed ordered networks. Since in a directed ordered graph any two nodes i and j are related in one of four possible ways ( Fig. 1), each triplet of nodes can be in one of 4 3 = 64 possible configurations. Ten of those configurations contain isolated nodes (i.e., nodes without any link) and will not be considered in the following. The remaining 54 connected ordered configurations, or substructures, are shown in Fig. 2. When the hierarchy among nodes is neglected, the ordered substructures collapse to the well-known 13 classes from standard motif analysis 12 . Thereby, each of the 13 classes has a different size, i.e. it contains a different number of ordered 3-node substructures ( Fig. 2). For identification, we denote each substructure by a pair of indices (q,s), where q = 1…13 is the class ID and 1 ≤ s ≤ 6 is the member ID.
In general, in directed ordered networks, downward links will have a different frequency than upward links. This generalizes to substructures containing three nodes: their likelihood of appearance depends on the number of up-and downward connections. We call the numbers of appearances, η (q,s) , of all ordered substructures in a network the spectrum of ordered 3-node substructures or motif spectrum for short. Unordered motif spectra can be retrieved by summing over member IDs The ordered motif spectrum is more fine-grained than that of unordered 3-node motifs: dissimilarities in the appearance of ordered motifs within the same (unordered) motif-class are merged and cannot be detected in standard motif analysis (Fig. 2, top row). Since the frequencies of different substructures can easily differ by orders of magnitude, we usually plot the motif spectrum on a logarithmic scale.

A Random Model for Directed Ordered Networks
We first consider a simple statistical model, the directed ordered random network, that provides a simple approach to generate statistical ensembles of ordered networks and allows to derive properties of the motif spectrum analytically. Our model is a straightforward generalization of the Erdös-Rényi random graph model 1 and depends only on two connection parameters, ↑ p and ↓ p . Let N be the number of indexed nodes (we assume that the nodes are ordered according to their rank, i = 1…N). For each The top row contains all 13 possible 3-node motifs when the node-order is not considered. Taking into account node-order yields 54 ordered motifs (shown below the top row), which are arranged below their respective isomorphism class. Thereby, each of the 13 unordered 3-node motif classes (labeled by their respective motif ID) corresponds to 1 to 6 ordered ordered 3-node motifs (labeled by their member ID). For each ordered motif, the rank of nodes increases from bottom to top. ordered pair of nodes, j < i, an upward directed link is introduced with probability ↑ p from the node j with smaller rank to the node i with larger rank, and a downward directed link is introduced with probability ↓ p . Thus, for any ordered pair of nodes the probability for the appearance of each of the four possible configurations in Fig. 1 is given by In a similar way, the probability for the appearance of each ordered substructure with three nodes (Fig. 2) can be calculated. For the substructure (1,1), for example, the appearance probability is where the first two multipliers describe the absence of links between middle and top nodes, the third and fourth multipliers correspond to the link between bottom and top nodes, and the last two multipliers encode the link between bottom and middle nodes. The expected number, η (q,s) , of appearances of each substructure with class ID q and member ID s can be calculated as where the binomial coefficient corresponds to the number of all possible ordered 3-node combinations. Figure 3a shows an exemplary motif spectrum in the special case of top-down symmetry, where = = ↑ ↓ p p p : . In this case the 54 ordered motifs can be grouped into five sets, each of which has a ↓ p 0 2. Substructures within the same motif class can have different appearance numbers, but are restricted to 13 distinct levels. specific appearance probability (see solid horizontal lines in Fig. 3a). Thereby, the probability of appearance for each substructure depends only on the number of directed links l (with 2 ≤ l ≤ 6) within the motif class 15 Since all substructures of the same motif class have the same number of directed links (Fig. 2) they also have the same appearance probability.
Next, we study the situation where the top-down symmetry is broken. Without loss of generality we assume that upward directed links are more common than downward links ( > ↑ ↓ p p ). The corresponding motif spectrum is illustrated in Fig. 3b. In contrast to the symmetric case (Fig. 3a) the substructures within a single motif class may now have different probabilities of appearance. Nevertheless, all appearance probabilities in the spectrum are grouped to 13 distinct levels (marked by solid horizontal lines). Those levels arise from permutations of multipliers in equation (3), corresponding to different substructures, but identical probabilities. We call the set of substructures with identical mean frequencies a statistical class. Note that the 13 statistical classes differ from the 13 isomorphism classes (Fig. 2, top row). In other words, members of any isomorphism class can differ widely in their rate of appearance.
Remarkably, the spectrum in Fig. 3b exhibits three dominant substructures, which correspond to the ordered motifs that contain two upward connections and no downward connection: substructure (1,3), where the node of largest rank is reached by upward links by the other two nodes; substructure (2,6), an upward chain; and substructure (4,2), where the node with the smallest rank has an upward link to each of the other nodes. As will be shown below, the same substructures play an important role in natural food webs.

Motif analysis of Empirical and Simulated Food webs
Using the Niche Model to Generate Directed Ordered Networks. In the following, we show that motif spectra of ordered networks can be used to analyze food web data. For the analysis we compare data from an empirical lake food web with statistical ensembles of directed ordered networks, which are generated by the niche model 18 . The niche model combines stochastic elements with simple link assignment rules and is well known to be able to synthesize networks of trophic interactions between species that closely resemble empirical food webs 10,18 . The model depends to two parameters, the species richness N (i.e., the number of biological species, each represented by a node in the food web) and the connectance C (i.e., the proportion of possible links in the food web that actually occur). In the niche model, each species i = 1…N is assigned a random niche variable n i ∈ [0,1], drawn from a uniform distribution. The niche variable can be regarded to be a proxy of body-size, which determines the node's incoming links, and it constitutes a natural ordering of the network nodes as described above. In the model, species are constrained to consume prey from a contiguous range of species on the niche axis 18 . That is, species i preys upon all species j that have a niche parameter n j inside a finite segment of length r i = xn i , centred at a position c i that is chosen randomly inside the interval [r i /2,n i ]. Here, 0 ≤ x ≤ 1 is a random variable from a beta distribution p(x) = β(1 − x) 1−β , with β = (1/2C) − 1. Fig. 4 we compare the motif spectrum of an empirically measured food web with that from an ensemble of computer generated ordered networks. Figure 4a shows the appearance numbers η (q,s) of all ordered 3-node substructures, which we obtained from the pelagic food web of Alford lake from the Adirondack park 21 (data kindly provided by U. Brose). The average body-mass of adult individuals was used as niche variable to order species. The data set contains N = 56 species with a connectance of C = 0.0692. As shown in the figure, only eight different substructures occur in the empirical motif spectrum. The most abundant of these are the substructures (1,3), (2,6), and (4,2). Substructure (1,3) corresponds to a motif of a predator that feeds on two prey species of smaller body-size, (2,6) is a tri-trophic chain where prey have smaller body-size than predators, and (4,2) describes a prey that is preyed upon by two predators of larger body-size. The same substructures have also been identified as the most dominant substructures in the asymmetric random model of directed ordered networks (Fig. 3b). This agreement can be explained by the fact that feeding relations between two species are not symmetric with respect to body-size. It is more likely that an upward link from a prey of smaller body-size to a larger predator occurs, than vice versa (as usual, we assume that the direction of feeding links in a food web reflects the direction of energy flow, i.e., a directed link from species A to species B means that B eats A).

Motif Spectra of Empirical and Simulated Food Webs. In
Next, we used the empirical values of N and C to generate statistical food web ensembles with the niche model. In Fig. 4a we plot the ordered motif spectra from 1000 realization of the fitted niche model. As shown in the figure, in the statistical ensemble we observe a total of 37 different substructures-clearly more than in the empirical data set. The substructures that occur in both spectra have appearance frequencies that match rather well and, in general, coincide with the substructures of higher frequencies in the statistical ensemble (Fig. 4).
To characterize deviations between the model and the empirical data, we use the Z-score  Fig. 4b, the strongest deviations between data and model (i.e., a positive Z-score) occur for the motifs (1,3), (6,3), and (5,6), all of which are connected to a pattern of omnivory (see Fig. 2). These deviations can be explained by the observation that the fitted niche model generates fewer of these motifs than observed in the field. The, by far, largest Z-score occurs for the motif (1,3), which is also the most abundant in the empirical motif spectrum (Fig. 4a). Analyzing further food-webs of the Adirondack park (results not shown) we find that this is a common pattern: typically the motif spectrum of the niche model deviates most strongly from that of the most abundant substructure in the empirical data. These results conform with the well-known observation that even though structural food web models, such as the niche model, are able to provide detailed understanding about the structural complexity of natural food webs, they still show some systematic deficiencies to predict the fine structure of complex food webs 16,19 .
To test for the relevance of body-size ordering in empirical food webs, we compare the motif spectrum of the pelagic food web of Alford lake with that of its randomly re-ordered counterparts (Fig. 5a). If a substructure does not appear in the empirical food web we set η (q,s) = 0 (not shown on the logarithmic scale in Fig. 5a). The figure reveals only slight agreement between the motif spectra of the empirical food webs before and after randomized ordering. This is confirmed by our calculation of the spectrum of Z-scores in Fig. 5b. Note, that now η ( , ) q s m and σ ( , ) q s m in Eq. (6) denote the mean frequency and standard deviation of a substructure in the randomized webs. The large entries in the spectrum of Z-scores reveal substantial deviations in the structure of the natural and randomized food webs. If the body-size order is neglected by summing over all member IDs, the spectrum reduces to the 13 standard unordered motif classes and the spectra of empirical and randomized networks become indistinguishable (Fig. 5c). These results indicate that hierarchy due to body-size is a crucial aspect of the structure of empirical food webs. The precise role of body-size order for structuring natural food webs provides an intriguing possibility for future research.

Discussion
The method presented in this paper is a natural extension of the classic network motif analysis 12 to networks with hierarchically structured nodes. For these networks, the spectrum of ordered substructures yields a quantitative description of the connectivity-patterns with respect to node-rank. Thereby, highly abundant ordered motifs, such as substructure (1,3) in Fig. 3b or (4,2) in Fig. 4, represent connectivity-patterns that are typical within the node hierarchy. The spectrum can be expressed in absolute motif counts η (q,s) or appearance probabilities P (q,s) , whereby the latter is independent of network size, which makes it suitable for unified comparisons across different networks.
We have shown that ordered motif spectra can reflect a breaking of the top-down symmetry in the hierarchy among nodes. This means that networks, in which connections to nodes of higher rank are more frequent than downward directed connections, naturally contain a larger share of the corresponding motifs that are mostly composed of upward links, or vice versa. In general, we observe large variations in the frequencies of ordered substructures that easily can span several orders of magnitudes. A high abundance of an ordered motif does, however, not directly imply its statistical significance. The latter requires a model to compare against, as demonstrated by our use of the randomly reordered networks and the niche model. On the other hand, rare motifs should not immediately be disregarded, as they might still play a key role in the network. Assume, for example, that in a food web most of the biomass is concentrated on a small number of interacting species. In this case, even if the corresponding motif(s) are very rare, they could still carry an important ecosystem function. Therefore, in general, we suggest to take the whole spectrum of ordered motifs into account for network analysis.
In this work we only considered substructures composed of three nodes, however, the presented technique generalizes to larger motifs. In practice, counting larger motifs might be more challenging due to the rapidly increasing computational demands with growing motif-and network-size. Our approach also generalizes to different classes of natural and theoretical networks. Here, we have discussed food webs, using the species' niche coordinates to define the ordering among the network nodes. But many other networks, for which motifs have been analyzed traditionally 12 , might possess natural ordering criteria or hidden niche variables, which would allow for a worthwhile reanalysis using ordered motifs. Figure 5. Motif-spectra of the pelagic food-web of Alford lake and its 1000 randomly reordered realizations (grey dots). Data can only be shown for the 6 motif classes that actually occur in the empirical food web. a Comparison of the absolute motif counts η (q,s) on a logarithmic scale. b Relative deviations between the empirical and randomized food webs, measured by the Z-score Z (q,s) . c Comparison of the number η q of unordered substructures according to equation (1) between experimental data and randomized networks on a logarithmic scale. Symbols are explained in the legend. Finally, the proposed approach may be helpful to detect structural changes in adaptive networks that are coevolutionary changing in time (e.g., evolving food webs) 30 and it might open new avenues for generalizing motif analysis to networks that contain multiple layers of connectivity 31,32 . Such structures recently have gotten in focus of the scientific attention as networks of networks or multi-layered networks and are characterized by nodes that are connected by more than one type of relationship. For example, the risk of cascading failures in important infrastructure facilities, such as electrical power grids 5 , may be related to connections within multiple interdependent communication, transport, or infrastructure subsystems 31 . Similar, in ecology, trophic interactions represent only one of many possible forms by which species can influence each other. It is increasingly recognized that ecological networks contain different interactions beyond feeding relationships, such as host-parasitoid interactions, interference competition, and other forms of non-trophic interactions (e.g., mutualism, habitat modification, or facilitation) 33,34 . The exploration of the structural and dynamical properties of such multi-layered networks is still in its infancy and in particular, simple robust techniques are needed that allow to capture the huge complexity of such systems 32 . In this sense, our proposed method may not only be helpful to identify hierarchical patterns in empirical food webs and other natural networks, it may also provide the base for extending motif analysis to multi-layered networks.