Introduction

Phylogenetic networks play a vital role in describing the evolutionary history of species, and are especially appropriate for datasets whose evolution involves a significant number of reticulate events caused by recombination, hybridization, horizontal gene transfer, gene duplication, gene conversion and loss1,2,3,4,5,6,7. Even for species that have evolved according to a tree-like model of evolution, phylogenetic networks can be used to represent conflicts among phylogenetic trees that may be caused by inadequacies of the evolutionary model used. Many algorithms and programs for constructing phylogenetic networks have been proposed. These algorithms are mainly assessed by comparing networks, for example, by comparing the constructed network with a simulated network or the actual network. In addition, comparing two phylogenetic networks can help us to understand the evolutionary history of species. Recently, researchers have shown increased interest in defining metrics for computing the dissimilarity between a pair of phylogenetic networks.

A measure d is called a metric on a space S if it satisfies the following four properties for any a, b, c ∈ S:

  • d(a, b) ≥ 0 (nonnegative);

  • d(a, b) = 0 if and only if a = b (i.e. a and b are isomorphic) (reflexivity);

  • d(a, b) = d(b, a) (symmetry);

  • d(a, b) + d(b, c) ≥ d(a, c) (triangle inequality).

In general, all of the above properties except reflexivity are easy to prove for a given measure. A metric is called trivial if the distance between two phylogenetic networks is 0 when they are isomorphic and 1 otherwise. A trivial metric obviously satisfies the above-mentioned properties, but it conveys no further information about the evolutionary histories implied by the two phylogenetic networks. Accordingly, in addition to these four properties, it is desirable that a metric gives some information on the dissimilarity of the evolutionary histories expressed by the phylogenetic networks being compared8,9,10,11,12,13.
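As a rough illustration of the four properties, the following Python sketch checks them on a finite sample of objects. The names `is_metric_on`, `trivial_metric` and the `same` predicate (a stand-in for network isomorphism) are illustrative assumptions and do not come from the paper. Passing the check on a sample does not prove that a measure is a metric on the whole space.

```python
from itertools import product
from typing import Callable, Iterable


def is_metric_on(points: Iterable, d: Callable, same: Callable = lambda a, b: a == b) -> bool:
    """Check the four metric properties on a finite sample of points."""
    pts = list(points)
    for a, b in product(pts, repeat=2):
        if d(a, b) < 0:                      # nonnegativity
            return False
        if (d(a, b) == 0) != same(a, b):     # d = 0 exactly for "equal" objects
            return False
        if d(a, b) != d(b, a):               # symmetry
            return False
    for a, b, c in product(pts, repeat=3):
        if d(a, b) + d(b, c) < d(a, c):      # triangle inequality
            return False
    return True


def trivial_metric(a, b) -> int:
    """The trivial metric: 0 for equal objects, 1 otherwise."""
    return 0 if a == b else 1
```

The trivial metric passes this check on any sample, which is why the four properties alone say nothing about how informative a distance is.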

Up to now, several measures have been designed, each proven to be a metric on a certain subspace of rooted phylogenetic networks, for example, the μ-metric on the space of tree-sibling phylogenetic networks14, the tripartition metric on the space of tree-child phylogenetic networks15,16,17,18, the m-distance on the space of reduced phylogenetic networks19, and the \({d}_{e}\)-distance on the space of partly reduced phylogenetic networks20. The largest of these subspaces is the space of partly reduced phylogenetic networks, so the \({d}_{e}\)-distance is also a metric on the subspaces of tree-child phylogenetic networks, tree-sibling phylogenetic networks and reduced phylogenetic networks. This paper introduces a new metric, called the kth-distance, on the space of kth-order reduced phylogenetic networks (defined in the following sections); the metric is polynomial-time computable. The space of kth-order reduced phylogenetic networks is a larger subspace of rooted phylogenetic networks than any subspace on which a metric has previously been defined. Unless stated otherwise, the rest of the paper uses the term network to denote a rooted phylogenetic network.

Preliminaries

Let \({\mathscr{X}}\) be a set of taxa. A rooted phylogenetic network N = (V, E) on \({\mathscr{X}}\) is a directed acyclic graph (DAG for short) with a single root node, whose leaves are labelled by the elements of \({\mathscr{X}}\) via a bijection f.

For a network N = (V, E) and a node u ∈ V, if:

  • indeg(u) = 0, then u is the root;

  • indeg(u) ≤ 1, then u is a tree node;

  • indeg(u) ≥ 2, then u is a reticulate node;

  • outdeg(u) = 0, then u is a leaf;

  • outdeg(u) ≥ 1, then u is an internal node.
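The classification above can be sketched in a few lines of Python. The edge-list representation, the function name `classify_nodes` and the toy network `TOY_EDGES` are assumptions made for illustration only. Note that, under the conditions as stated, a node can carry several roles at once (the root also satisfies indeg(u) ≤ 1).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical toy network: root r, tree nodes a and b, reticulate node h.
TOY_EDGES = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
             ("a", "1"), ("h", "2"), ("b", "3")]


def classify_nodes(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[str]]:
    """Assign each node the roles implied by its indegree and outdegree."""
    indeg, outdeg, nodes = defaultdict(int), defaultdict(int), set()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        nodes.update((u, v))
    roles = {}
    for u in nodes:
        r = set()
        if indeg[u] == 0:
            r.add("root")
        if indeg[u] <= 1:
            r.add("tree")
        if indeg[u] >= 2:
            r.add("reticulate")
        r.add("leaf" if outdeg[u] == 0 else "internal")
        roles[u] = r
    return roles
```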

Sometimes we use the notation N = ((V, E), f) to denote the network N, and V N to denote the leaf set of N. Let u, v ∈ V be two nodes. If (u, v) ∈ E, then we say that v is a child of u, or that u is a parent of v. If there exists a directed path from u to v, then we say that v is a descendant of u, or that u is an ancestor of v.

The height of a node u is the length of a longest directed path beginning from u and ending with a leaf. The non-existence of cycles indicates that all nodes of N can be categorized by height: the nodes with height 0 are the leaves; for a node u with height a > 0, each child of u has height m < a and there exists at least one child with height exactly a − 1.

The depth of a node v is the length of a longest directed path beginning from the root and ending with v. In the same way, the non-existence of cycles indicates that all nodes of N can be categorized by depth: the only node with depth 0 is the root; for a node v with depth b > 0, each parent of v has depth m < b and there exists at least one parent with depth exactly b − 1.
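Height and depth, as defined above, are longest-path lengths in a DAG and can be computed by memoized recursion. The sketch below assumes an edge-list representation; the function name `heights_and_depths` is hypothetical, and the recursion is only meant for small networks.

```python
from functools import lru_cache
from typing import Dict, Hashable, Iterable, List, Tuple


def heights_and_depths(edges: Iterable[Tuple[Hashable, Hashable]]):
    """Return (height, depth) dictionaries for all nodes of a rooted DAG."""
    children: Dict[Hashable, List[Hashable]] = {}
    parents: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])
        parents.setdefault(v, []).append(u)
        parents.setdefault(u, [])

    @lru_cache(maxsize=None)
    def height(u):
        # Longest directed path from u down to a leaf.
        return 0 if not children[u] else 1 + max(height(c) for c in children[u])

    @lru_cache(maxsize=None)
    def depth(u):
        # Longest directed path from the root down to u.
        return 0 if not parents[u] else 1 + max(depth(p) for p in parents[u])

    return ({u: height(u) for u in children}, {u: depth(u) for u in children})
```

On a network with edges r→a, r→b, a→h, b→h, a→1, h→2, b→3, the reticulate node h has height 1 and depth 2, and the root has height 3.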

Definition 1. Two networks N 1 = ((V 1, E 1), f 1) and N 2 = ((V 2, E 2), f 2) are isomorphic if and only if there exists a bijection H from V 1 to V 2 such that:

  • (u, v) is an edge in E 1 if and only if (H(u), H(v)) is an edge in E 2;

  • for each leaf w ∈ V 1, f 1(w) = f 2(H(w)).
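Definition 1 can be checked directly on very small networks by trying every bijection on the internal nodes; the leaves are forced by their labels. The sketch below is a brute-force illustration only (exponential time), not a method from the paper. It assumes that `labels1` and `labels2` map exactly the leaves of each network to their taxa and that each labelling is a bijection; all names are hypothetical.

```python
from itertools import permutations


def are_isomorphic(edges1, labels1, edges2, labels2) -> bool:
    """Brute-force test of Definition 1 for tiny labelled networks."""
    e1, e2 = set(edges1), set(edges2)
    nodes1 = {x for edge in e1 for x in edge}
    nodes2 = {x for edge in e2 for x in edge}
    if len(nodes1) != len(nodes2) or len(e1) != len(e2):
        return False
    leaf_of_label2 = {label: leaf for leaf, label in labels2.items()}
    if set(labels1.values()) != set(leaf_of_label2):
        return False
    # Leaves must be mapped to the leaf carrying the same label.
    forced = {leaf: leaf_of_label2[label] for leaf, label in labels1.items()}
    inner1 = [u for u in nodes1 if u not in labels1]
    inner2 = [u for u in nodes2 if u not in labels2]
    for image in permutations(inner2):
        h = dict(forced)
        h.update(zip(inner1, image))
        # With equally many edges, mapping every edge of N1 onto an edge
        # of N2 under a bijection also gives the converse direction.
        if all((h[u], h[v]) in e2 for u, v in e1):
            return True
    return False
```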

Although the space on which the \({d}_{e}\)-distance is a metric is the largest among all the subspaces defined so far, a large number of networks still cannot be distinguished by the \({d}_{e}\)-distance. For example, the two networks in Fig. 1 (from the paper20) are not isomorphic, yet the \({d}_{e}\)-distance between them is 0. Even for two non-isomorphic networks whose \({d}_{e}\)-distance is not 0, the distance usually takes its maximal value 1. For example, the networks in Fig. 2 bear a certain resemblance to each other, so it is desirable that the distance between them be less than 1; however, their \({d}_{e}\)-distance is the maximal value 1. Moreover, for any two networks N 1 on \({{\mathscr{X}}}_{1}\) and N 2 on \({{\mathscr{X}}}_{2}\), the \({d}_{e}\)-distance between them is 1 whenever \({{\mathscr{X}}}_{1}\ne {{\mathscr{X}}}_{2}\), even though the two networks may share some information when \({{\mathscr{X}}}_{1}\subset {{\mathscr{X}}}_{2}\) (see Fig. 3).

Figure 1

N 1 and N 2 are not isomorphic.

Figure 2

N 1 and N 2 on \({\mathscr{X}}=\{1,2,3,4,5,6\}\) are not isomorphic.

Figure 3

N 1 is on \({{\mathscr{X}}}_{1}=\{1,2,3,4,5\}\); N 2 is on \({{\mathscr{X}}}_{2}=\{1,2,3,4,5,6\}\).

Methods

Let N = ((V, E), f) be a network. We first give several definitions for nodes within a single network.

Definition 2. Two nodes u, v ∈ V (not necessarily different) are called first-order equivalent, denoted by u ≡ 1 v, if

  • u, v ∈ V N and f(u) = f(v), or

  • node u has l(≥1) children \({u}_{1},{u}_{2},\cdots ,{u}_{l}\), node v has l children \({v}_{1},{v}_{2},\cdots ,{v}_{l}\), and u i  ≡ 1 v i for 1 ≤ i ≤ l.

Example 1. Consider the network N 1 in Fig. 1. Each node of N 1 is first-order equivalent with itself, and C ≡ 1 E, D ≡ 1 F, H ≡ 1 J.

Definition 3. Let k ≥ 2 be an even number. Two nodes u, v ∈ V (not necessarily different) are called kth-order equivalent, denoted by u ≡ k v, if u ≡ k−1 v, and:

  • u, v are the root, or

  • node u has l(≥1) parents \({u}_{1},{u}_{2},\cdots ,{u}_{l}\), node v has l parents \({v}_{1},{v}_{2},\cdots ,{v}_{l}\), and u i  ≡ k v i for 1 ≤ i ≤ l.

Definition 4. Let k ≥ 3 be an odd number. Two nodes u, v ∈ V (not necessarily different) are called kth-order equivalent, denoted by u ≡ k v, if u ≡ k−1 v, and:

  • u, v ∈ V N and f(u) = f(v), or

  • node u has l(≥1) children \({u}_{1},{u}_{2},\cdots ,{u}_{l}\), node v has l children \({v}_{1},{v}_{2},\cdots ,{v}_{l}\), and u i  ≡ k v i for 1 ≤ i ≤ l.

Example 2. Consider the network N 1 in Fig. 1 again. Each node of N 1 is second-order equivalent with itself, and H ≡ 2 J. For k ≥ 3, each node of N 1 is kth-order equivalent only with itself.
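Definitions 2 to 4 can be illustrated by assigning each node a class identifier, so that two nodes are kth-order equivalent exactly when their identifiers agree. The Python sketch below is one possible realisation, not the procedure given in the paper: it relies on the equivalences being equivalence relations, so that "the children (or parents) can be paired up equivalently" becomes "the sorted lists of neighbour classes are equal". The function name, the edge-list input and the toy network are assumptions for illustration.

```python
from typing import Dict, Hashable, List

# Hypothetical toy network: A and B share the children x and y,
# but A hangs below the root r while B hangs below c.
EDGES = [("r", "A"), ("r", "c"), ("c", "B"), ("c", "3"), ("A", "x"),
         ("A", "y"), ("B", "x"), ("B", "y"), ("x", "1"), ("y", "2")]
LABELS = {"1": "t1", "2": "t2", "3": "t3"}   # leaf -> taxon


def equivalence_classes(edges, labels, k: int) -> Dict[Hashable, int]:
    """Map each node to an id; equal ids mean kth-order equivalent."""
    children: Dict[Hashable, List[Hashable]] = {}
    parents: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
        children.setdefault(v, [])
        parents.setdefault(v, []).append(u)
        parents.setdefault(u, [])
    prev = {u: 0 for u in children}          # "order 0": no constraint yet
    for order in range(1, k + 1):
        odd = order % 2 == 1
        nbrs = children if odd else parents  # odd: bottom-up, even: top-down
        ids, cur = {}, {}

        def visit(u):
            if u not in cur:
                if nbrs[u]:
                    tail = tuple(sorted(visit(w) for w in nbrs[u]))
                elif odd:
                    tail = ("leaf", labels[u])
                else:
                    tail = ("root",)
                # prev[u] enforces equivalence at the previous order.
                cur[u] = ids.setdefault((prev[u], tail), len(ids))
            return cur[u]

        for u in children:
            visit(u)
        prev = cur
    return prev
```

In the toy network, A and B are first-order equivalent because they have the same children, but they are separated at order 2 because their parents (the root r and the node c) are not equivalent.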

Lemma 1. Let k be an odd number, and let u 1, u 2, \(\cdots \), u s be nodes in a network such that each u i (1 ≤ i ≤ s) has l children and each child of u i is kth-order equivalent only with itself. Then u 1 ≡ k u 2 ≡ k \(\cdots \) ≡ k u s if and only if u 1, u 2, \(\cdots \), u s have the same children (see Fig. 4).

Figure 4

The topology relation of odd-order equivalent nodes.

Lemma 2. Let k be an even number, and let v 1, v 2, \(\cdots \), v s be nodes in a network such that each v i (1 ≤ i ≤ s) has l parents and each parent of v i is kth-order equivalent only with itself. Then v 1 ≡ k v 2 ≡ k \(\cdots \) ≡ k v s if and only if v 1, v 2, \(\cdots \), v s have the same parents (see Fig. 5).

Figure 5

The topology relation of even-order equivalent nodes.

Lemma 3. For all leaves, the root and the nodes with height 1 in a network, each of them is kth-order equivalent with itself (for any k).

The proofs of Lemmas 1, 2 and 3 are omitted here. It follows from these definitions that each kth-order equivalence is an equivalence relation, i.e. it is reflexive, symmetric and transitive. It can easily be proved that all first-order equivalent nodes have the same height, and that all kth-order equivalent nodes (k ≥ 2) have the same height and depth (refer to the literature20).

If a node u is kth-order equivalent with some node other than itself, we say that u has non-trivial kth-order equivalent nodes. For a network, after deleting the non-trivial kth-order equivalent nodes of each node, as well as the nodes with indegree 1 and outdegree 1, the resulting network is called a kth-order reduced phylogenetic network. All the kth-order reduced phylogenetic networks form the space of kth-order reduced phylogenetic networks. Thus a network N is in the space of kth-order reduced phylogenetic networks if and only if each node of N is kth-order equivalent only with itself.

The space of first-order reduced phylogenetic networks is the space of reduced phylogenetic networks defined in the paper19. The space of second-order reduced phylogenetic networks is the space of partly reduced phylogenetic networks defined in the paper20. Figure 6 shows the relationship of these subspaces.

Figure 6

A is the space of rooted phylogenetic networks; B is the space of kth-order reduced phylogenetic networks (k ≥ 2); C is the space of partly reduced phylogenetic networks; and D is the space of reduced phylogenetic networks.

The space of kth-order reduced phylogenetic networks is not equal to the space of rooted phylogenetic networks. For example, in the network N in Fig. 7, for any k, each node of N is kth-order equivalent with itself, and in addition A ≡ k B. So N is not a kth-order reduced phylogenetic network, i.e. it is not in the space of kth-order reduced phylogenetic networks.

Figure 7

N is a rooted phylogenetic network.

In order to compute the dissimilarity of the networks, we will extend the above concepts defined in a network to two networks in the following sections. Let N 1 = ((V 1, E 1), f 1) and N 2 = ((V 2, E 2), f 2) be two networks.

Definition 5. Two nodes u ∈ V 1, v ∈ V 2 are called first-order equivalent, denoted by u ≡ 1 v, if

  • \(u\in {V}_{{N}_{1}},v\in {V}_{{N}_{2}}\), and f 1(u) = f 2(v), or

  • node u has l(≥1) children u 1, u 2, \(\cdots \), u l , node v has l children v 1, v 2, \(\cdots \), v l , and u i  ≡ 1 v i for 1 ≤ i ≤ l.

Definition 6. Let k ≥ 2 be an even number. Two nodes u ∈ V 1, v ∈ V 2 are called kth-order equivalent, denoted by u ≡ k v, if u ≡ k−1 v, and:

  • u, v are the root, or

  • node u has l(≥1) parents u 1, u 2, \(\cdots \), u l , node v has l parents v 1, v 2, \(\cdots \), v l , and u i  ≡ k v i for 1 ≤ i ≤ l.

Definition 7. Let k ≥ 3 be an odd number. Two nodes u ∈ V 1, v ∈ V 2 are called kth-order equivalent, denoted by u ≡ k v, if u ≡ k−1 v, and:

  • \(u\in {V}_{{N}_{1}},v\in {V}_{{N}_{2}}\) and f 1(u) = f 2(v), or

  • node u has l(≥1) children u 1, u 2, \(\cdots \), u l , node v has l children v 1, v 2, \(\cdots \), v l , and u i  ≡ k v i for 1 ≤ i ≤ l.

Let u, u 0 be two nodes from two networks or from the same network. From these definitions, it follows that if there exists a positive integer k 1 such that \(u{\not\equiv }^{{k}_{1}}{u}_{0}\), then \(u{\not\equiv }^{k}{u}_{0}\) for any k > k 1. Let N 1 = (V 1, E 1) and N 2 = (V 2, E 2) be two networks. We use the following process to compute the set of kth-order unique nodes of N 1, denoted by L k(N 1). Initially, L k(N 1) = ∅. Then, for each node u ∈ V 1, if there is no node u 0 ∈ L k(N 1) such that u ≡ k u 0, add u to L k(N 1). The set L k(N 2) is computed similarly. For each node u ∈ L k(N 1), \({e}_{{N}_{1}}^{k}(u)\) denotes the number of nodes of N 1 that are kth-order equivalent with u, i.e. \({e}_{{N}_{1}}^{k}(u)=|\{v\in {V}_{1}:v{\equiv }^{k}u\}|\). Similarly, we define \({e}_{{N}_{2}}^{k}(u)\) for each node u ∈ L k(N 2). For the sake of simplicity, we drop the subscript of e. By convention, e k(∅) = 0.
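The process for building L k(N) and the counts e k can be sketched as a single scan that keeps one representative per equivalence class. In the Python sketch below, `equivalent` is a placeholder predicate supplied by the caller; the function name and the toy predicate in the usage note are assumptions for illustration and are not part of the paper.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def unique_nodes_and_counts(
    nodes: Iterable[Hashable],
    equivalent: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[Hashable], Dict[Hashable, int]]:
    """Return the representatives (L k) and class sizes (e k) of a node set."""
    reps: List[Hashable] = []
    counts: Dict[Hashable, int] = {}
    for u in nodes:
        for u0 in reps:
            if equivalent(u, u0):
                counts[u0] += 1      # u joins the class represented by u0
                break
        else:
            reps.append(u)           # no equivalent representative yet
            counts[u] = 1
    return reps, counts
```

For instance, with the toy predicate `lambda a, b: a[0] == b[0]` on the names `["A1", "B1", "A2", "C1", "A3"]`, the representatives are `["A1", "B1", "C1"]` and the class of `"A1"` has size 3.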

Lemma 4. Let N 1 = (V 1, E 1) and N 2 = (V 2, E 2) be two networks, and let u 1, u 2 ∈ V 1 and v 1, v 2 ∈ V 2 be nodes such that u 1 ≡ k v 1 and u 2 ≡ k v 2. Then u 1 ≡ k u 2 if and only if v 1 ≡ k v 2.

Proof. See the proof of Theorem 15 in the paper20.

A Metric

Definition 8. For two networks N 1 = (V 1, E 1) and N 2 = (V 2, E 2), the kth-distance d k (N 1, N 2) equals

$$\frac{1}{k({n}_{1}+{n}_{2})}\left\{\sum _{i=1}^{k}\left[\sum _{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+\sum _{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}\right]\right\}$$
(1)

where v′ (or u′) is a node in L i(N 2) (or L i(N 1)) that is ith-order equivalent to v (or u); if no such node exists, then v′ = ∅ (or u′ = ∅). Here n 1 and n 2 are the numbers of nodes in N 1 and N 2 respectively.
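To make formula (1) concrete, the following Python sketch evaluates it from precomputed class sizes. It assumes that, for each order i, every equivalence class has been given a key that is shared between the two networks (so that equivalent nodes of N 1 and N 2 receive the same key), and that `counts1[i-1][key]` stores e i for that class in N 1. The function name, the input layout and the keys in the usage note are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Sequence


def kth_distance(counts1: Sequence[Counter], counts2: Sequence[Counter],
                 n1: int, n2: int) -> Fraction:
    """Evaluate formula (1) from per-order class sizes of two networks."""
    k = len(counts1)
    if k == 0 or k != len(counts2):
        raise ValueError("need the same positive number of orders for both networks")
    total = 0
    for e1, e2 in zip(counts1, counts2):
        # A class absent from the other network contributes e(empty) = 0.
        total += sum(max(0, c - e2.get(key, 0)) for key, c in e1.items())
        total += sum(max(0, c - e1.get(key, 0)) for key, c in e2.items())
    return Fraction(total, k * (n1 + n2))
```

For example, with `counts1 = [Counter(a=2, b=1), Counter(a=1, a2=1, b=1)]`, `counts2 = [Counter(a=1, b=1, c=1), Counter(a=1, b=1, c=1)]` and n 1 = n 2 = 3, each order contributes 2, so the result is 4/(2 × 6) = 1/3.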

For each i (1 ≤ i ≤ k), the maximal value of \({\sum }_{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}\) is n 1 + n 2, so formula (1) has maximal value 1 and minimal value 0. For a given i (1 ≤ i ≤ k), if the value of \({\sum }_{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}\) is d, then for any j (i + 1 ≤ j ≤ k), the value of \({\sum }_{v\in {L}^{j}({N}_{1})}\max \{0,{e}^{j}(v)-{e}^{j}(v^{\prime} )\}+{\sum }_{u\in {L}^{j}({N}_{2})}\max \{0,{e}^{j}(u)-{e}^{j}(u^{\prime} )\}\) is at least d.

From Definition 8, it follows that the 1st-distance is the m-distance defined on the space of reduced phylogenetic networks, and the 2nd-distance is the \({d}_{e}\)-distance defined on the space of partly reduced phylogenetic networks.

Lemma 5. If d k (N 1, N 2) = 0, then |V 1| = |V 2|, and for each i (1 ≤ i ≤ k) and each node v ∈ L i(N 1) there exists a node v 0 ∈ L i(N 2) such that v 0 ≡ i v and e i(v 0) = e i(v).

Proof. From d k (N 1, N 2) = 0, it follows that \({\sum }_{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}=0\) and \({\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}=0\) (1 ≤ i ≤ k). So max{0, e i(v) − e i(v′)} = 0 for each node v ∈ L i(N 1). Suppose that there exists a node v ∈ L i(N 1) such that e i(v) − e i(v′) < 0; then e i(v′) − e i(v) > 0, so \({\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\} > 0\), which contradicts \({\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}=0\). Therefore, for each node v ∈ L i(N 1), we have e i(v) − e i(v′) = 0, i.e. e i(v) = e i(v′). Similarly, for each node u ∈ L i(N 2), e i(u) = e i(u′). Summing these equalities over all unique nodes gives |V 1| = |V 2|.

Lemma 6. Let N 1 = (V 1, E 1) and N 2 = (V 2, E 2) be two kth-order reduced phylogenetic networks. Then d k (N 1, N 2) = 0 if and only if N 1 and N 2 are isomorphic.

Proof. If N 1 and N 2 are isomorphic, obviously d k (N 1, N 2) = 0. The converse conclusion will be proven as follows.

Lemma 5 tells us that |V 1| = |V 2|. From the property of kth-order reduced phylogenetic networks, it follows that each node u in V 1 is kth-order equivalent only with itself, so u ∈ L k(N 1). Similarly, each node v in V 2 is kth-order equivalent only with itself, so v ∈ L k(N 2). Moreover, for each node u ∈ V 1, there exists exactly one node v ∈ V 2 such that u ≡ k v. So we define a mapping H from V 1 to V 2 by H(u) = u′ for each node u ∈ V 1, where u′ ∈ V 2 and u′ ≡ k u.

First we prove that the mapping H is a bijection. For any two different nodes u 1, u 2 ∈ V 1, there exist two nodes \({u}_{1}^{^{\prime} },{u}_{2}^{^{\prime} }\in {V}_{2}\) such that \(H({u}_{1})={u}_{1}^{^{\prime} }\) and \(H({u}_{2})={u}_{2}^{^{\prime} }\). The nodes \({u}_{1}^{^{\prime} }\) and \({u}_{2}^{^{\prime} }\) are distinct: otherwise u 1 ≡ k u 2, which contradicts the fact that each node u ∈ V 1 is kth-order equivalent only with itself. So H is injective. Since |V 1| = |V 2|, H is also surjective.

Then we prove that if (u, v) ∈ E 1, then (H(u), H(v)) ∈ E 2. Let u 0 = H(u) and v 0 = H(v), i.e. u 0 ≡ k u and v 0 ≡ k v. If k is an odd number, then the children of u are kth-order equivalent with the children of u 0 respectively. Thus, v is kth-order equivalent with a child v′ of u 0, i.e. v′ ≡ k v ≡ k v 0. Since every node is kth-order equivalent only with itself, v′ and v 0 are the same node, i.e. v 0 is a child of u 0. Therefore, (u 0, v 0) ∈ E 2. The case where k is an even number is similar.

The mapping H also preserves the labels of the leaves from the definition of kth-order equivalence. In conclusion, N 1 and N 2 are isomorphic.

Lemma 7. For any pair of networks N 1 and N 2, d k (N 1, N 2) = d k (N 2, N 1).

The distance d k (N 1, N 2) can be viewed as the symmetric difference of the same set of elements \({\cup }_{i=1}^{k}\{{L}^{i}({N}_{1})\cup {L}^{i}({N}_{2})\}\). From the property of the symmetric difference21, it follows that the following triangle inequality holds:

Lemma 8. For any three networks N 1 , N 2 and N 3 , d k (N 1, N 2) + d k (N 2, N 3) ≥ d k (N 1, N 3).

From Lemmas 6, 7 and 8, we have the following result:

Theorem 9. The kth-distance defined by formula (1) is a metric on the space of kth-order reduced phylogenetic networks.

Let k = 3, and let n j be the number of nodes of network N j (j = 1, 2). Consider the two networks in Fig. 1. For i = 1 and 2, \({\sum }_{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}=0\). For i = 3, \({\sum }_{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}={n}_{1}+{n}_{2}\). So d 3 (N 1, N 2) = (0 + 0 + (n 1 + n 2))/(3(n 1 + n 2)) = 1/3.

Consider the two networks in Fig. 2. The nodes R, B, E, F, K in V 1 have no first-order equivalent nodes in V 2, while the nodes R, B, F in V 2 have no first-order equivalent nodes in V 1. Every other node has exactly one first-order equivalent node. So \({\sum }_{v\in {L}^{1}({N}_{1})}\max \{0,{e}^{1}(v)-{e}^{1}(v^{\prime} )\}+{\sum }_{u\in {L}^{1}({N}_{2})}\max \{0,{e}^{1}(u)-{e}^{1}(u^{\prime} )\}=8\). For i = 2 and 3, no node in V 1 has an ith-order equivalent node in V 2. So \({\sum }_{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}\) \(={n}_{1}+{n}_{2}=13+15=28\). Accordingly d 3 (N 1, N 2) = (8 + 28 + 28)/(3 × 28) = 16/21.

Consider the two networks in Fig. 3. The nodes R, B, F in V 1 have no first-order equivalent nodes in V 2, and the nodes R, B, F, H, 6 in V 2 have no first-order equivalent nodes in V 1. Every other node has exactly one first-order equivalent node. So \({\sum }_{v\in {L}^{1}({N}_{1})}\max \{0,{e}^{1}(v)-{e}^{1}(v^{\prime} )\}+{\sum }_{u\in {L}^{1}({N}_{2})}\max \{0,{e}^{1}(u)-{e}^{1}(u^{\prime} )\}=8\). For i = 2 and 3, no node in V 1 has an ith-order equivalent node in V 2. So \({\sum }_{v\in {L}^{i}({N}_{1})}\max \{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max \{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}\) \(={n}_{1}+{n}_{2}=13+15=28\). Accordingly d 3 (N 1, N 2) = (8 + 28 + 28)/(3 × 28) = 16/21.
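The arithmetic in these worked examples can be re-checked with exact fractions. The short Python sketch below only normalises per-order sums that are already known; the helper name `distance_from_sums` is hypothetical, and the node counts used for Fig. 1 in the usage note are placeholders, since the result 1/3 does not depend on them.

```python
from fractions import Fraction
from typing import Sequence


def distance_from_sums(per_order_sums: Sequence[int], n1: int, n2: int) -> Fraction:
    """Normalise the per-order sums of formula (1) by k(n1 + n2)."""
    k = len(per_order_sums)
    return Fraction(sum(per_order_sums), k * (n1 + n2))
```

With the sums 8, 28 and 28 and n 1 + n 2 = 13 + 15, `distance_from_sums([8, 28, 28], 13, 15)` evaluates to `Fraction(16, 21)`. For the pattern of Fig. 1, any node counts give 1/3, for example `distance_from_sums([0, 0, 20], 10, 10)`.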

Lemma 10. If d k (N 1, N 2) = 0 for all k, then there exists a positive integer m such that, for any m 0 ≥ m, each node u in V 1 has an m 0 th-order equivalent node u′ in V 2.

Proof. Assume that the above conclusion does not hold, i.e. for any positive integer m, there exist k 0 ≥ m and a node u ∈ V 1 such that \(u^{\prime} {\not\equiv }^{{k}_{0}}u\) for any node u′ ∈ V 2. In particular, when m = 1, there exist k 1 and u 1 ∈ V 1 such that \({u}_{1}{\not\equiv }^{{k}_{1}}u^{\prime} \) for any node u′ ∈ V 2. So \({d}_{{k}_{1}}({N}_{1},{N}_{2})\ne 0\), which contradicts d k (N 1, N 2) = 0 for all k.

Computational Aspects

For an odd number k (or an even number k), the kth-order equivalent nodes can be computed by a bottom-up (or top-down) approach, regardless of whether the nodes are in the same network or in two different networks. Let N 1 = ((V 1, E 1), f 1) and N 2 = ((V 2, E 2), f 2) be two networks. Algorithm 1 shows the pseudocode that decides whether two nodes are kth-order equivalent, where E(k) denotes the set of kth-order equivalent nodes of a node. This process costs at most \(O({n}^{3})\) time, where n = max(|V 1|, |V 2|). Therefore, it takes at most \(O({n}^{5})\) time in total to find all ith-order (1 ≤ i ≤ k) equivalent nodes for each node of the two networks. Evaluating formula (1) costs O(n) time. In conclusion, computing the kth-distance between two networks takes \(O({n}^{5})\) time, where n is the maximum of |V 1| and |V 2|.
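The core step of the pairwise test is a one-to-one matching between the children (or parents) of u and those of v. The Python sketch below shows one way to write this step; it is an illustration under stated assumptions rather than a transcription of Algorithm 1. In particular, it uses a fresh `used` list for every call and stops at the first match for each neighbour, and it takes the equivalence test for the neighbours as a caller-supplied predicate `equivalent` (a hypothetical stand-in for a lookup in E(k)). Greedy matching suffices here because kth-order equivalence is an equivalence relation.

```python
from typing import Callable, Hashable, Sequence


def neighbours_match(
    nbrs_u: Sequence[Hashable],
    nbrs_v: Sequence[Hashable],
    equivalent: Callable[[Hashable, Hashable], bool],
) -> bool:
    """True if the two neighbour lists can be paired up equivalently."""
    if len(nbrs_u) != len(nbrs_v):
        return False
    used = [False] * len(nbrs_v)     # marks neighbours of v already paired
    for a in nbrs_u:
        for j, b in enumerate(nbrs_v):
            if not used[j] and equivalent(a, b):
                used[j] = True
                break                # pair a with exactly one b
        else:
            return False             # a has no unpaired equivalent partner
    return True
```

Each call costs at most a quadratic number of predicate evaluations in the number of neighbours, which is consistent with the polynomial bounds quoted above.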

Algorithm 1: Deciding whether u and v are kth-order equivalent for an odd number k (or an even number k).

input: nodes u and v

if the outdegree of u is not equal to that of v (or the indegree of u is not equal to that of v) then

   return

end if

if k ≥ 2 and E(k − 1) of u doesn't contain v then

   return

end if

if k is odd and u and v are leaves with different labels then

   return

end if

if u and v are leaves (or u and v are the root) then

   add v to E(k) of u

   add u to E(k) of v

   return

end if

set b.label := false for each child b of v (or each parent b of v)

for each child a of u (or each parent a of u) do

   flag := false

   for each child b of v (or each parent b of v) do

      if b.label = false and the E(k) of a contains b then

         flag := true

         b.label := true

         break

      end if

   end for

   if flag = false then

      return

   end if

end for

add v to E(k) of u

add u to E(k) of v

Results

We compared the kth-distance with the m-distance on the space of reduced phylogenetic networks19 and the \({d}_{e}\)-distance on the space of partly reduced phylogenetic networks20, by means of 100 networks constructed by the Lnetwork method3. Thus, each distance method yields a distance matrix with approximately 5000 values. Figure 8 shows the distribution of the distance values, where the horizontal axis is the distance value and the vertical axis is the percentage of all values taking that distance value. The results of the \({d}_{e}\)-distance are not shown in Fig. 8, because it takes only two values, 1 and 0, which account for 99.38 percent and 0.62 percent of the values respectively. The minima of the m-distance and the \({d}_{e}\)-distance are 0, while the minimum of the kth-distance is 0.32.

Figure 8

The results of m-distance and kth-distance.

From the results, we reached the following conclusions. First, almost all \({d}_{e}\)-distance values equal the maximum value 1. Second, the kth-distance is not 0 between networks whose \({d}_{e}\)-distance and m-distance are 0. Third, the kth-distance values are larger than the m-distance values for the same networks.

Discussion

In order to compare the dissimilarity of more phylogenetic networks, we defined a polynomial-time computable metric on the space of kth-order reduced phylogenetic networks. The larger k is, the larger the space of kth-order reduced phylogenetic networks is. Moreover, the larger k is, the more precise the distance between two phylogenetic networks is. Take the non-isomorphic networks in Fig. 1 for example. When k = 1 or 2, the value computed by formula (1) is 0, i.e. their m-distance and \({d}_{e}\)-distance are 0. However, when k = 3, the value computed by formula (1) is 1/3. So when k = 1 or 2, the value computed by formula (1) does not reflect the real dissimilarity between the two networks. In general, the choice of k depends on the desired precision of the distance. Whatever k is, the kth-distance is not a metric on the space of all rooted phylogenetic networks. For example, the kth-distance between the two phylogenetic networks in Fig. 9 is 0, but they are not isomorphic.

Figure 9

The two networks are not isomorphic.