A Metric on the Space of kth-order reduced Phylogenetic Networks

Wang, Juan; Guo, Maozu

doi:10.1038/s41598-017-03363-y

Published: 09 June 2017


Scientific Reports volume 7, Article number: 3189 (2017)


Abstract

Phylogenetic networks can be used to describe the evolutionary history of species that have experienced reticulate events, and to represent conflicts in phylogenetic trees that may be due to inadequacies of the evolutionary model used to construct the trees. Measuring the dissimilarity between two phylogenetic networks is at the heart of our understanding of the evolutionary history of species. This paper proposes a new metric, the kth-distance, on the space of kth-order reduced phylogenetic networks. The metric can be computed in time polynomial in the size of the compared networks.


Introduction

Phylogenetic networks play a vital role in describing the evolutionary history of species. They are especially appropriate for datasets whose evolution involves significant amounts of reticulate events caused by recombination, hybridization, horizontal gene transfer, gene duplication, gene conversion and loss^{1,2,3,4,5,6,7}. Even for species that have evolved according to a tree-like model, phylogenetic networks can be used to represent conflicts in phylogenetic trees that may be caused by inadequacies of the evolutionary model used. Many algorithms and programs for constructing phylogenetic networks are now available. These algorithms are mainly assessed by comparing networks, for example by comparing a constructed network with a simulated or an actual network. In addition, comparing two phylogenetic networks can help us understand the evolutionary history of species. Recently, researchers have shown increasing interest in defining metrics that measure the dissimilarity between a pair of phylogenetic networks.

A measure d is called a metric on a space S if it satisfies the following four properties for any a, b, c ∈ S:

d(a, b) ≥ 0 (nonnegative);
d(a, b) = 0 if and only if a = b (i.e. a and b are isomorphic) (reflexivity);
d(a, b) = d(b, a) (symmetry);
d(a, b) + d(b, c) ≥ d(a, c) (triangle inequality).
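The four properties can be checked mechanically on a finite collection of objects, which is a convenient sanity test for any candidate distance. The Python sketch below is illustrative only and is not part of the paper. The function name `satisfies_metric_axioms`, the tolerance `tol` and the caller-supplied equality test `same` are all assumptions. For networks, `same` would have to be an isomorphism test, and none is implemented here.

```python
from itertools import product


def satisfies_metric_axioms(items, d, same, tol=1e-12):
    """Check the four metric properties of d on every pair and triple of items.

    `same(a, b)` is a caller-supplied equality test (hypothetical; for
    phylogenetic networks it would be an isomorphism test).
    """
    for a, b in product(items, repeat=2):
        if d(a, b) < -tol:                         # nonnegativity
            return False
        if (abs(d(a, b)) <= tol) != same(a, b):    # reflexivity: d = 0 iff a = b
            return False
        if abs(d(a, b) - d(b, a)) > tol:           # symmetry
            return False
    for a, b, c in product(items, repeat=3):
        if d(a, b) + d(b, c) < d(a, c) - tol:      # triangle inequality
            return False
    return True


def trivial(a, b):
    """The 0/1 distance: 0 for equal objects, 1 otherwise."""
    return 0 if a == b else 1


print(satisfies_metric_axioms(["x", "y", "z"], trivial, lambda a, b: a == b))       # True
print(satisfies_metric_axioms(["x", "y", "z"], lambda a, b: 0, lambda a, b: a == b))  # False
```

The 0/1 distance passes every check. A distance that is identically 0 fails reflexivity, because it assigns 0 to distinct objects.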

In general, nonnegativity, symmetry and the triangle inequality are much easier to prove than reflexivity. A metric that assigns distance 0 to isomorphic phylogenetic networks and 1 to every other pair is called trivial. A trivial metric obviously satisfies all four properties, but it conveys no further information about the evolutionary histories implied by the two networks. Accordingly, in addition to these four properties, a metric should give some information on the dissimilarity of the evolutionary histories expressed by the networks being compared^{8,9,10,11,12,13}.

Several measures have so far been designed and proven to be metrics on particular subspaces of rooted phylogenetic networks. Examples include the μ-metric on the space of tree-sibling phylogenetic networks¹⁴, the tripartition metric on the space of tree-child phylogenetic networks^{15,16,17,18}, the m-distance on the space of reduced phylogenetic networks¹⁹, and the d_e-distance on the space of partly reduced phylogenetic networks²⁰. The largest of these subspaces is the space of partly reduced phylogenetic networks. The d_e-distance is therefore also a metric on the subspaces of tree-child, tree-sibling and reduced phylogenetic networks. This paper introduces a new metric, denoted the kth-distance, on the space of kth-order reduced phylogenetic networks (defined in the following sections), and the metric is computable in polynomial time. For k ≥ 3, the space of kth-order reduced phylogenetic networks is a larger subspace of rooted phylogenetic networks than any subspace on which a metric has previously been defined. Unless stated otherwise, "network" in the rest of the paper means a rooted phylogenetic network.

Preliminaries

Let ${\mathscr{X}}$ be a set of taxa. A rooted phylogenetic network N = (V, E) on ${\mathscr{X}}$ is a directed acyclic graph (DAG for short) with exactly one root node, whose leaves are labelled by the elements of ${\mathscr{X}}$ via a bijection f.

For a network N = (V, E) and a node u ∈ V, if:

indeg(u) = 0, then u is the root;
indeg(u) ≤ 1, then u is a tree node;
indeg(u) ≥ 2, then u is a reticulate node;
outdeg(u) = 0, then u is a leaf;
outdeg(u) ≥ 1, then u is an internal node.

We sometimes write N = ((V, E), f) for the network N, and V_N for the leaf set of N. Let u, v ∈ V. If (u, v) ∈ E, then v is a child of u and u is a parent of v. If there is a directed path from u to v, then v is a descendant of u and u is an ancestor of v.

The height of a node u is the length of a longest directed path from u to a leaf. Since N has no cycles, all nodes of N can be categorized by height. The nodes with height 0 are the leaves. For a node u with height a > 0, each child of u has height less than a, and at least one child has height exactly a − 1.

The depth of a node v is the length of a longest directed path from the root to v. Likewise, all nodes of N can be categorized by depth. The only node with depth 0 is the root. For a node v with depth b > 0, each parent of v has depth less than b, and at least one parent has depth exactly b − 1.
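The node categories, heights and depths defined above can all be computed from one topological order of the DAG. The following Python sketch assumes that a network is given as a hypothetical dictionary `children` mapping each node to the list of its children. The function name `categorize` is illustrative. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def categorize(children):
    """children: dict node -> list of children of a rooted DAG.

    Returns (categories, height, depth) for every node, following the
    in/outdegree rules and the longest-path definitions in the text.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    parents = {v: [] for v in nodes}
    for u, cs in children.items():
        for c in cs:
            parents[c].append(u)

    categories = {}
    for v in nodes:
        indeg, outdeg = len(parents[v]), len(children.get(v, []))
        cats = set()
        if indeg == 0:
            cats.add("root")
        cats.add("tree" if indeg <= 1 else "reticulate")
        cats.add("leaf" if outdeg == 0 else "internal")
        categories[v] = cats

    # static_order() yields predecessors first; treating children as
    # predecessors gives an order in which every child precedes its parents.
    bottom_up = list(TopologicalSorter(
        {v: children.get(v, []) for v in nodes}).static_order())
    height = {}
    for v in bottom_up:
        kids = children.get(v, [])
        height[v] = 1 + max(height[c] for c in kids) if kids else 0
    depth = {}
    for v in reversed(bottom_up):  # parents before children
        depth[v] = 1 + max(depth[p] for p in parents[v]) if parents[v] else 0
    return categories, height, depth
```

Consider, for example, `{"R": ["X", "Y"], "X": ["1", "Z"], "Y": ["Z", "3"], "Z": ["2"]}`. Here Z is a reticulate node of height 1 and depth 2, and the root R has height 3.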

Definition 1. Two networks N₁ = ((V₁, E₁), f₁) and N₂ = ((V₂, E₂), f₂) are isomorphic if and only if there exists a bijection H from V₁ to V₂ such that:

(u, v) is an edge in E₁ if and only if (H(u), H(v)) is an edge in E₂;
for each leaf w ∈ V₁, f₁(w) = f₂(H(w)).

The space of partly reduced phylogenetic networks, on which the d_e-distance is defined, is the largest of these subspaces. Even so, many networks cannot be distinguished by the d_e-distance. For example, the two networks in Fig. 1 (taken from the paper²⁰) are not isomorphic, yet their d_e-distance is 0. Even for two non-isomorphic networks whose d_e-distance is nonzero, the distance usually takes its maximal value 1. The two networks in Fig. 2, for example, resemble each other to some extent, so their distance should be less than 1. However, their d_e-distance is 1. Moreover, for any two networks N₁ on ${\mathscr{X}}_{1}$ and N₂ on ${\mathscr{X}}_{2}$, the d_e-distance between them is 1 whenever ${\mathscr{X}}_{1}\ne {\mathscr{X}}_{2}$. When ${\mathscr{X}}_{1}\subset {\mathscr{X}}_{2}$, however, the two networks may still share some information (see Fig. 3).

Methods

Let N = ((V, E), f) be a network. We first give several definitions for nodes within this single network.

Definition 2. Two nodes u, v ∈ V (not necessarily distinct) are called first-order equivalent, denoted u ≡¹ v, if

u, v ∈ V_N and f(u) = f(v), or
node u has l (≥ 1) children ${u}_{1},{u}_{2},\cdots ,{u}_{l}$, node v has l children ${v}_{1},{v}_{2},\cdots ,{v}_{l}$, and, under a suitable ordering of the children, u_i ≡¹ v_i for 1 ≤ i ≤ l.

Example 1. Consider the network N₁ in Fig. 1. Each node of N₁ is first-order equivalent to itself, and in addition C ≡¹ E, D ≡¹ F and H ≡¹ J.

Definition 3. Let k ≥ 2 be even. Two nodes u, v ∈ V (not necessarily distinct) are called kth-order equivalent, denoted u ≡^k v, if u ≡^{k−1} v and:

u and v are both the root, or
node u has l (≥ 1) parents ${u}_{1},{u}_{2},\cdots ,{u}_{l}$, node v has l parents ${v}_{1},{v}_{2},\cdots ,{v}_{l}$, and, under a suitable ordering of the parents, u_i ≡^k v_i for 1 ≤ i ≤ l.

Definition 4. Let k ≥ 3 be odd. Two nodes u, v ∈ V (not necessarily distinct) are called kth-order equivalent, denoted u ≡^k v, if u ≡^{k−1} v and:

u, v ∈ V_N and f(u) = f(v), or
node u has l (≥ 1) children ${u}_{1},{u}_{2},\cdots ,{u}_{l}$, node v has l children ${v}_{1},{v}_{2},\cdots ,{v}_{l}$, and, under a suitable ordering of the children, u_i ≡^k v_i for 1 ≤ i ≤ l.
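Since ≡^{k−1} is required before ≡^k, the relations of Definitions 2–4 can be computed order by order. Each node receives a class id determined by two things: its (k−1)th-order class, and the multiset of kth-order classes of its children (odd k, processed bottom-up) or of its parents (even k, processed top-down). This signature-based partition refinement is a swapped-in technique chosen for illustration. It is not the paper's pairwise procedure (Algorithm 1). The function name `equivalence_classes` and the input format (a `children` dictionary and a leaf `labels` dictionary) are assumptions. Running it on the disjoint union of two networks, with node names kept distinct, should also yield the cross-network relations of Definitions 5–7, since those apply the same rules.

```python
from graphlib import TopologicalSorter


def equivalence_classes(children, labels, k):
    """Return [c1, ..., ck], where ci maps every node to an integer id of its
    ith-order equivalence class, so that u ≡^i v iff ci[u] == ci[v].

    children: dict node -> list of children (a rooted DAG)
    labels:   dict leaf -> taxon label
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    parents = {v: [] for v in nodes}
    for u, cs in children.items():
        for c in cs:
            parents[c].append(u)
    # Children before parents; reversed, parents before children.
    bottom_up = list(TopologicalSorter(
        {v: children.get(v, []) for v in nodes}).static_order())
    top_down = bottom_up[::-1]

    result, prev = [], None
    for i in range(1, k + 1):
        odd = i % 2 == 1
        order = bottom_up if odd else top_down
        neighbours = children if odd else parents
        ids, cls = {}, {}
        for v in order:
            if i == 1 and not children.get(v):
                sig = ("leaf", labels[v])  # base case of Definition 2
            else:
                # A sorted tuple encodes the multiset of neighbour classes, so any
                # pairing of neighbours is allowed; an empty tuple marks a leaf
                # (odd i) or the root (even i).
                nb = tuple(sorted(cls[w] for w in neighbours.get(v, [])))
                sig = (prev[v] if prev is not None else None, nb)
            cls[v] = ids.setdefault(sig, len(ids))
        result.append(cls)
        prev = cls
    return result
```

Consider a network in which X and Y have the same two leaf children but parents that are not first-order equivalent. The sketch then reports X ≡¹ Y but X ≢² Y. If X and Y also share their parent, they remain equivalent at every order, as A and B do in Fig. 7.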

Example 2. Consider the network N₁ in Fig. 1 again. Each node of N₁ is second-order equivalent to itself, and H ≡² J. For k ≥ 3, each node of N₁ is kth-order equivalent only to itself.

Lemma 1. Let k be odd, and let u₁, u₂, ⋯, u_s be nodes of a network such that each u_i has l children and each child of u_i is kth-order equivalent only to itself (1 ≤ i ≤ s). Then u₁ ≡^k u₂ ≡^k ⋯ ≡^k u_s if and only if u₁, u₂, ⋯, u_s have the same children (see Fig. 4).

Lemma 2. Let k be even, and let v₁, v₂, ⋯, v_s be nodes of a network such that each v_i has l parents and each parent of v_i is kth-order equivalent only to itself (1 ≤ i ≤ s). Then v₁ ≡^k v₂ ≡^k ⋯ ≡^k v_s if and only if v₁, v₂, ⋯, v_s have the same parents (see Fig. 5).

Lemma 3. In a network, every leaf, the root and every node with height 1 is kth-order equivalent to itself, for any k.

The proofs of Lemmas 1, 2 and 3 are omitted. It follows from the definitions that each kth-order equivalence is an equivalence relation, i.e. it is reflexive, symmetric and transitive. It is also easy to prove that first-order equivalent nodes have the same height, and that kth-order equivalent nodes (k ≥ 2) have the same height and depth (see the literature²⁰).

If a node u is kth-order equivalent to at least one node other than itself, we say that u has non-trivial kth-order equivalent nodes. Suppose that, in a network, we delete the non-trivial kth-order equivalent nodes of each node (keeping one node from each equivalence class), as well as the nodes with indegree 1 and outdegree 1. The resulting network is called a kth-order reduced phylogenetic network. All kth-order reduced phylogenetic networks together form the space of kth-order reduced phylogenetic networks. Thus a network N belongs to this space if and only if each node of N is kth-order equivalent only to itself.

The space of first-order reduced phylogenetic networks is the space of reduced phylogenetic networks defined in the paper¹⁹. The space of second-order reduced phylogenetic networks is the space of partly reduced phylogenetic networks defined in the paper²⁰. Figure 6 shows the relationship between these subspaces.

The space of kth-order reduced phylogenetic networks is not equal to the space of rooted phylogenetic networks. Consider, for example, the network N in Fig. 7. For any k, each node of N is kth-order equivalent to itself, and in addition A ≡^k B. Hence N is not a kth-order reduced phylogenetic network, i.e. it does not belong to the space of kth-order reduced phylogenetic networks.

To compute the dissimilarity of networks, we now extend the above concepts from a single network to a pair of networks. Let N₁ = ((V₁, E₁), f₁) and N₂ = ((V₂, E₂), f₂) be two networks.

Definition 5. Two nodes u ∈ V₁ and v ∈ V₂ are called first-order equivalent, denoted u ≡¹ v, if

$u\in {V}_{{N}_{1}},v\in {V}_{{N}_{2}}$ and f₁(u) = f₂(v), or
node u has l (≥ 1) children u₁, u₂, ⋯, u_l, node v has l children v₁, v₂, ⋯, v_l, and, under a suitable ordering of the children, u_i ≡¹ v_i for 1 ≤ i ≤ l.

Definition 6. Let k ≥ 2 be even. Two nodes u ∈ V₁ and v ∈ V₂ are called kth-order equivalent, denoted u ≡^k v, if u ≡^{k−1} v and:

u and v are the roots of N₁ and N₂, respectively, or
node u has l (≥ 1) parents u₁, u₂, ⋯, u_l, node v has l parents v₁, v₂, ⋯, v_l, and, under a suitable ordering of the parents, u_i ≡^k v_i for 1 ≤ i ≤ l.

Definition 7. Let k ≥ 3 be odd. Two nodes u ∈ V₁ and v ∈ V₂ are called kth-order equivalent, denoted u ≡^k v, if u ≡^{k−1} v and:

$u\in {V}_{{N}_{1}},v\in {V}_{{N}_{2}}$ and f₁(u) = f₂(v), or
node u has l (≥ 1) children u₁, u₂, ⋯, u_l, node v has l children v₁, v₂, ⋯, v_l, and, under a suitable ordering of the children, u_i ≡^k v_i for 1 ≤ i ≤ l.

Let u and u₀ be two nodes, either from the same network or from two different networks. Since ≡^k implies ≡^{k−1}, it follows that if u ≢^{k₁} u₀ for some positive integer k₁, then u ≢^k u₀ for every k > k₁. Given two networks N₁ = (V₁, E₁) and N₂ = (V₂, E₂), we compute the kth-order unique nodes of N₁, denoted L^k(N₁), as follows. Start with L^k(N₁) = ∅. Then, for each node u ∈ V₁, add u to L^k(N₁) if no node u₀ ∈ L^k(N₁) satisfies u ≡^k u₀. L^k(N₂) is computed in the same way. For each node u ∈ L^k(N₁), ${e}_{{N}_{1}}^{k}(u)$ denotes the number of nodes of N₁ that are kth-order equivalent to u, i.e. ${e}_{{N}_{1}}^{k}(u)=|\{v\in {V}_{1}:v{\equiv }^{k}u\}|$. Similarly, ${e}_{{N}_{2}}^{k}(u)$ is defined for each node u ∈ L^k(N₂). For simplicity we drop the subscript of e, and we set e^k(∅) = 0.
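The procedure just described can be sketched directly, assuming that a kth-order class id is already known for every node (for instance from a partition-refinement computation on the disjoint union of N₁ and N₂). The function name `unique_nodes_and_counts` and the `cls` dictionary (node → class id) are hypothetical. Nodes are scanned in the order given, as in the text.

```python
from collections import Counter


def unique_nodes_and_counts(nodes, cls):
    """Return (L, e) for one network.

    L: the first node of each kth-order class, scanning `nodes` in order.
    e: for each u in L, the number of nodes of this network in u's class.
    """
    size = Counter(cls[v] for v in nodes)
    L, seen = [], set()
    for u in nodes:
        if cls[u] not in seen:  # no u0 already in L with u ≡^k u0
            seen.add(cls[u])
            L.append(u)
    e = {u: size[cls[u]] for u in L}
    return L, e
```

With nodes `["a", "b", "c"]` and classes `{"a": 0, "b": 1, "c": 0}`, L is `["a", "b"]`, e(a) = 2 and e(b) = 1.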

Lemma 4. Let N₁ = (V₁, E₁) and N₂ = (V₂, E₂) be two networks, let u₁, u₂ ∈ V₁ and v₁, v₂ ∈ V₂, and suppose u₁ ≡^k v₁ and u₂ ≡^k v₂. Then u₁ ≡^k u₂ if and only if v₁ ≡^k v₂.

Proof. See the proof of Theorem 15 in the paper²⁰.◽

A Metric

Definition 8. For two networks N₁ = (V₁, E₁) and N₂ = (V₂, E₂), the kth-distance d_k(N₁, N₂) is

$$d_{k}(N_1,N_2)=\frac{1}{k(n_1+n_2)}\sum_{i=1}^{k}\Bigl[\sum_{v\in L^{i}(N_1)}\max\{0,e^{i}(v)-e^{i}(v')\}+\sum_{u\in L^{i}(N_2)}\max\{0,e^{i}(u)-e^{i}(u')\}\Bigr]\tag{1}$$

where v′ (or u′) is the node in L^i(N₂) (or L^i(N₁)) that is ith-order equivalent to v (or u). If no such node exists, then v′ = ∅ (or u′ = ∅). Here n₁ and n₂ are the numbers of nodes in N₁ and N₂, respectively.
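Formula (1) can be evaluated from per-class node counts. For each order i, count how many nodes of N₁ and of N₂ fall into each ith-order class. A class missing from one network has count 0 there, which matches the convention e^i(∅) = 0. This Python sketch assumes a hypothetical input `class_maps`: a list whose (i−1)th entry maps every node of the disjoint union of N₁ and N₂ to its ith-order class id. It returns an exact `Fraction`.

```python
from collections import Counter
from fractions import Fraction


def kth_distance(class_maps, nodes1, nodes2):
    """Evaluate formula (1); k is the number of class maps supplied."""
    k = len(class_maps)
    total = 0
    for cls in class_maps:
        a = Counter(cls[v] for v in nodes1)  # e^i per class, in N1
        b = Counter(cls[v] for v in nodes2)  # e^i per class, in N2
        # Counter returns 0 for a class absent from one network, i.e. e^i(∅) = 0.
        total += sum(max(0, a[c] - b[c]) for c in a)
        total += sum(max(0, b[c] - a[c]) for c in b)
    return Fraction(total, k * (len(nodes1) + len(nodes2)))
```

Because each class is visited once per network, this computes the same sums as iterating over L^i(N₁) and L^i(N₂) with their representatives v′ and u′.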

For each i (1 ≤ i ≤ k), the bracketed sum ${\sum }_{v\in {L}^{i}({N}_{1})}\max\{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max\{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}$ in formula (1) is at most n₁ + n₂. Formula (1) therefore takes values between 0 and 1. Moreover, if this sum equals d for some i, then for every j with i + 1 ≤ j ≤ k the corresponding sum for order j is at least d.
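A short derivation makes both bounds explicit. Write $\mathcal{C}_i$ for the set of ith-order equivalence classes of the disjoint union of N₁ and N₂. Write $a_C$ and $b_C$ for the numbers of nodes of N₁ and N₂ in a class C; this notation is introduced only for the derivation. Each class that meets N₁ has exactly one representative v ∈ L^i(N₁), with e^i(v) = a_C and e^i(v′) = b_C (which is 0 when v′ = ∅). The same holds symmetrically for N₂. Since max{0, x} + max{0, −x} = |x|, we obtain

$$\sum_{v\in L^{i}(N_1)}\max\{0,e^{i}(v)-e^{i}(v')\}+\sum_{u\in L^{i}(N_2)}\max\{0,e^{i}(u)-e^{i}(u')\}=\sum_{C\in\mathcal{C}_i}|a_C-b_C|\le\sum_{C\in\mathcal{C}_i}(a_C+b_C)=n_1+n_2.$$

For j > i, every jth-order class C′ lies inside a single ith-order class C, because ≡^j implies ≡^i. Hence

$$|a_C-b_C|=\Bigl|\sum_{C'\subseteq C}(a_{C'}-b_{C'})\Bigr|\le\sum_{C'\subseteq C}|a_{C'}-b_{C'}|,$$

and summing over C shows that the sum for order j is never smaller than the sum for order i.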

By Definition 8, the 1st-distance coincides with the m-distance defined on the space of reduced phylogenetic networks. The 2nd-distance coincides with the d_e-distance defined on the space of partly reduced phylogenetic networks.

Lemma 5. If d_k(N₁, N₂) = 0, then |V₁| = |V₂|. Moreover, for each i (1 ≤ i ≤ k) and each node v ∈ L^i(N₁), there exists a node v₀ ∈ L^i(N₂) such that v₀ ≡^i v and e^i(v₀) = e^i(v).

Proof. Every term in formula (1) is nonnegative. Hence d_k(N₁, N₂) = 0 implies, for each i (1 ≤ i ≤ k), that ${\sum }_{v\in {L}^{i}({N}_{1})}\max\{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}=0$ and ${\sum }_{u\in {L}^{i}({N}_{2})}\max\{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}=0$. So max{0, e^i(v) − e^i(v′)} = 0 for each node v ∈ L^i(N₁). In particular v′ ≠ ∅, because otherwise e^i(v) − e^i(v′) = e^i(v) ≥ 1. Suppose that e^i(v) − e^i(v′) < 0 for some v ∈ L^i(N₁). Then e^i(v′) − e^i(v) > 0, and since v′ ∈ L^i(N₂) has v as its counterpart, ${\sum }_{u\in {L}^{i}({N}_{2})}\max\{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\} > 0$. This contradicts ${\sum }_{u\in {L}^{i}({N}_{2})}\max\{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}=0$. Therefore e^i(v) = e^i(v′) for each node v ∈ L^i(N₁). Similarly, e^i(u) = e^i(u′) for each node u ∈ L^i(N₂). The classes represented by L^i(N₁) partition V₁, and those represented by L^i(N₂) partition V₂. Accordingly, |V₁| = Σ_{v∈L^i(N₁)} e^i(v) = Σ_{u∈L^i(N₂)} e^i(u) = |V₂|.◽

Lemma 6. Let N₁ = (V₁, E₁) and N₂ = (V₂, E₂) be two kth-order reduced phylogenetic networks. Then d_k(N₁, N₂) = 0 if and only if N₁ and N₂ are isomorphic.

Proof. If N₁ and N₂ are isomorphic, then obviously d_k(N₁, N₂) = 0. We now prove the converse.

By Lemma 5, |V₁| = |V₂|. Since N₁ is kth-order reduced, each node u ∈ V₁ is kth-order equivalent only to itself, so u ∈ L^k(N₁). Similarly, each node v ∈ V₂ is kth-order equivalent only to itself, and v ∈ L^k(N₂). Moreover, by Lemma 5, for each node u ∈ V₁ there is a node v ∈ V₂ with u ≡^k v. This node is unique, since two such nodes would be kth-order equivalent to each other by transitivity. We therefore define a mapping H from V₁ to V₂ by H(u) = u′, where u′ ∈ V₂ and u′ ≡^k u.

First, we prove that H is a bijection. Let u₁, u₂ ∈ V₁ be two different nodes, and let ${u}_{1}^{\prime }=H({u}_{1})$ and ${u}_{2}^{\prime }=H({u}_{2})$. If ${u}_{1}^{\prime }$ and ${u}_{2}^{\prime }$ were the same node, then u₁ ≡^k u₂ by transitivity. This would contradict the fact that each node of V₁ is kth-order equivalent only to itself. Hence H is injective, and since |V₁| = |V₂|, H is also surjective.

Next, we prove that (u, v) ∈ E₁ implies (H(u), H(v)) ∈ E₂. Let u₀ = H(u) and v₀ = H(v), i.e. u₀ ≡^k u and v₀ ≡^k v. If k is odd, then the children of u are kth-order equivalent to the children of u₀ under a suitable pairing. Thus v is kth-order equivalent to some child v′ of u₀, i.e. v′ ≡^k v ≡^k v₀. Since every node of N₂ is kth-order equivalent only to itself, v′ and v₀ are the same node, i.e. v₀ is a child of u₀. Therefore (u₀, v₀) ∈ E₂. The case of even k is analogous, with parents in place of children. Applying the same argument to H⁻¹ shows that (H(u), H(v)) ∈ E₂ implies (u, v) ∈ E₁.

By the definition of kth-order equivalence, H also preserves the labels of the leaves. In conclusion, N₁ and N₂ are isomorphic.◽

Lemma 7. For any pair of networks N₁ and N₂, d_k(N₁, N₂) = d_k(N₂, N₁).

The distance d_k(N₁, N₂) can be viewed as the size of a symmetric difference (counted with multiplicities) on the same set of elements ${\cup }_{i=1}^{k}\{{L}^{i}({N}_{1})\cup {L}^{i}({N}_{2})\}$. By the properties of the symmetric difference²¹, the following triangle inequality holds:

Lemma 8. For any three networks N₁, N₂ and N₃, d_k(N₁, N₂) + d_k(N₂, N₃) ≥ d_k(N₁, N₃).

From Lemmas 6, 7 and 8, we have the following result:

Theorem 9. The kth-distance defined by formula (1) is a metric on the space of kth-order reduced phylogenetic networks.

Let k = 3, and let n_j be the number of nodes of network N_j (j = 1, 2). Consider the two networks in Fig. 1. For i = 1 and 2, the bracketed sum in formula (1) is ${\sum }_{v\in {L}^{i}({N}_{1})}\max\{0,{e}^{i}(v)-{e}^{i}(v^{\prime} )\}+{\sum }_{u\in {L}^{i}({N}_{2})}\max\{0,{e}^{i}(u)-{e}^{i}(u^{\prime} )\}=0$. For i = 3, the same sum equals n₁ + n₂. Hence d₃(N₁, N₂) = (0 + 0 + n₁ + n₂)/(3(n₁ + n₂)) = 1/3.

Consider the two networks in Fig. 2. The nodes R, B, E, F and K in V₁ have no first-order equivalent node in V₂. The nodes R, B and F in V₂ have no first-order equivalent node in V₁. Every other node has exactly one first-order equivalent node in the other network. So, for i = 1, the bracketed sum in formula (1) is ${\sum }_{v\in {L}^{1}({N}_{1})}\max\{0,{e}^{1}(v)-{e}^{1}(v^{\prime} )\}+{\sum }_{u\in {L}^{1}({N}_{2})}\max\{0,{e}^{1}(u)-{e}^{1}(u^{\prime} )\}=5+3=8$. For i = 2 and 3, no node in V₁ has an ith-order equivalent node in V₂, so the bracketed sum equals n₁ + n₂ = 28. Accordingly, d₃(N₁, N₂) = (8 + 28 + 28)/(3 × 28) = 16/21.

Consider the two networks in Fig. 3. The nodes R, B and F in V₁ have no first-order equivalent node in V₂. The nodes R, B, F, H and 6 in V₂ have no first-order equivalent node in V₁. Every other node has exactly one first-order equivalent node in the other network. So, for i = 1, the bracketed sum in formula (1) is ${\sum }_{v\in {L}^{1}({N}_{1})}\max\{0,{e}^{1}(v)-{e}^{1}(v^{\prime} )\}+{\sum }_{u\in {L}^{1}({N}_{2})}\max\{0,{e}^{1}(u)-{e}^{1}(u^{\prime} )\}=3+5=8$. For i = 2 and 3, no node in V₁ has an ith-order equivalent node in V₂, so the bracketed sum equals n₁ + n₂ = 13 + 15 = 28. Accordingly, d₃(N₁, N₂) = (8 + 28 + 28)/(3 × 28) = 16/21.
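The arithmetic of these three examples can be reproduced once the per-order bracketed sums are known. The helper below is hypothetical and only evaluates formula (1) from those sums. The node counts for Fig. 1 are not stated, so any positive value of n₁ + n₂ gives 1/3.

```python
from fractions import Fraction


def kth_distance_from_sums(level_sums, n_total):
    """Formula (1) given the bracketed sum for each order i = 1..k.

    level_sums[i - 1] is the sum for order i; n_total is n1 + n2.
    """
    k = len(level_sums)
    return Fraction(sum(level_sums), k * n_total)


print(kth_distance_from_sums([0, 0, 20], 20))   # Fig. 1 with n1 + n2 = 20: prints 1/3
print(kth_distance_from_sums([8, 28, 28], 28))  # Figs. 2 and 3: prints 16/21
```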

Lemma 10. If d_k(N₁, N₂) = 0 for all k, then there exists a positive integer m such that, for any m₀ ≥ m, each node u in V₁ has an m₀th-order equivalent node u′ in V₂.

Proof. Assume that the conclusion does not hold, i.e. for every positive integer m there exist k₀ ≥ m and a node u ∈ V₁ such that u′ ≢^{k₀} u for every node u′ ∈ V₂. In particular, for m = 1 there exist k₁ and u₁ ∈ V₁ such that u₁ ≢^{k₁} u′ for every node u′ ∈ V₂. The k₁th-order class of u₁ then contains no node of N₂, so its contribution to formula (1) is positive and d_{k₁}(N₁, N₂) ≠ 0. This contradicts d_k(N₁, N₂) = 0 for all k.◽

Computational Aspects

For odd k (or even k), the kth-order equivalent nodes can be computed bottom-up (or top-down), whether the nodes belong to the same network or to two different networks. This order ensures that E(k) is already known for the children (or parents) of the two nodes being compared. Let N₁ = ((V₁, E₁), f₁) and N₂ = ((V₂, E₂), f₂) be two networks. Algorithm 1 gives the pseudocode that decides whether two nodes are kth-order equivalent, where E(k) denotes the set of kth-order equivalent nodes of a node. Deciding a single pair of nodes costs at most O(n³) time, where n = max(|V₁|, |V₂|). Finding all ith-order equivalent nodes (1 ≤ i ≤ k) of every node of the two networks therefore takes at most O(n⁵) time in total for fixed k. Evaluating formula (1) then costs O(n) time. In conclusion, the kth-distance between two networks can be computed in O(n⁵) time, where n is the maximum of |V₁| and |V₂|.
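A runnable counterpart of the pairwise test of Algorithm 1 is sketched below in Python. It is a sketch under assumptions, not the authors' code. The network container built by `make_net` (child lists, parent lists, leaf labels and a bottom-up node order) and all function names are hypothetical. E[i] stores the cross-network pairs (u, v) with u ≡^i v. Pairs are processed bottom-up for odd i and top-down for even i, so the pairs needed for the children (or parents) are already known. Greedy matching of children is safe because ≡^k is an equivalence relation: any unmatched equivalent partner is as good as any other.

```python
from graphlib import TopologicalSorter


def make_net(children, labels):
    """Hypothetical container: child lists, parent lists, leaf labels and a
    bottom-up node order (every child before its parents)."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    parents = {v: [] for v in nodes}
    for u, cs in children.items():
        for c in cs:
            parents[c].append(u)
    order = list(TopologicalSorter(
        {v: children.get(v, []) for v in nodes}).static_order())
    return {"children": children, "parents": parents,
            "label": labels, "bottom_up": order}


def equivalent(u, v, k, n1, n2, E):
    """Pairwise test u ≡^k v for u in n1 and v in n2 (cf. Algorithm 1)."""
    key = "children" if k % 2 else "parents"
    nu, nv = n1[key].get(u, []), n2[key].get(v, [])
    if len(nu) != len(nv):
        return False
    if k > 1 and (u, v) not in E[k - 1]:
        return False
    if not nu:  # both leaves (odd k) or both roots (even k)
        return k % 2 == 0 or n1["label"][u] == n2["label"][v]
    unmatched = list(nv)
    for a in nu:
        b = next((c for c in unmatched if (a, c) in E[k]), None)
        if b is None:
            return False
        unmatched.remove(b)  # each partner may be used only once
    return True


def equivalent_pairs(n1, n2, k):
    """E[i] = {(u, v) : u in n1, v in n2, u ≡^i v} for i = 1..k."""
    E = {}
    for i in range(1, k + 1):
        E[i] = set()
        # odd i: children first (bottom-up); even i: parents first (top-down)
        order1 = n1["bottom_up"] if i % 2 else n1["bottom_up"][::-1]
        for u in order1:
            for v in n2["bottom_up"]:
                if equivalent(u, v, i, n1, n2, E):
                    E[i].add((u, v))
    return E
```

Comparing two copies of the same small tree with renamed internal nodes, the roots are found to be equivalent at every order. Moving a leaf to a different parent breaks first-order equivalence of the roots.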

Algorithm 1: Deciding whether u and v are kth-order equivalent, for odd k (or even k).
input: nodes u and v, with E(k − 1) complete and E(k) already computed for the children (or parents) of u and v
output: true if u ≡^k v, and false otherwise
if the outdegree of u differs from that of v (or the indegree of u differs from that of v) then
  return false
end if
if k ≥ 2 and E(k − 1) of u does not contain v then
  return false
end if
if u and v are leaves (or u and v are roots) then
  if u and v have different labels (this test is skipped for roots) then
    return false
  end if
  add v to E(k) of u
  add u to E(k) of v
  return true
end if
mark every child (or parent) of v as unmatched
for each child a of u (or each parent a of u) do
  flag := false
  for each unmatched child b of v (or each unmatched parent b of v) do
    if E(k) of a contains b then
      flag := true
      mark b as matched
      break
    end if
  end for
  if flag = false then
    return false
  end if
end for
add v to E(k) of u
add u to E(k) of v
return true

Results

We compared the kth-distance with the m-distance on the space of reduced phylogenetic networks¹⁹ and with the d_e-distance on the space of partly reduced phylogenetic networks²⁰, using 100 networks constructed by the LNETWORK method³. Each distance method thus yields a distance matrix with approximately 5000 values, one per pair of networks. Figure 8 shows the distribution of the distance values: the horizontal axis is the distance value, and the vertical axis is the percentage of all values that take that value. The d_e-distance is not shown in Fig. 8 because it takes only two values, 1 and 0, which account for 99.38% and 0.62% of all values, respectively. The minima of the m-distance and the d_e-distance are 0, while the minimum of the kth-distance is 0.32.

From these results we draw the following conclusions. First, almost all d_e-distance values equal the maximum value 1. Second, the kth-distance is nonzero between networks whose d_e-distance and m-distance are 0. Third, for the same pair of networks, the kth-distance values are larger than the m-distance values.

Discussion

To compare the dissimilarity of a wider range of phylogenetic networks, we have defined a polynomial-time computable metric on the space of kth-order reduced phylogenetic networks. The larger k is, the larger the space of kth-order reduced phylogenetic networks becomes, and the more precisely the distance reflects the dissimilarity between two phylogenetic networks. Take the non-isomorphic networks in Fig. 1 as an example. When k = 1 or 2, formula (1) gives 0, i.e. their m-distance and d_e-distance are 0. When k = 3, however, formula (1) gives 1/3. Thus for k = 1 or 2 the value of formula (1) does not reflect the real dissimilarity between the two networks. In general, k is chosen according to the desired precision of the distance. Whatever k is, the kth-distance is not a metric on the space of all rooted phylogenetic networks. For example, the kth-distance between the two phylogenetic networks in Fig. 9 is 0, although they are not isomorphic.

References

1. Pagel, M. Inferring the Historical Patterns of Biological Evolution. Nature 401, 877–884 (1999).
2. Wang, J. A new algorithm to construct phylogenetic networks from trees. Genetics and Molecular Research 13, 1456–1464 (2014).
3. Wang, J. et al. LNETWORK: an efficient and effective method for constructing phylogenetic networks. Bioinformatics 29, 2269–2276 (2013).
4. Wang, J. et al. BIMLR: A Method for Constructing Rooted Phylogenetic Networks from Rooted Phylogenetic Trees. Gene 527, 344–351 (2013).
5. Zou, Q. et al. Survey of MapReduce frame operation in bioinformatics. Briefings in Bioinformatics 15, 637–647 (2013).
6. Zou, Q. et al. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy. Bioinformatics 31, 2475–2481 (2015).
7. Zou, Q. et al. Similarity computation strategies in the microRNA-disease network: A Survey. Briefings in Functional Genomics 15 (2015).
8. Wang, J. et al. FastJoin, an improved neighbor-joining algorithm. Genetics and Molecular Research 11, 1909–1922 (2012).
9. Robinson, D. F. & Foulds, L. R. Comparison of phylogenetic trees. Math. Biosci. 53, 131–147 (1981).
10. Critchlow, D. E. et al. The triples distance for rooted bifurcating phylogenetic trees. Systematic Biology 45, 323–334 (1996).
11. Waterman, M. S. & Smith, T. F. On the similarity of dendrograms. J. Theor. Biol. 73, 789–800 (1978).
12. Bluis, J. & Shin, D. G. Nodal distance algorithm: Calculating a phylogenetic tree comparison metric. Proc. 3rd IEEE Symp. BioInformatics and BioEngineering, 87–94 (2003).
13. Huber, K. et al. Metrics on Multilabeled Trees: Interrelationships and Diameter Bounds. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8, 1029–1040 (2011).
14. Cardona, G. et al. A Distance Metric for a Class of Tree-Sibling Phylogenetic Networks. Bioinformatics 24, 1481–1488 (2008).
15. Nakhleh, L. et al. Towards the Development of Computational Tools for Evaluating Phylogenetic Network Reconstruction Methods. Proc. Eighth Pacific Symp. Biocomputing, 315–326 (2003).
16. Moret, B. et al. Phylogenetic networks: modeling, reconstructibility and accuracy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 13–23 (2004).
17. Baroni, M. et al. A Framework for Representing Reticulate Evolution. Annals of Combinatorics 8, 391–408 (2004).
18. Cardona, G. et al. Tripartitions Do Not Always Discriminate Phylogenetic Networks. Math. Biosciences 211, 356–370 (2008).
19. Nakhleh, L. A metric on the space of reduced phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 7, 218–222 (2010).
20. Wang, J. A Metric on the Space of Partly Reduced Phylogenetic Networks. BioMed Research International 1–10 (2016).
21. Cardona, G. et al. On Nakhleh's Metric for Reduced Phylogenetic Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 6, 629–638 (2009).


Acknowledgements

The work was supported by the Natural Science Foundation of Inner Mongolia province of China (2015BS0601); the National Natural Science Foundation of China (61661040, 61571163, 61532014, 61671189); the National Key Research and Development Plan Task of China (Grant No. 2016YFC0901902).

Author information

Authors and Affiliations

School of Computer Science, Inner Mongolia University, Hohhot, 010021, P.R. China
Juan Wang
School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044, P.R. China
Maozu Guo


Contributions

J.W. devised the metric, proved it and wrote the paper. M.G. designed the experiments and revised the paper.

Corresponding author

Correspondence to Maozu Guo.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Wang, J., Guo, M. A Metric on the Space of kth-order reduced Phylogenetic Networks. Sci Rep 7, 3189 (2017). https://doi.org/10.1038/s41598-017-03363-y


Received: 05 August 2016
Accepted: 27 April 2017
Published: 09 June 2017
DOI: https://doi.org/10.1038/s41598-017-03363-y
