L-HetNetAligner: A novel algorithm for Local Alignment of Heterogeneous Biological Networks

Networks are widely used for modelling and analysing many kinds of biological data. As a consequence, many research efforts have resulted in a large number of algorithms for the analysis and comparison of networks. Many of these algorithms deal with networks with a single class of nodes and edges, also referred to as homogeneous networks. Recently, many approaches have tried to integrate the interplay of different molecules into a single model. A possible formalism to model such a scenario comes from node/edge-coloured networks (also known as heterogeneous networks), implemented as node/edge-coloured graphs. Therefore, the need arises for algorithms able to compare heterogeneous networks. We here focus on the local comparison of heterogeneous networks, and we formulate it as a network alignment problem. To the best of our knowledge, the local alignment of heterogeneous networks has not been explored in the past. We here propose L-HetNetAligner, a novel algorithm that receives as input two heterogeneous networks (node-coloured graphs) and builds a local alignment of them. We also implemented and tested our algorithm. Our results confirm that our method builds high-quality alignments. The following website* contains the Supplementary File 1 material and the code.


Table 1. NCV-GS³ scores obtained by aligning the first synthetic network with its noisy versions (Altered Networks). All synthetic networks are considered. The table shows significant improvements in the values of NCV-GS³, and hence in the quality of the alignments, when considering networks with many colours with respect to the network with a single colour. The improvement is also stable for all the networks.
Table 2. GS³ scores obtained by aligning the first synthetic network with its noisy versions (Altered Networks). All the results are equal to 1.
Table 3. NCV scores obtained by aligning the first synthetic network with its noisy versions (Altered Networks). All synthetic networks are considered. The table shows significant improvements in the values of NCV, and hence in the quality of the alignments, when considering networks with many colours with respect to the network with a single colour. The improvement is also stable for all the networks.
Table 4. P-NC scores obtained by aligning the first synthetic network with its noisy versions (Altered Networks). The table shows that the alignment quality is constant when adding colours to the networks.
Table 5. R-NC scores obtained by aligning the first synthetic network with its noisy versions (Altered Networks). All synthetic networks are considered. The table shows significant improvements in the values of R-NC, and hence in the quality of the alignments, when considering networks with many colours with respect to the network with a single colour. The improvement is also stable for all the networks.
Table 6. F-NC scores obtained by aligning the first synthetic network with its noisy versions (Altered Networks). All synthetic networks are considered. The table shows significant improvements in the values of F-NC, and hence in the quality of the alignments, when considering networks with many colours with respect to the network with a single colour. The improvement is also stable for all the networks.
Table 7. NCV-GS³ scores obtained by aligning the original Hetionet network with its noisy versions.

Quality of the alignments for synthetic networks.
Tables 1, 2, 3, 4, 5, and 6 report the NCV-GS³, GS³, NCV, P-NC, R-NC, and F-NC measures computed on each module for the four versions of synthetic heterogeneous networks.

Quality of the alignments for the Hetionet network.
Tables 7, 8, 9, 10, 11, and 12 report the NCV-GS³, GS³, NCV, P-NC, R-NC, and F-NC measures computed on each module for the four versions of the Hetionet network.
Table 8. GS³ scores obtained by aligning the original Hetionet network with its noisy versions. The table shows that the alignment quality is constant when considering colours.
Table 10. P-NC scores obtained by aligning the original Hetionet network with its noisy versions. The table shows that the alignment quality is constant when considering colours.

Quality of the alignments for synthetic heterogeneous networks with noise.

Quality of the alignments for the Hetionet network with noise.
Table 24 reports the NCV-GS³, GS³, NCV, P-NC, R-NC, and F-NC measures computed on each module for the four versions of the Hetionet network, obtained by aligning each network with its noisy version built by adding a percentage of edges.

Quality of the alignments for synthetic heterogeneous networks obtained by aligning each network with its noisy versions built by adding a percentage of nodes.
Tables 25, 26, 27, 28, 29, and 30 report the NCV-GS³, GS³, NCV, P-NC, R-NC, and F-NC measures computed on each module for the four versions of synthetic heterogeneous networks, obtained by aligning each network with its noisy versions built by adding a percentage of nodes.

Quality of the alignments for the Hetionet network obtained by aligning each network with its noisy versions built by adding a percentage of nodes.
Tables 31, 32, 33, 34, 35, and 36 report the NCV-GS³, GS³, NCV, P-NC, R-NC, and F-NC measures computed on each module for the four versions of the Hetionet network, obtained by aligning each network with its noisy counterpart built by adding a percentage of nodes.

Quality of the alignments for synthetic heterogeneous networks obtained by aligning each network with its noisy versions built by removing a percentage of nodes randomly.
Tables 37, 38, 39, 40, 41, and 42 report the NCV-GS³, GS³, NCV, P-NC, R-NC, and F-NC measures computed on each module for the four versions of synthetic heterogeneous networks, obtained by aligning each network with its noisy versions built by removing a percentage of nodes.

Tables 43, 44, 45, 46, 47, and 48 report the NCV-GS³, GS³, NCV, P-NC, R-NC, and F-NC measures computed on each module for the four versions of the Hetionet network, obtained by aligning each network with its noisy counterpart built by removing a percentage of nodes.

Experiments on different Network Models.
We generated synthetic networks with different models to test the performance of our algorithm on different network structures. The aim of this experiment is to demonstrate the robustness of our approach to changes in network structure. The following results highlight that the algorithm performs well on almost all the network models. For each of four network models (scale-free, geometric, Erdős-Rényi, and small-world), we built 5 synthetic networks having respectively 5000, 25000, 50000, 75000, and 95000 nodes. Then, we randomly assigned each node a colour out of n possible colours, varying n from one to four. That is, for each synthetic network, we built heterogeneous versions with one, two, three, and four colours. Next, we built the noisy versions for all network models by randomly removing 5%, 10%, 15%, 20%, and 25% of the edges from the original network.
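The colouring and noise steps described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the generator below is a simple Erdős-Rényi-style G(n, m) sampler on a small toy network, not the generators or network sizes used in the experiments, and all function names are hypothetical.

```python
import random

def random_edge_list(num_nodes, num_edges, rng):
    """Toy Erdős-Rényi-style G(n, m): sample distinct undirected edges."""
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(range(num_nodes), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)

def assign_colours(num_nodes, num_colours, rng):
    """Give every node one colour drawn uniformly out of num_colours."""
    return {node: rng.randrange(num_colours) for node in range(num_nodes)}

def remove_edges(edges, percent, rng):
    """Return a noisy copy with percent% of the edges removed at random."""
    num_removed = len(edges) * percent // 100
    removed = set(rng.sample(edges, num_removed))
    return [edge for edge in edges if edge not in removed]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
edges = random_edge_list(num_nodes=200, num_edges=1000, rng=rng)

# Heterogeneous versions with one, two, three, and four colours.
coloured_versions = {n: assign_colours(200, n, rng) for n in range(1, 5)}

# Noisy versions at the five noise levels used in the text.
noisy_versions = {p: remove_edges(edges, p, rng) for p in (5, 10, 15, 20, 25)}
```

Each noisy version keeps the node set and colouring of the original, so only the edge set differs between the two networks being aligned.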
Then, we applied L-HetNetAligner to align each synthetic network with its noisy versions. Finally, we computed the NCV-GS³ and F-NC measures for each synthetic network model. Tables 49 and 53 report the NCV-GS³ and F-NC measures for all the network models in the one-colour version, Tables 50 and 54 in the two-colour version, Tables 51 and 55 in the three-colour version, and Tables 52 and 56 in the four-colour version. In terms of quality, we expect that, for a given noise level, the more colours are used, the better the alignment quality should be. Moreover, using more colours should also improve the robustness to noise compared to using fewer colours. The analysis of the results shows that, for a given level of noise, the use of colours improves the quality of the alignment. In addition, the robustness to noise improves.
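For orientation, the two combined measures can be computed from their components as sketched below. The sketch assumes the definitions commonly used in the network alignment literature, namely NCV-GS³ as the geometric mean of NCV and GS³, and F-NC as the harmonic mean of P-NC and R-NC; this supplementary text does not restate them, and the function names are hypothetical.

```python
import math

def ncv_gs3(ncv, gs3):
    """Geometric mean of node coverage (NCV) and edge conservation (GS3).

    Assumed definition; both inputs are expected to lie in [0, 1].
    """
    return math.sqrt(ncv * gs3)

def f_nc(p_nc, r_nc):
    """Harmonic mean of P-NC (precision) and R-NC (recall).

    Assumed definition; returns 0.0 when both inputs are zero.
    """
    if p_nc + r_nc == 0:
        return 0.0
    return 2 * p_nc * r_nc / (p_nc + r_nc)

# Illustrative values only, not results from the tables.
example_topological = ncv_gs3(0.64, 1.0)
example_functional = f_nc(0.5, 1.0)
```

Under these definitions, a constant GS³ of 1 means NCV-GS³ varies only through NCV, and a constant P-NC means F-NC varies only through R-NC.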

Synthetic Networks with eight colors.
The input dataset consists of a synthetic network built using a scale-free (SF) graph generator. The network has 950 nodes and 4124 edges. Then, we assign each node a colour out of n possible colours. We vary n from 1 to 8 in order to build eight heterogeneous versions of the synthetic network as follows:
• 1 coloured version;
• 2 coloured version (in which 580 nodes have one colour and 382 nodes have another colour);
• 3 coloured version, where we randomly assign one colour to 358 nodes, a second colour to 256 nodes, and a third colour to 336 nodes;
• 4 coloured version, where we randomly assign one colour to 170 nodes, a second colour to 288 nodes, a third colour to 192 nodes, and a fourth colour to 300 nodes;
• 5 coloured version, where we randomly assign one colour to 110 nodes, a second colour to 210 nodes, a third colour to 157 nodes, a fourth colour to 314 nodes, and a fifth colour to 159 nodes;
• 6 coloured version, where we randomly assign one colour to 98 nodes, a second colour to 124 nodes, a third colour to 242 nodes, a fourth colour to 103 nodes, a fifth colour to 211 nodes, and a sixth colour to 172 nodes;
• 7 coloured version, where we randomly assign one colour to 110 nodes, a second colour to 124 nodes, a third colour to 170 nodes, a fourth colour to 223 nodes, a fifth colour to 94 nodes, a sixth colour to 115 nodes, and a seventh colour to 114 nodes;
• 8 coloured version, where we randomly assign one colour to 110 nodes, a second colour to 94 nodes, a third colour to 121 nodes, a fourth colour to 97 nodes, a fifth colour to 128 nodes, a sixth colour to 100 nodes, a seventh colour to 130 nodes, and an eighth colour to 170 nodes.
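A colouring with prescribed group sizes, as in the list above, can be sketched as a shuffle followed by slicing. The group sizes below are copied from the text, while the function and variable names are hypothetical. The sketch also sums each split: the three- to eight-colour splits each total 950 nodes, whereas the two-colour sizes as printed total 962.

```python
import random

# Group sizes per coloured version, as quoted in the text.
GROUP_SIZES = {
    2: [580, 382],
    3: [358, 256, 336],
    4: [170, 288, 192, 300],
    5: [110, 210, 157, 314, 159],
    6: [98, 124, 242, 103, 211, 172],
    7: [110, 124, 170, 223, 94, 115, 114],
    8: [110, 94, 121, 97, 128, 100, 130, 170],
}

def assign_by_sizes(sizes, rng):
    """Randomly assign colour i to exactly sizes[i] nodes."""
    nodes = list(range(sum(sizes)))
    rng.shuffle(nodes)
    colouring, start = {}, 0
    for colour, size in enumerate(sizes):
        for node in nodes[start:start + size]:
            colouring[node] = colour
        start += size
    return colouring

# Number of nodes covered by each split.
totals = {n: sum(sizes) for n, sizes in GROUP_SIZES.items()}

# Example: the four-colour version of the 950-node network.
colouring = assign_by_sizes(GROUP_SIZES[4], random.Random(0))
```

Shuffling once and slicing guarantees the exact group sizes, which independent per-node random draws would only match on average.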
We built the noisy versions by randomly removing 5%, 10%, 15%, 20%, and 25% of the edges from the original network. Then, we applied L-HetNetAligner to align the synthetic network with its noisy versions. Finally, we computed the NCV-GS³ and F-NC measures for each synthetic network. Tables 57 and 58 report the NCV-GS³ and F-NC measures for the alignment of the original synthetic network with its versions at 0%, 5%, 10%, 15%, 20%, and 25% of added noise, for all synthetic networks.
In terms of quality, we expect that, for a given noise level, the more colours are used, the better the alignment quality should be. Moreover, using more colours should also improve the robustness to noise compared to using fewer colours. The analysis of the results shows that, for a given level of noise, the use of colours improves the quality of the alignment. In addition, the robustness to noise improves.

Comparison with respect to Single Color Alignments.
To demonstrate the effectiveness of L-HetNetAligner, we test whether our algorithm obtains better results when applied to the original network than when applied to subnetworks extracted from it. We considered two cases: 1) synthetic networks, and 2) the Hetionet network. We consider the same synthetic network with 950 nodes, 3410 edges, and four node colours. Then we split this network into four subnetworks, each containing the nodes of one colour.
Subnetwork 1 has 170 nodes and 474 edges, subnetwork 2 has 250 nodes and 422 edges, subnetwork 3 has 330 nodes and 528 edges, and subnetwork 4 has 220 nodes and 404 edges. Please note that the sum of the edges of the four subnetworks is lower than the number of edges of the initial network, since all the cross-edges, i.e. edges between nodes of different colours, have been removed.
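The split into single-colour subnetworks can be sketched as follows, on a toy network with hypothetical names. Edges whose endpoints share a colour stay in that colour's subnetwork, and all other edges are cross-edges and are dropped. With the figures quoted above, the four subnetworks keep 474 + 422 + 528 + 404 = 1828 of the 3410 edges, so 1582 edges would be cross-edges.

```python
from collections import defaultdict

def split_by_colour(edges, colour_of):
    """Split a node-coloured network into one induced subnetwork per colour.

    Returns ({colour: (nodes, edges)}, cross_edges), where cross_edges are
    the edges joining nodes of different colours.
    """
    nodes = defaultdict(set)
    for node, colour in colour_of.items():
        nodes[colour].add(node)

    sub_edges = defaultdict(list)
    cross_edges = []
    for u, v in edges:
        if colour_of[u] == colour_of[v]:
            sub_edges[colour_of[u]].append((u, v))
        else:
            cross_edges.append((u, v))

    subnetworks = {c: (nodes[c], sub_edges[c]) for c in nodes}
    return subnetworks, cross_edges

# Toy two-colour network (illustrative only).
colour_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
subnetworks, cross_edges = split_by_colour(edges, colour_of)

# Edges kept across all subnetworks plus cross-edges equals the original count.
kept = sum(len(sub[1]) for sub in subnetworks.values())
```

In the toy example, three edges are kept and two are cross-edges, mirroring how the single-colour subnetworks lose the information carried by links between different node types.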
We select the Hetionet network (with 37142 nodes and 6014211 edges) in its four-coloured version, and from it we created four subnetworks according to the four node types (i.e. colours). Accordingly, subnetwork 1 has 2095 nodes and 26567 edges, subnetwork 2 has 136 nodes and 543 edges, subnetwork 3 has 405 nodes and 742 edges, subnetwork 15056 has 78234 nodes and 404 edges.
Then, we built the noisy versions of the synthetic network, of the Hetionet network, and of their subnetworks by randomly removing 5%, 10%, 15%, 20%, and 25% of the edges from each original network.
Then, we applied L-HetNetAligner to align the synthetic network, the Hetionet network, and their subnetworks with their noisy versions, and we computed the NCV-GS³ and F-NC measures for each of them. Finally, we tested the ability of our algorithm to infer missing links from the input networks (link prediction) by counting how many homogeneous and heterogeneous gaps are found in the alignment graph of the synthetic network, of the Hetionet network, and of their subnetworks. Tables 59 and 60 report the NCV-GS³ and F-NC scores for the synthetic network and its four subnetworks, and for the Hetionet network and its four subnetworks. The NCV-GS³ and F-NC values for the original network are higher than those for the subnetworks, for both the synthetic network and the Hetionet network. Table 61 reports the number of correctly predicted links obtained by aligning the original synthetic network and its subnetworks with their noisy versions, obtained by random removal of pair-matched nodes, for all the networks. Table 62 reports the number of correctly predicted links obtained by aligning the Hetionet network and its subnetworks with their noisy versions, obtained by random removal of pair-matched nodes, for all the networks. We note that L-HetNetAligner predicts a higher number of links for the synthetic network than for its subnetworks, and for Hetionet than for its subnetworks.
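One simple way to count correctly predicted links is sketched below, under the assumption that a prediction counts as correct when it recovers a link present in the original network but absent from the noisy one. How L-HetNetAligner derives candidate links from gaps in the alignment graph is not reproduced here; `predicted_links` is a hypothetical input and the function names are illustrative.

```python
def normalise(edge):
    """Order the endpoints so that undirected edges compare equal."""
    u, v = edge
    return (u, v) if u <= v else (v, u)

def count_correct_predictions(original_edges, noisy_edges, predicted_links):
    """Count predicted links that are in the original but not the noisy network."""
    original = {normalise(e) for e in original_edges}
    noisy = {normalise(e) for e in noisy_edges}
    predicted = {normalise(e) for e in predicted_links}
    missing = original - noisy
    return len(missing & predicted)

# Toy example (illustrative only): two links were removed, one is recovered.
original_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
noisy_edges = [(1, 2), (4, 5)]
predicted_links = [(3, 2), (1, 5)]
num_correct = count_correct_predictions(original_edges, noisy_edges, predicted_links)
```

Normalising the endpoint order keeps the count independent of the direction in which a link happens to be written.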

Predicted Links Missed by Single Color Alignments
Finally, we report an example of a predicted link in the Hetionet network:
• Metaedge Anatomy::UBERON:0000955-Gene::3892 is missing in the one-coloured version and is predicted in the two-, three-, and four-colour versions.

Table 65. F-NC scores obtained by aligning the original synthetic network with its noisy versions for all the networks. When building the alignment graph, all pairs of nodes were selected.