GRAPE for fast and scalable graph processing and random-walk-based embedding

Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.


Introduction
In various fields such as biology, medicine, data and network science, graphs can naturally model available knowledge as interrelated concepts, represented by a network of nodes connected by edges. The wide range of graph applications has motivated the development of a rich literature on Graph Representation Learning (GRL) and inference models [1]. GRL models compute embeddings, i.e. vector representations of the graph and its constituent elements, capturing their topological, structural, and semantic relationships. Graph inference models can use such embeddings and available additional features for several tasks, e.g., visualization, clustering, node-label, edge-label, and edge prediction problems [1].
State-of-the-art GRL algorithms, including, among others, methods based on matrix factorization, random walks (RW), graph kernels [2], triple sampling, and (deep) Graph Neural Networks (GNN) [1,3], have shown their effectiveness in analyzing networks from sociology, biology, medicine, and many other disciplines.
In this context, for scalability reasons, RW-based GRL models are often preferred. However, their performance is often limited by the high computational cost of the RW generators. Indeed, current state-of-the-art RW-based graph embedding libraries display a limited ability to efficiently generate enough RW samples to accurately represent the topology of the underlying graph. This limits the performance of node- and edge-label prediction methods, which strongly depends on the informativeness of the underlying embedded graph representation. The efficient generation of billions of sampled RWs could lead to more accurate embedded representations of graphs and could boost the performance of machine learning methods that learn from the embedded vector representations of nodes and edges.
The Findable, Accessible, Interoperable and Reusable (FAIR) comparison of different graph-based methods under different experimental set-ups is a relevant open issue, only very recently considered in the literature in the context of the Open Graph Benchmark Large-Scale Challenge (OGB-LSC). This initiative enables a FAIR comparative evaluation of different models on three specific large-scale graphs [16]. However, further efforts are required to provide standard interfaces to easily integrate methods from different libraries and public experimental pipelines, and to allow a FAIR comparison of different methods and libraries for the analysis of any graph-based data. GRAPE (Graph Representation Learning, Prediction and Evaluation) provides a modular and flexible solution to the above problems by offering: (a) a scalable and fast software library that efficiently implements RW-based embedding methods, graph processing algorithms and inference models, and that can run both on general-purpose desktop and laptop computers and on high-performance computing clusters; (b) an extensive set of efficient and effective built-in GRL algorithms that any user can continuously extend by implementing easy-to-use standardized interfaces; (c) ready-to-use evaluation pipelines to guarantee a fair and reproducible evaluation of any GRL algorithm (implemented in or integrated into GRAPE) using the ∼80,000 graphs retrievable through the library, as well as other graphs provided by the user. Therefore, GRAPE can also be viewed as an efficient collector of GRL methods to perform a FAIR comparison on a large set of available graphs.

Results
2.1 Overview of the GRAPE resource: Embiggen and Ensmallen

GRAPE consists of about 1.5 million lines of Python code and about 200,000 lines of Rust code (computed with the Tokei tool, https://docs.rs/tokei/latest/tokei), implementing efficient data structures and parallel computing techniques to enable scalable graph processing and embedding.
The library's high-level structure, overall functionalities, and its two core modules, Ensmallen (ENabler of SMAll computational resources for LargE Networks) and Embiggen (EMBeddInG GENerator), are depicted in Fig. 1a.
Ensmallen efficiently loads big graphs and executes graph processing operations, owing to its Rust [18] implementation and to the usage of map-reduce thread-based parallelism and branch-less Single Instruction Multiple Data (SIMD) parallelism. It also provides Python bindings for ease of use.
Designed to leverage succinct data structures [46], GRAPE requires only a fraction of the memory required by other libraries and guarantees average constant-time rank and select operations [19]. This makes it possible to execute many graph processing tasks, e.g. accessing node neighbours and running first- and second-order RWs, with memory usage close to the theoretical minimum.
However, the performance of RW-based embedding methods is often affected by the high computational costs of the random-walk generators, which often rely on a limited number of random-walk samples that cannot accurately represent the topology of the underlying graph. This leads to uninformative graph embeddings that affect the performance of the subsequent graph-prediction models. To overcome these limitations, GRAPE focuses on smart and efficient implementations of random-walk-based embedding methods, since its main objective is to scale with large graphs (see Methods for details); other effective but more complex models, e.g. those based on GNNs [3] and available from other libraries [20], are not yet implemented in the library, due to their well-known scaling limitations [3,21].
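As a concrete illustration of the sampling such generators must perform, the following minimal Python sketch implements a Node2Vec-style second-order biased walk with return parameter p and in-out parameter q. This is an illustrative sketch only; GRAPE's actual generator is a parallel Rust implementation with specialized dispatching.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One second-order (Node2Vec-style) random walk.

    adj: dict mapping each node to a list of neighbour nodes.
    p: return parameter (penalises stepping back to the previous node).
    q: in-out parameter (balances BFS-like vs DFS-like exploration).
    """
    rng = rng or random.Random(42)
    walk = [start]
    prev = None
    while len(walk) < length:
        cur = walk[-1]
        neighbours = adj[cur]
        if not neighbours:
            break
        if prev is None:
            # First step: uniform first-order transition.
            nxt = rng.choice(neighbours)
        else:
            prev_nbrs = set(adj[prev])
            # Unnormalised Node2Vec transition weights by distance from prev.
            weights = []
            for n in neighbours:
                if n == prev:
                    weights.append(1.0 / p)   # distance 0
                elif n in prev_nbrs:
                    weights.append(1.0)       # distance 1
                else:
                    weights.append(1.0 / q)   # distance 2
            nxt = rng.choices(neighbours, weights=weights, k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk
```

Note how every step of a second-order walk needs the neighbour list of the previous node as well, which is what makes naive implementations expensive and motivates the optimized data structures described in Methods.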
Among the many high-performance algorithms implemented in GRAPE, we propose Sorted Unique Sub-Sampling (SUSS), an algorithm that computes approximated RWs to enable processing graphs that contain very high-degree nodes (degree > 10^6), unmanageable for the corresponding exact algorithms. Approximated RWs can achieve edge-prediction performance comparable to that obtained by the corresponding exact algorithm, with a speed-up of two to three orders of magnitude (Sections 4.3.3 and 2.3).
Ensmallen also provides many other methods and utilities, such as refined multiple-holdout techniques to avoid biased performance evaluations, the Bader and Kruskal algorithms for computing random and minimum spanning arborescences and connected components, stress and betweenness centrality [22], node and edge filtering methods, and algebraic set operations on graphs. Ensmallen allows graphs to be loaded from a wide variety of node- and edge-list formats (Section 2.2). In addition, users can automatically load data from an ever-increasing list of over 80,000 graphs from the literature and elsewhere (Fig. 1b, detailed in Section 4.7.1).
GRAPE provides three modular pipelines to compare and evaluate node-label, edge-label and edge prediction performance under different experimental settings (Section 4.7.2, Fig. 1b), as well as utilities for graph visualization (Fig. 1c). These pipelines allow non-expert users to tailor their desired experimental setup and quickly obtain actionable and reproducible results (Fig. 1b). Furthermore, GRAPE provides interfaces to integrate third-party models and libraries (e.g., the KarateClub [33] and PyKeen [10] libraries). This way, the evaluation pipelines can compare models implemented in or integrated into GRAPE (Section 4.7).
The possibility of integrating external models and the availability of graphs for testing them on the same datasets allow addressing a still open and crucial issue in the literature, namely the FAIR, objective, reproducible, and efficient comparison of graph-based methods and software implementations (Section 4.7.1).
We used the evaluation pipelines to compare the edge- and node-label prediction performance of 16 embedding models. Moreover, we compared GRAPE with state-of-the-art graph-processing libraries across several types of graphs of different sizes and characteristics, including big real-world graphs such as Wikipedia, the Comparative Toxicogenomic Database (CTD) [34] and biomedical knowledge graphs generated through PheKnowLator [35], showing that GRAPE achieves state-of-the-art results in processing big real-world graphs, both in terms of empirical time and space complexity and in prediction performance.

Fast error-resilient graph loading
GRAPE has been carefully designed to perform efficiently in space and time. In this section, we report a comparative study of performance with state-of-the-art graph processing libraries (including NetworkX [36], iGraph [4], CSRGraph, and PecanPy [9]) in terms of the empirical space and time used for loading 44 different real-world graphs (Fig. 2 a and b). Results show that GRAPE is faster and requires less memory than state-of-the-art libraries. For instance, GRAPE loads the ClueWeb09 graph (1.7 billion nodes and 8 billion undirected edges) in less than 10 minutes and requires about 60GB of memory, whereas the other libraries were not able to load this graph at all. In addition, GRAPE can process many graph formats and simultaneously check for common format errors. All graphs and libraries used in these experiments are directly available from GRAPE. Detailed results are available in Supplementary Information Sections 1 and 2.

GRAPE outperforms state-of-the-art libraries on RW generation
Through extensive use of thread and SIMD parallelism and specialized quasi-succinct data structures, GRAPE outperforms state-of-the-art libraries by one to four orders of magnitude in the computation of RWs, both in terms of empirical computational time and space requirements (Figure 2 c, d, e, f and Section 2.3.1). The method used to properly measure execution time and peak memory usage is presented in Supplementary Information Section 6.3.
Further speed-up of the second-order RW computation is obtained by dispatching one of 8 optimized implementations of Node2Vec sampling [29]. The dispatching is based on the values of the return and in-out parameters and on the type of graph (weighted or unweighted). GRAPE automatically provides the version best suited to the requested task, with minimal code redundancy (Section 4.3.1). The time difference between the least and the most computationally expensive implementations is around two orders of magnitude (Supplementary Information Section 7.2 and Supplementary Tables 50 and 51).

Experimental comparison of graph processing libraries
We compared GRAPE with a set of state-of-the-art libraries, including GraphEmbedding, Node2Vec, CSRGraph and PecanPy [9], on a large set of first- and second-order RW tasks. The RW procedures in the GraphEmbedding and Node2Vec libraries use the alias method (Supplementary Information Section 7.2.3). The PecanPy library also employs the alias method for small-graph use cases (fewer than 10,000 nodes). CSRGraph, on the other hand, computes the RWs lazily using Numba [37]. Similarly, PecanPy leverages Numba lazy generation for graphs with more than 10,000 nodes. All libraries are further detailed in Supplementary Information Section 1.
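For reference, the alias method used by these libraries can be sketched as follows (Vose's variant; an illustrative sketch, not the compared libraries' actual code). It needs O(d) preprocessing and two arrays of size d per discrete distribution, which is what makes per-node (or, for second-order walks, per-edge) alias tables so memory hungry on large graphs:

```python
import random

def build_alias_table(weights):
    """Vose's alias method: O(d) preprocessing, O(1) per sample."""
    d = len(weights)
    total = sum(weights)
    prob = [w * d / total for w in weights]  # scaled probabilities
    alias = [0] * d
    small = [i for i, x in enumerate(prob) if x < 1.0]
    large = [i for i, x in enumerate(prob) if x >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                       # s borrows mass from l
        prob[l] = prob[l] + prob[s] - 1.0  # remaining mass of l
        (small if prob[l] < 1.0 else large).append(l)
    # Remaining entries keep the full cell for themselves.
    while large:
        prob[large.pop()] = 1.0
    while small:
        prob[small.pop()] = 1.0
    return prob, alias

def alias_sample(prob, alias, rng):
    """Draw one index from the precomputed alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Sampling is constant time once the table exists; the cost is paid entirely at construction, per distribution.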
Figure 2 shows the experimental results of a complete iteration of one-hundred-step RWs on all the nodes across 44 graphs, with numbers of edges ranging from a few thousand to several billion (Section 2.2). GRAPE greatly outperforms all the compared graph libraries on both first- and second-order RWs in terms of space and time complexity. Note that GRAPE scales well with the biggest graphs considered in the experiments, while the other libraries either crash when exceeding 200GB of memory or take more than 4 hours to execute the task (Figure 2 c, d, e, f).

Approximated RWs to process graphs with high-degree nodes
RWs on graphs containing high-degree nodes are challenging, since multiple paths from the same node must be processed. To overcome this computational burden, GRAPE provides an approximated implementation of weighted RWs that undersamples the neighbours to scale with graphs containing high-degree nodes, e.g. nodes with millions of neighbours (Figure 3 a, b, c, Section 4.3.3). To guarantee scalability, the sampling process is performed by a novel algorithm (Sorted Unique Sub-Sampling, SUSS) that we developed as an alternative to the classic and computationally demanding alias algorithm (Supplementary Information Section 7.2.3). SUSS divides a discrete range into k uniformly spaced buckets and randomly samples a value from each bucket, achieving an efficient neighbourhood sub-sampling for nodes with degree d ≫ k. The obtained values are inherently sorted and unique (see Section 4.3.3 for details).
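The bucket-based idea behind SUSS can be sketched in a few lines of Python (an illustrative sketch under our reading of the description above; GRAPE's SUSS is implemented in Rust):

```python
import random

def suss(degree, k, rng=None):
    """Sorted Unique Sub-Sampling sketch.

    Splits the index range [0, degree) into k uniformly spaced
    buckets and draws one index per bucket. Because the buckets are
    disjoint and increasing, the k draws come out sorted and unique
    with no explicit sort or deduplication pass.
    """
    assert degree >= k > 0
    rng = rng or random.Random(0)
    step = degree / k
    samples = []
    for i in range(k):
        lo = int(i * step)
        hi = max(lo + 1, int((i + 1) * step))  # bucket [lo, hi) is never empty
        samples.append(rng.randrange(lo, hi))
    return samples
```

The sorted output matters for the data structures described in Methods: the sampled neighbour indices can be consumed sequentially, which is the fast access pattern for the Elias-Fano representation.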
We compared exact and approximated RW samples for Node2Vec-based SkipGram on an edge prediction problem on the (unfiltered) H. sapiens STRING PPI network [38], achieving statistically equivalent performance (two-sided Wilcoxon rank-sum p-value > 0.2, Fig. 3d). We ran 30 holdouts, setting a (deliberately low) degree threshold of 10 for the approximated RW, while the maximum degree in the training set ranged between 3,325 and 4,184 across the holdouts. These results show no relevant performance decay, even with a relatively stringent degree threshold. To better show that approximated RWs can be several orders of magnitude faster than the "vanilla" exact RW algorithm, we used the sk-2005 graph, which includes about 50 million nodes, 1.8 billion edges, and some nodes with degrees over 8 million. Indeed, by extrapolating the results reported in Fig. 3e to the entire graph, the exact algorithm requires about 23 days, while the approximate one requires about 11 minutes, both running on a PC with two AMD EPYC 7662 64-core processors, 256 CPU threads, and 1TB of RAM.

GRAPE enables a fair comparison of graph-based methods
GRAPE provides both a large set of ready-to-use graphs that can be used in the experiments and standardized pipelines to fairly compare different models and graph libraries, ensuring reproducibility of the results (Fig. 1 b; see Section 4.7 for details). Graph embedding methods are either efficiently implemented from scratch in Rust (with a Python interface) or integrated from other libraries by implementing the interface methods of an abstract GRAPE class. GRAPE users can compare different embedding methods and prediction models and add their own methods to the standardized pipelines. Our experiments show how to use the standardized pipelines to fairly compare a large set of methods and different implementations using only a few lines of Python code.
Experimental comparison of node and edge embedding methods. We selected 16 of the 69 node embedding methods available in GRAPE, and we used the standardized edge prediction and node-label prediction pipelines to compare the prediction results obtained by Perceptron, Decision Tree, and Random Forest classifiers (Fig. 4). For the edge prediction tasks, we used the Hadamard product to construct edge embeddings from node embeddings, i.e. the element-wise product of the embeddings of the source and destination nodes of the corresponding edge. We applied a "connected Monte Carlo" evaluation schema for edge prediction and a stratified Monte Carlo evaluation schema for node-label prediction (see Supplementary Information Section 10.2 for more details).
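The Hadamard edge embedding used above amounts to a single element-wise product; a minimal sketch:

```python
def hadamard_edge_embedding(src_vec, dst_vec):
    """Edge embedding as the element-wise (Hadamard) product of the
    source and destination node embeddings."""
    assert len(src_vec) == len(dst_vec)
    return [s * d for s, d in zip(src_vec, dst_vec)]
```

The resulting edge vector has the same dimensionality as the node embeddings and is what the downstream classifiers (Perceptron, Decision Tree, Random Forest) are trained on.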
The models have been tested on 3 graphs for edge prediction (Fig. 4 a, b) and 3 graphs for node-label prediction (Fig. 4 c, d). The graph reports describing the characteristics of the analyzed graphs, automatically generated with GRAPE, are available in Supplementary Information Sections 3.2 and 3.3. Since these are homogeneous graphs (i.e. graphs having only one type of nodes and edges), we considered only homogeneous node embedding methods. Moreover, we discarded non-scalable models, e.g. models based on the factorization of dense adjacency matrices.
Among the considered categories are triple-sampling methods, i.e. first- and second-order LINE [31]. All the embedding methods and classifiers are described in more detail in Sections 4.2–4.4 and 4.6.
Results show that no model is consistently better than the others across the types of tasks and the datasets used in the experiments (Figure 4). These results are analogous to those obtained by Kadlec et al. [44] for the TransE model family and by Errica et al. [45] for GNN models, highlighting the need for objective pipelines to systematically compare a wide array of possible methods for a desired task. The standardized pipelines implementing the experiments are available from the online GRAPE tutorials and allow the full reproducibility of the results summarized in Fig. 4. Full results using other evaluation metrics are available in Supplementary Information Sections 5.1 and 5.2.

Scaling with big real-world graphs
To show that GRAPE can scale and boost edge prediction in big real-world graphs, we compared its Node2Vec-based models with state-of-the-art implementations on three big graphs: 1) the English Wikipedia graph; 2) a graph constructed using the Comparative Toxicogenomic Database (CTD) [34]; 3) a biomedical graph generated through PheKnowLator [35]. Supplementary Information Section 6.1 reports details about the construction and the characteristics of the three graphs.
The embeddings computed by each of the tested models were used to train a decision tree, available from the Embiggen module of GRAPE, for edge prediction. To perform an unbiased evaluation, training and testing were performed with 10 connected Monte Carlo holdouts (with an 80:20 train:test ratio; Supplementary Information Section 10.2), and performance was evaluated by precision, recall, accuracy, balanced accuracy, F1, AUROC, and AUPRC. In the experimental set-up, using a Google Cloud VM with 64 cores and N1 CPUs with Intel Haswell micro-architecture, we imposed the following memory and time constraints:
• a maximum time of 48 hours for each holdout to produce the embedding;
• a maximum memory usage of 64GB during the embedding;
• a maximum memory usage of 256GB during the prediction phase.

Results on scaling tests
GRAPE can scale with big graphs where the other competing libraries fail. Most competing libraries could not complete the embedding and prediction tasks on big real-world graphs. Indeed, NodeVectors exceeded the computation time limit, while SNAP, Node2Vec, GraphEmbedding, and PyTorch Geometric went out of memory in the embedding phase, exceeding the available RAM (64GB), whereas GRAPE required only 54MB on the CTD graph. For the first three libraries, this is due to the extremely high memory cost of the alias method they use for pre-computing the transition probabilities (Supplementary Information Section 7.2.3); indeed, the alias method has quadratic complexity with respect to the number of nodes in the graph, therefore quickly becoming too expensive on big graphs. We also ran PyTorch Geometric on a substantially smaller graph (the STRING Homo sapiens graph, with about 20K nodes and 12M edges) and registered that GRAPE is about 60 times faster than PyTorch Geometric.
Such a comparison is impossible with the other three libraries employing the alias method, as even this smaller graph is significantly larger than what they can handle. FastNode2Vec and PecanPy went out of time (more than 48h of computation) on the biggest graph, Wikipedia. In practice, only GRAPE was able to successfully complete the embedding and prediction tasks on all three big real-world graphs.
GRAPE improves upon the empirical time complexity of state-of-the-art libraries. Fig. 5 a, b and c show the memory and time requirements of GRAPE, FastNode2Vec and PecanPy (note that the other state-of-the-art libraries ran out of time or memory on these real-world graph prediction tasks). On the CTD and PheKnowLator biomedical graphs, we observe a speed-up of about one order of magnitude of GRAPE with respect to both FastNode2Vec and PecanPy (Fig. 5 a, b), with a significant gain in memory usage with respect to PecanPy and a memory footprint comparable to FastNode2Vec's. These results are confirmed by the average memory and time requirements across ten holdouts (Fig. 5 d). Note that both FastNode2Vec and PecanPy fail on the Wikipedia task, while GRAPE terminated the computation in a few hours using a reasonable amount of memory (Fig. 5 c and d).
GRAPE boosts edge prediction performance. GRAPE not only allows graph embedding approaches to be applied to graphs bigger than was previously possible and enables fast and efficient computation, but it can also boost prediction performance on big real-world graphs. Fig. 5 e and f show that GRAPE achieves better results on edge prediction tasks with both the CTD and PheKnowLator biomedical graphs. GRAPE outperforms the other competing libraries at the 0.01 significance level according to the Wilcoxon rank-sum test (Fig. 5 g, where the win of a row against a column is shown in green, a tie in yellow, and a loss in red). The edge embeddings have been used to train a decision tree to allow a fair comparison between the embedding libraries.
Supplementary Information Section 6.4 reports AUROC, accuracy, and F1-score performances, as well as other, more detailed results on the experimental comparison of GRAPE with state-of-the-art libraries.

Discussion
We have presented GRAPE, a software resource with specialized data structures, algorithms, and fast parallel implementations of graph processing methods, coupled with efficient implementations of algorithms for RW-based GRL. Our experiments have shown that GRAPE significantly outperforms state-of-the-art graph processing libraries in terms of empirical space and time complexity, with an improvement of up to several orders of magnitude for common RW-based analysis tasks. This allows substantially bigger graphs to be analyzed and may improve the performance of graph ML methods by allowing for more comprehensive training, as shown by our experiments on three large real-world graphs. In addition, the substantial reduction of the computational time achieved by GRAPE in common graph processing and learning tasks will help reduce the carbon footprint of ML researchers and graph processing practitioners in several disciplines.
Thanks to (1) the huge number of well-known graphs that can be efficiently loaded and used via GRAPE, (2) the standard interfaces that allow any user to integrate their own GRL models into GRAPE, and (3) the modular pipelines that allow users to easily design different benchmarking experiments, GRAPE can be considered the first resource that truly allows a FAIR comparison between virtually any method on any graph data (including graph data directly provided by the users).
Indeed, to our knowledge, the only related resource that allows such a comparison is the OGB resource [16]; however, as witnessed by the recent OGB-LSC (https://ogb.stanford.edu/neurips2022/), the datasets and the organization of the OGB resource are well-suited for specific large-scale challenges, while the GRAPE evaluation pipelines can assess and compare any method on any graph benchmark chosen by the user. This makes the two resources related but complementary in purpose.
We would further remark that GRAPE currently provides efficient implementations of RW-based embeddings, whose advantage is their applicability to a larger set of learning problems, since the computed embeddings are usually task-independent and unsupervised. In contrast, embeddings computed by GNNs are task-dependent and supervised, and their application to graphs with thousands of nodes and millions of edges is still hampered by GNN scalability issues, which remain an open research question in the literature. For this reason, future work will investigate how to efficiently implement deep GNNs to obtain deep neural models able to efficiently scale to very big graphs [15,21]. More precisely, even if the Elias-Fano-based data structures and the SUSS algorithm proposed in this paper have been designed to efficiently implement RW embedding methods, in future research we plan to consider their integration in the context of GNNs.
Considering the ever-increasing amount of knowledge graphs being constructed in several disciplines, GRAPE may be considered a powerful, effective, and efficient resource for advancing knowledge by performing graph-inference tasks to uncover hidden relationships between concepts, or to predict properties and discover structures in complex graphs. However, a limitation of the current implementation of GRAPE is the limited availability of algorithms specifically designed for the analysis of heterogeneous graphs; we are already working to fill this gap.
GRAPE focuses primarily on CPU models, since the VRAM of most existing GPUs is too small for several real-world graphs, leading to latency problems as data are moved back and forth between RAM and VRAM. Recently introduced top-tier GPU models provide VRAM considerably larger than previously available, potentially making it viable to translate the current CPU implementation into a GPU one.
Though GRAPE allowed us to compare different experimental setups by composing experiments on different graphs and by using several embedding methods and prediction models, no method systematically outperformed the others (Section 2.4). To close this knowledge gap, in future work we plan to run GRAPE with a large-scale grid search to identify task-specific trends for the various combinations of models and their parameters.

Methods
GRAPE provides a wide spectrum of graph processing methods, implemented within the Ensmallen module, as well as node embedding methods (Sections 4.2, 4.3, 4.4), methods to combine node embeddings into edge embeddings (Section 4.5), and models for node-label, edge-label and edge prediction (Section 4.6), implemented within the Embiggen module. The graph processing methods include fast graph loading, multiple graph holdouts, efficient first- and second-order RWs, triple and corrupted-triple sampling, plus a wide range of graph processing algorithms that scale nicely with big graphs, using parallel computation and efficient data structures to speed up the computation.
Ensmallen is implemented in Rust, with fully documented Python bindings. Rust is a compiled language gaining importance in the scientific community [18] thanks to its robustness, reliability, and speed. Rust allows thread and data parallelism to be exploited robustly and safely. To further improve efficiency, some core functionalities of the library, such as the generation of pseudo-random numbers and sampling procedures from a discrete distribution, are written in assembly (see Supplementary Information Sections 7.2.1 and 7.2.2).
GRAPE currently provides 50 unique node embedding models (69 when counting redundant implementations, which are important for benchmarks), of which 28 are from-scratch implementations and 41 are integrated from third-party libraries. The list of available node embedding methods is constantly growing, with the ultimate goal of providing a complete set of efficient node embedding models. The inputs for the various models (e.g. RWs and triples) are provided by Ensmallen in a scalable, highly efficient, and parallel way (Fig. 1a). All models were designed according to the "composition over inheritance" paradigm, to ensure a better user experience through increased modularity and polymorphic behaviour [48]. More specifically, Embiggen provides interfaces, specific to either the embedding or each of the prediction tasks, that must be implemented by all models; third-party models, such as those from the PyKeen [10], KarateClub [33] and Scikit-Learn [49] libraries, are already integrated within GRAPE through these interfaces. GRAPE users can straightforwardly create their own models and wrap them by implementing the appropriate interface.
GRAPE has a comprehensive test suite. However, to thoroughly test it against many scenarios, we also employed fuzzers, i.e. tools that iteratively generate inputs to find corner cases in the library.
In the next section we describe the succinct data structures used in the library and detail their efficient GRAPE implementation (Section 4.1). We then summarize the spectral and matrix factorization (Section 4.2), the RW-based (Section 4.3), and the triple- and corrupted-triple-based (Section 4.4) embedding methods and their GRAPE implementations. In Section 4.5 we describe the edge embedding methods and in Section 4.6 the node- and edge-label prediction methods available in GRAPE. Finally, in Section 4.7 we detail the GRAPE standardized pipelines to evaluate and compare models for graph prediction tasks.

Succinct data structures for adjacency matrices
Besides heavy exploitation of parallelism, the second pillar of our efficient implementation is the careful design of the data structures, so as to use as little memory as possible and quickly perform operations on them. The naive representation of a graph explicitly stores its adjacency matrix, with O(|V|²) time and memory complexity, |V| being the number of nodes, which leads to intractable memory costs on large graphs. However, since most large graphs are highly sparse, this problem can be mitigated by storing only the existing edges. Often, the adopted data structure is a Compressed Sparse Rows matrix (CSR [50]), which stores the source and destination indices of the existing edges in two sorted vectors. In Ensmallen we further compressed the graph adjacency matrix by adopting the Elias-Fano succinct data scheme [46] to efficiently store the edges (Supplementary Information Section 7.1). Since the Elias-Fano representation stores a sorted set of integers using memory close to the information-theoretical limit, we defined a bijective map between the graph edge set and a sorted integer set. To define such an encoding, we assigned a numerical id from a dense set to each node, and then we defined the encoding of an edge as the concatenation of the binary representations of the numerical ids of the source and destination nodes. This edge encoding has the appealing property of representing the neighbours of a node as a sequential and sorted set of numeric values, and can therefore be employed in the Elias-Fano data structure.
Elias-Fano has faster sequential access than random access (Supplementary Information Section 7.1.1) and is well suited for graph processing tasks such as retrieving neighbours during RW computation and executing negative sampling based on the scale-free distributions of outbound or inbound node degrees. GRAPE provides both CSR- and Elias-Fano-based data structures for graph representation, allowing a time/memory trade-off when processing large graphs.

Memory Complexity
Elias-Fano is a quasi-succinct data representation scheme that provides memory-efficient storage of a monotone list of n sorted integers bounded by u, using at most EF(n, u) = 2n + n⌈log₂(u/n)⌉ bits, which was proven to be less than half a bit per element away from optimality [46], while assuring random access to the data in average constant time. Thus, when Elias-Fano is paired with the previously presented edge encoding, the resulting memory complexity for representing a graph is EF(|E|, |V|²) = 2|E| + |E|⌈log₂(|V|²/|E|)⌉ bits, which is asymptotically better than the O(|E| log |V|²) complexity of the CSR scheme.
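As a back-of-the-envelope illustration of this bound (our sketch, not GRAPE code), the following compares the Elias-Fano upper bound with a plain CSR-style encoding for a hypothetical sparse graph:

```python
import math

def elias_fano_bits(n, u):
    """Upper bound EF(n, u) = 2n + n*ceil(log2(u/n)) bits for n sorted ints < u."""
    return 2 * n + n * math.ceil(math.log2(u / n))

# A hypothetical sparse graph: |V| = 1e6 nodes, |E| = 1e7 edges,
# each edge encoded as an integer below |V|^2.
V, E = 10**6, 10**7
ef_bits = elias_fano_bits(E, V**2)
csr_bits = E * 2 * math.ceil(math.log2(V))  # ~log2(|V|^2) bits per edge in CSR

print(ef_bits / 8 / 1e6, csr_bits / 8 / 1e6)  # megabytes for each representation
```

For this graph Elias-Fano needs roughly half the space of the CSR-style encoding, and the gap grows with sparsity.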

Edge Encoding
Ensmallen converts all the edges of a graph G(V, E) into a sorted list of integers. Considering an edge e = (v, x) ∈ E connecting nodes v and x, represented by the integers a and b respectively, the binary representations of a and b are concatenated through the function φ_k(a, b) = a·2^k + b to generate an integer index uniquely representing the edge e itself. This implementation is particularly fast because it requires only a few bit-wise instructions: φ_k(a, b) = (a << k) | b, with inverse a = φ_k(a, b) >> k and b = φ_k(a, b) & (2^k − 1), where << is the left bit-shift, | is the bit-wise OR and & is the bit-wise AND (see Supplementary Information Section 7.1.1 for an example and an implementation of the encoding). Since the encoding uses 2k bits, it performs best when it fits into a CPU word, which is usually 64 bits on modern computers, meaning that the graph must have fewer than 2^32 nodes and fewer than 2^64 edges. However, by using multi-word integers the encoding can easily be extended to even larger graphs.
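A minimal Python sketch of this encoding, assuming k = 32 (32-bit node ids); this is illustrative, not GRAPE's Rust implementation:

```python
def encode_edge(src, dst, k=32):
    """Concatenate the k-bit ids of source and destination into one integer."""
    return (src << k) | dst

def decode_edge(edge, k=32):
    """Recover (source, destination) from the encoded edge."""
    return edge >> k, edge & ((1 << k) - 1)

e = encode_edge(3, 7)
assert decode_edge(e) == (3, 7)
# The neighbours of a node form a contiguous, sorted run of encodings:
assert encode_edge(3, 0) <= e < encode_edge(4, 0)
```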

Operations on Elias-Fano.
The aforementioned encoding, when paired with the Elias-Fano representation, allows an even more efficient computation of random-walk samples. Indeed, the Elias-Fano representation supports rank and select operations in average constant time. These two operations were initially introduced by Jacobson to simulate operations on general trees, and were subsequently proven fundamental for supporting operations on data structures encoded through efficient schemes. In particular, given a set of integers S, Jacobson defined the rank and select operations as follows [19]: rank(S, m) returns the number of elements in S less than or equal to m; select(S, i) returns the i-th smallest value in S. As explained below, to speed up computation we deviate from this definition by defining the rank operation as the number of elements strictly lower than m. To compute the neighbours of a node using the rank and select operations, we observe that, for every node α with numerical id a, the encodings of all the edges with source α fall in the discrete range [φ_k(a, 0), φ_k(a + 1, 0)). Thanks to our definition of the rank operation and this property of the encoding, the degree d(a) of any node v with numerical id a, i.e. the number of its outgoing edges in the set Γ of encoded edges of a given graph, is d(a) = rank(Γ, φ_k(a + 1, 0)) − rank(Γ, φ_k(a, 0)). Moreover, we can retrieve the encodings Γ_a of all the edges starting from v by selecting every index i falling in the range [rank(Γ, φ_k(a, 0)), rank(Γ, φ_k(a + 1, 0))): Γ_a = {select(Γ, i)}. We can then decode the numerical ids of the destination nodes from Γ_a, finally obtaining the set N(a) of the numerical ids of the neighbouring nodes. In this way, by exploiting the above integer encoding of the graph and the Elias-Fano data scheme, we can efficiently compute the degree and the neighbours of a node using rank and select operations.
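The degree and neighbourhood queries above can be sketched in Python, with the standard `bisect` module standing in for the constant-time rank/select primitives of the real Elias-Fano structure (an illustrative sketch, not GRAPE code):

```python
from bisect import bisect_left

K = 32  # bits reserved for each node id in the encoding

def rank(sorted_edges, m):
    """Number of encoded edges strictly lower than m (the paper's convention)."""
    return bisect_left(sorted_edges, m)

def select(sorted_edges, i):
    """The i-th smallest encoded edge."""
    return sorted_edges[i]

def degree(sorted_edges, a):
    """Out-degree of node a: count of encodings in [phi(a, 0), phi(a + 1, 0))."""
    return rank(sorted_edges, (a + 1) << K) - rank(sorted_edges, a << K)

def neighbours(sorted_edges, a):
    """Decode the destination ids of all edges whose source is a."""
    lo = rank(sorted_edges, a << K)
    return [select(sorted_edges, i) & ((1 << K) - 1)
            for i in range(lo, lo + degree(sorted_edges, a))]

# Edges (0,1), (0,2), (1,0), (2,1) stored as sorted encodings:
edges = sorted((s << K) | d for s, d in [(0, 1), (0, 2), (1, 0), (2, 1)])
assert degree(edges, 0) == 2 and neighbours(edges, 0) == [1, 2]
```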

Efficient implementation of Elias-Fano.
The performance and complexity of Elias-Fano heavily rely on the details of its implementation. In this section we sketch our implementation, showing how we obtain average constant-time complexity for the rank and select operations. A more detailed explanation can be found in the Supplementary Information Section 7.1.
To this aim, the implementation initially splits each value y_i into a low-bits part l_i and a high-bits part h_i, where it can be proven that the optimal split assigns ⌊log₂(u/n)⌋ bits to the low part [19]. The low bits are consecutively stored in a low-bits array L = [l_1, ..., l_n], while the high bits are stored in a bit-vector H, obtained by concatenating the inverted unary encodings U of the differences (gaps) between consecutive high-bits parts: H = U(h_1) U(h_2 − h_1) · · · U(h_n − h_{n−1}). We recall that the inverted unary encoding represents a non-negative integer n with n zeros followed by a one; as an example, 5 is represented by 000001 (see Supplementary Figures 21 and 23 for a more detailed illustration of this scheme).
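A compact Python sketch of this split and of the corresponding decoding (illustrative only; GRAPE's Rust implementation differs in the details):

```python
import math

def elias_fano_encode(values, u):
    """Split each sorted value into l low bits and a high part; store the gaps
    between consecutive high parts in inverted unary code (g zeros, then a one)."""
    n = len(values)
    l = max(0, int(math.floor(math.log2(u / n))))  # optimal low-bit width
    low = [v & ((1 << l) - 1) for v in values]     # low-bits array L
    bits, prev = [], 0                             # high-bits bit-vector H
    for v in values:
        h = v >> l
        bits += [0] * (h - prev) + [1]
        prev = h
    return l, low, bits

def elias_fano_decode(l, low, bits):
    """Reconstruct the values: each zero increments the current high part,
    and each one emits the next value (high << l) | low."""
    values, h, i = [], 0, 0
    for b in bits:
        if b:
            values.append((h << l) | low[i])
            i += 1
        else:
            h += 1
    return values

vals = [2, 3, 5, 7, 11, 13, 24]
l, low, H = elias_fano_encode(vals, u=32)
assert elias_fano_decode(l, low, H) == vals
```

Note that H contains exactly one set bit per stored value, which is what makes the select operation a matter of locating the i-th one.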
The rank and select operations on the Elias-Fano representation require two fundamental primitives: finding the i-th one or zero in a bit-vector. To perform them in average constant time, having fixed a quantum q, we build an index for the zeros, O_0 = [o_1, ..., o_k], that stores the position of every q-th zero, and an analogous index O_1 for the ones. Thanks to these indexes, when the i-th one (or zero) must be found, the scan can start from the position o_j, with j = ⌊i/q⌋, which is already close to the target. Therefore, instead of scanning the whole high-bits array for each search, we only need to scan it from position o_j to position o_{j+1}.
It can be shown that such scans take constant time on average, at a low cost in memory, since the two indexes require O((n/q) log₂ n) bits (Supplementary Information Section 7.1). In our implementation we chose q = 1024, which provides good performance at the cost of a low memory overhead of 3.125% over the high bits; on average, each select operation needs to scan 16 words of memory.

Available Data-structure Trade-offs
GRAPE offers a choice between two data structures, Compressed Sparse Row (CSR) and Elias-Fano, at compile time.The CSR data structure is the default option due to its speed and efficiency in handling common graph operations, such as exploring a node's neighbourhood.This structure stores the graph as an array of row pointers, column indices, and non-zero values, providing efficient access to the non-zero elements in sparse adjacency matrices.
On the other hand, the Elias-Fano succinct data structure is primarily effective for representing large graphs because, as mentioned earlier, it requires the least amount of memory without additional assumptions. The Elias-Fano structure is recommended when the graph is so large that memory conservation becomes crucial.
While GRAPE provides the option to choose between two data structures, an expert user can add and use any other graph data structure optimized for their specific task.

Spectral and matrix factorization embedding methods
Spectral and matrix factorization methods start by computing weighted adjacency matrices and may include one or more factorization steps. Then, given a target embedding dimensionality k, these models generally use as embeddings the k eigenvectors or singular vectors corresponding to the spectral or singular values of interest.
A description of the spectral and matrix factorization methods implemented in GRAPE is reported in Supplementary Information Section 8.1.
GRAPE provides efficient parallel methods to compute the initial weighted adjacency matrices of the various implemented methods, which are represented either as dense or sparse matrices depending on how many non-zero values the metrics are expected to generate. The singular vectors and eigenvectors are currently computed using the state-of-the-art LAPACK library [51], though more scalable methods that compute the vectors from an implicit representation of the weighted matrices are currently under investigation.
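As a toy illustration of this family of methods (our sketch, not GRAPE code), a dense Laplacian-eigenmap-style embedding can be written as:

```python
import numpy as np

def laplacian_eigenmap(adj, k):
    """Spectral embedding sketch: the k eigenvectors of the symmetric
    normalized Laplacian with the smallest non-trivial eigenvalues.
    Dense NumPy for clarity; real implementations use LAPACK-backed
    or sparse eigensolvers."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]              # skip the trivial eigenvector

# A 6-node ring graph as a dense adjacency matrix:
A = np.zeros((6, 6))
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0
emb = laplacian_eigenmap(A, k=2)
assert emb.shape == (6, 2)
```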

First and second-order RW-based embedding methods
First- and second-order random-walk embedding models are shallow neural networks, generally composed of two layers and trained on batches of random-walk samples. Given a window size, these models learn properties of the sliding windows over the RWs, such as the co-occurrence of two nodes in each window (GloVe [28]), the window's central node given the other nodes in the window (CBOW [27]), or, vice versa, the other nodes in the window given its central node (SkipGram [27]). The optimal window size may vary considerably depending on the graph diameter and overall topology. Once the shallow model has been optimized, the weights of either the first or the second layer can be used as node embeddings.
An overview of the RW-based methods implemented in GRAPE is reported in Supplementary Information Section 8.2.
Efficient implementation of SkipGram and CBOW models. GRAPE provides both its own implementations and Keras-based implementations for all shallow neural network models (e.g. CBOW, SkipGram, TransE). Nevertheless, since shallow models allow for particularly efficient data-race-aware and synchronization-free implementations [32], the from-scratch GRAPE implementations significantly outperform the Keras-based ones, as the TensorFlow APIs are too coarse and high-level for such fine-grained optimizations. While GPU training is available for the TensorFlow models, their overhead with shallow models tends to be so significant that the from-scratch CPU implementations outperform those based on GPU. Moreover, the embeddings of large graphs (such as Wikipedia) do not fit in the memory of most GPU hardware. Still, the Keras-based models allow users to experiment with the open software available in the literature for Keras, including, e.g., advanced optimizers and learning-rate scheduling methods. The SkipGram and CBOW models are trained using scale-free negative sampling, which is efficiently implemented using the rank method of the Elias-Fano data structure.
To obtain reliable embeddings, the training phase of the shallow model needs an exhaustive set of random-walk samples for each source node, so as to fully represent the source-node context. When dealing with big graphs, the computation of a proper amount of random-walk samples requires efficient routines to represent the graph in memory, retrieve and access the neighbours of each node, randomly sample an integer and, in the case of (Node2Vec) second-order RWs [29], compute the transition probabilities, which must be recomputed at each step of the walk.
The first-order RW is implemented using a SIMD routine for sampling integers (Supplementary Information Section 7.2.1). When the graph is weighted, another SIMD routine is used to compute the cumulative sum of the unnormalized probability distribution (Supplementary Information Section 7.2.2). The implementation of the second-order RW requires the more sophisticated routines described in Sections 4.3.1 and 4.3.2. Moreover, in Section 4.3.3 we present an approximated weighted and second-order RW that makes it possible to deal with high-degree nodes.

Implementation of second-order RWs
Node2Vec is a second-order random-walk sampling method [29] whose distinctive feature is that the probability of stepping from a node v to one of its neighbours depends on the preceding step of the walk (Supplementary Figure 27). More precisely, Node2Vec defines the un-normalized transition probability π_vx of moving from v to any direct neighbour x, having arrived at v from node t, as a function of the weight w_vx of the edge (v, x) and a search bias α_pq(t, x): π_vx = α_pq(t, x) · w_vx (eq. 1). The search bias α_pq(t, x) is defined as a function of the distance d(t, x) between t and x and of two parameters p and q, called, respectively, the return and in-out parameters: α_pq(t, x) equals 1/p if d(t, x) = 0, 1 if d(t, x) = 1, and 1/q if d(t, x) = 2. If the return parameter p is small, the walk is encouraged to return to the preceding node; if p is large, the walk is instead encouraged to visit new nodes. The in-out parameter q allows varying smoothly between Breadth First Search (BFS) and Depth First Search (DFS) behaviour: when q is small the walk prefers outward nodes, thus mimicking DFS; otherwise it prefers inward nodes, emulating BFS. Since α must be recomputed at each step of the walk, the algorithm that computes it must be carefully designed to guarantee scalability.
In GRAPE we sped up this computation by decomposing the search bias α_pq(t, x) into an in-out bias β_q(t, x), related to the q parameter, and a return bias γ_p(t, x), related to p: π_vx = w_vx · β_q(t, x) · γ_p(t, x) (eq. 2), where the two new biases are defined as β_q(t, x) = 1 if x ∈ N(t) or x = t, and 1/q otherwise; γ_p(t, x) = 1/p if x = t, and 1 otherwise. It is easy to see that eq. 2 is equivalent to eq. 1.
Efficient computation of the in-out and return biases. The in-out bias can be re-formulated to allow an efficient implementation: starting from an edge (t, v), we need to compute β_q(t, x) for each x ∈ N(v), where N(v) is the set of nodes adjacent to v including node v itself.
This formulation (Supplementary Figure 26) allows us to compute in batch the set X_β of nodes affected by the in-out parameter q: X_β = N(v) \ (N(t) ∪ {t}), where N(v) are the direct neighbours of node v. In this way, the selection of the nodes X_β affected by β_q simply requires computing the difference of the two sets N(v) \ N(t). We efficiently compute X_β by using a SIMD algorithm implemented in assembly, leveraging AVX2 instructions that work on node-set representations as sorted vectors of node indices (see Supplementary Information Sections 7.2.1 and 7.2.2 for more details). The return bias γ_p only affects the previous node t, so it can be efficiently computed using a binary search for the node t in the sorted vector of neighbours. Summarizing, we re-formulated the transition probability of a second-order RW as π_vx = w_vx · β_q(t, x) · γ_p(t, x). If p and q are equal to one, the biases simplify away, so that we can avoid computing them. In general, depending on the values of p and q and on the type of the graph (weighted or unweighted), GRAPE provides eight specialized implementations of the Node2Vec algorithm to significantly speed up the computation (Supplementary Tables 50 and 51). GRAPE automatically selects and runs the specialized algorithm that corresponds to the choice of the parameters p and q and to the graph type. This strategy allows a significant speed-up: for instance, in the base case (p = q = 1 and an unweighted graph) the specialized algorithm runs more than 100 times faster than the most complex one (p ≠ 1, q ≠ 1, weighted graph). Moreover, as expected, we observe that the major bottleneck is the computation of the in-out bias (Supplementary Table 51).
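The decomposed transition probability can be sketched as follows (a plain-Python illustration of eq. 2, not the SIMD implementation):

```python
def transition_weights(neighbours, weights, t, v, p, q):
    """Un-normalized pi_vx = w_vx * beta_q(t, x) * gamma_p(t, x) for x in N(v).
    neighbours/weights are per-node adjacency lists (illustrative structures)."""
    n_t = neighbours[t]
    pis = []
    for x, w in zip(neighbours[v], weights[v]):
        beta = 1.0 if (x in n_t or x == t) else 1.0 / q   # in-out bias
        gamma = 1.0 / p if x == t else 1.0                # return bias
        pis.append(w * beta * gamma)
    return pis

# Toy graph: the walk arrived at v = 0 coming from t = 1.
nbrs = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
wts = {v: [1.0] * len(ns) for v, ns in nbrs.items()}
# x = 1 is the previous node (1/p), x = 2 is shared with N(t) (1), x = 3 is new (1/q).
assert transition_weights(nbrs, wts, t=1, v=0, p=2.0, q=4.0) == [0.5, 1.0, 0.25]
```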

Efficient sampling for Node2Vec RWs
Sampling from a discrete probability distribution is a fundamental step in computing a RW and can be a significant bottleneck. Many graph libraries implementing the Node2Vec algorithm speed up sampling by using the Alias method (see Supplementary Information Section 7.2.3), which allows sampling in constant time from a discrete probability distribution with support of cardinality n, after a pre-processing phase that scales linearly with n.
The use of the Alias method for Node2Vec incurs the "memory explosion problem", since the preprocessing phase for a second-order RW on a graph with |E| edges has a support whose cardinality is O(Σ_{e_ij ∈ E} deg(j)), where deg(j) is the degree of the destination node of the edge e_ij ∈ E.
Therefore, the time and memory complexities of the preprocessing make the Alias method impractical even on relatively small graphs. For instance, on the unfiltered Human STRING PPI graph (19,354 nodes and 5,879,727 edges) it would require 777 GB of RAM.
To avoid this problem, we compute the distributions on the fly. For a given source node v, our sampling algorithm applies the following steps:
1. computation of the un-normalized transition probabilities towards each neighbour of v, according to the provided in-out and return biases;
2. computation of the un-normalized cumulative distribution, which is equivalent to a cumulative sum;
3. uniform sampling of a random value between 0 and the maximum value of the un-normalized cumulative distribution;
4. identification of the corresponding index through either a linear scan or a binary search, according to the degree of the node v.
To compute the cumulative sum efficiently, we implemented a SIMD routine that processes batches of 24 values at a time. Moreover, when the length of the vector is smaller than 128, we apply a linear scan instead of a binary search, because it is faster thanks to lower branching and better cache locality. Further details are available in the Supplementary Information Section 7.2.2.
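The four steps above can be sketched in plain Python (an illustration only; the actual implementation uses SIMD routines and switches between linear scan and binary search):

```python
import bisect
import random

def sample_neighbour_index(unnormalized, rng=random):
    """Sample an index proportionally to un-normalized transition weights."""
    cumsum, total = [], 0.0
    for w in unnormalized:          # step 2: un-normalized cumulative sum
        total += w
        cumsum.append(total)
    r = rng.uniform(0.0, total)     # step 3: uniform value in [0, total]
    # step 4: locate the index; the clamp guards against r landing on total
    return min(bisect.bisect_right(cumsum, r), len(cumsum) - 1)
```

For example, with weights `[1.0, 0.0, 3.0]` the index 1 is never drawn, while index 2 is drawn roughly three times as often as index 0.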

Approximated RWs
Since the computational time complexity of the sampling algorithm for weighted or second-order RWs scales linearly with the degree of the considered source node, computing an exact RW on graphs containing high-degree nodes (that is, nodes with an outbound degree larger than 10,000) would be impractical, especially considering that such nodes have a higher probability of being visited.
To cope with this problem, we designed an approximated RW algorithm, where each step of the walk considers only a sub-sampled set of k neighbors, where the parameter k is set to a value significantly lower than the maximum node degree.
An efficient neighborhood sub-sampling for nodes with degree greater than k requires uniformly sampling unique neighbors while maintaining their original order. To uniformly sample distinct neighbors in a discrete range [0, n), we developed an algorithm (Sorted Unique Sub-Sampling, SUSS) that divides the range into k uniformly spaced buckets and then randomly samples a value in each bucket. The implementation of the algorithm is reported in Supplementary Algorithm 1 (Supplementary Information Section 7.2.4). After splitting the range [0, ..., n − 1] into k equal segments (buckets) of length ⌊n/k⌋, SUSS samples an integer from each bucket using the Xorshift random number generator. To establish whether the distribution of the integers sampled with SUSS truly approximates a uniform distribution, we sampled n = 10,000,000 integers over [0, ..., 10,000], both with SUSS and by drawing from a uniform distribution over the same range. We then used the one-sided Wilcoxon signed-rank test to compare the frequencies of the obtained indices and obtained a p-value of 0.9428, meaning that there is no statistically significant difference between the two distributions. Therefore, with Θ(k) time complexity and Θ(k) spatial complexity, SUSS produces reliable approximations of a uniform distribution.
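A minimal Python sketch of SUSS (using Python's default generator in place of Xorshift):

```python
import random

def suss(n, k, rng=random):
    """Sorted Unique Sub-Sampling: k distinct indices from [0, n), in order.
    Splits [0, n) into k disjoint buckets and samples one index per bucket,
    which guarantees uniqueness and sortedness by construction."""
    samples = []
    for i in range(k):
        lo = i * n // k          # bucket start
        hi = (i + 1) * n // k    # bucket end (exclusive)
        samples.append(lo + rng.randrange(hi - lo))
    return samples
```

Because the buckets are disjoint and visited in order, no de-duplication or sorting pass is needed after sampling.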
The disadvantage of this sub-sampling approach is that two consecutive neighbors can never be selected in the same sub-sampled neighborhood. Nevertheless, considering that the sub-sampling is repeated at each step of the walk, consecutive neighbors have the same probability of being selected in different sub-samplings.

Triple-sampling and corrupted triple sampling methods
Triple sampling methods are shallow neural networks trained on triples (v, ℓ, s), where {v, s} is a node pair composed of a source node v and a destination node s, and ℓ is a property of the edge (v, s) connecting them. Similar to triple sampling methods, corrupted-triple sampling methods are trained on the (true) triples (v, ℓ, s), but also on corrupted triples (v′, ℓ, s′), obtained by substituting the source and/or destination nodes {v, s} with randomly sampled nodes {v′, s′}, while keeping the attribute ℓ unchanged. More details about triple-sampling and corrupted-triple sampling methods are available in Supplementary Information Sections 8.3 and 8.4.
GRAPE provides a full implementation of the first- and second-order LINE triple sampling methods [31], as well as a parallel Rust implementation of the TransE corrupted-triple sampling method [32]. Moreover, a large set of corrupted-triple sampling models is integrated from the PyKEEN library, including TransH, DistMult, HolE, AutoSF, TransF, TorusE, DistMA, ProjE, ConvE, RESCAL, QuatE, TransD, ERMLP, CrossE, TuckER, TransR, PairRE, RotatE, ComplEx, and BoxE [10]. We refer to the original papers for extensive explanations. The parameters used for the evaluation of node embedding models in the GRAPE pipelines are available in the Supplementary Information Section 4.1.

Edge embedding methods and graph visualization
GRAPE offers an extensive set of methods to compute edge embeddings from node embeddings (e.g., concatenation, average, cosine distance, L1, L2 and Hadamard operators [29]); the choice of the specific edge-embedding operator is left to the user, who can set it through a parameter. To meet the various model requirements, the library provides three implementations of the edge embedding.
In the first one, all edge embedding methods are implemented as Keras/TensorFlow layers and may be employed in any Keras model. In the second one, all methods are also provided in a NumPy implementation. Finally, a third one uses Rust for models where performance is particularly relevant; for instance, the cosine similarity computation in the Rust implementation is over 250× faster than the analogous NumPy implementation. Whenever possible, the computation of edge embeddings is executed lazily on a given subset of the edges at a time, since the amount of RAM required to explicitly materialize the edge embedding can be prohibitive on most systems, depending on the edge-set cardinality of the considered graph. More specifically, while the lazy generation of edge embeddings is possible during training for only a subset of the supported edge and edge-label prediction models, it is supported for all models during inference.
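The common edge-embedding operators can be sketched in NumPy as follows (an illustration; the function name and signature are ours, not GRAPE's API):

```python
import numpy as np

def edge_embedding(src_rows, dst_rows, method="hadamard"):
    """Combine node-embedding rows into edge embeddings (NumPy sketch)."""
    if method == "concatenation":
        return np.concatenate([src_rows, dst_rows], axis=1)
    if method == "average":
        return (src_rows + dst_rows) / 2.0
    if method == "hadamard":
        return src_rows * dst_rows          # element-wise product
    if method == "l1":
        return np.abs(src_rows - dst_rows)
    if method == "l2":
        return (src_rows - dst_rows) ** 2
    raise ValueError(f"unknown method: {method}")

rng = np.random.default_rng(0)
node_emb = rng.random((10, 16))                 # 10 nodes, 16 dimensions
edges = np.array([[0, 1], [2, 3], [4, 5]])
src_rows, dst_rows = node_emb[edges[:, 0]], node_emb[edges[:, 1]]
assert edge_embedding(src_rows, dst_rows).shape == (3, 16)
```

Computing the embedding for batches of edge indices at a time, as above, is also what makes the lazy evaluation mentioned earlier straightforward.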
The library also comes equipped with tools to visualize the computed node and edge embeddings and their properties, including edge weights, node degrees, connected components, node types and edge types. For example, in Figure 1c we display the node types (left) and edge types (center) of the KG-COVID19 graph, as well as the existence of sampled edges (right), using the first two components of the t-SNE decomposition of the node/edge embeddings [52].
Node-label, edge-label, and edge prediction models
GRAPE provides implementations to perform node-label prediction, edge-label prediction and edge prediction tasks.
All the models devoted to any of the three prediction tasks share the following implementation similarities. Firstly, they all implement the abstract classifier interface and therefore provide straightforward methods for training (fit) and inference (predict and predict_proba).
real-world benchmark datasets [54], but also pipelines that could allow non-expert users to easily test and compare graphs and inference algorithms on the desired graphs.

FAIR graph retrieval
GRAPE facilitates FAIR access to an extensive set of graphs and related datasets, including both commonly used benchmark datasets and graphs actively used in biomedical research. Any of the available graphs can be retrieved and loaded with a single line of Python code (Fig. 1b), and their list is constantly expanding thanks to the generous contributions of GRAPE users. The list of currently supported resources can be found in Supplementary Information Section 3.1.
Findability and Accessibility. Datasets may change location, versions may appear in more than one location, and file formats may change. Using an ensemble of custom web scrapers, we collect, curate and normalize the most up-to-date datasets from an extensive list of resources (currently over 80,000 graphs). The collected metadata is shipped with each GRAPE release, ensuring that end-users can always find and immediately access any available version of the provided datasets.
Interoperability.The graph retrieval phase contains steps that robustly convert data from (even malformed) datasets into general-use TSV documents that, while primarily used as graph data, can be used for any desired application case.
Reusability.Once loaded, the graphs can be arbitrarily processed and combined, used with any of the many embedding and classifier models from either the GRAPE library or any third-party model integrated in GRAPE by implementing the interface described in section 4.7.2.

FAIR evaluation pipelines
GRAPE provides pipelines for evaluating node-label, edge-label and edge prediction experiments trained on user-defined embedding features and by using task-specific evaluation schemas.
In particular, the evaluation schemas for edge prediction models are K-fold cross-validations, Monte Carlo holdouts, and Connected Monte Carlo holdouts (Monte Carlo designed to avoid the introduction of new connected components in the training graph). All of the edge prediction evaluation schemas may sample the edges uniformly or in a stratified way with respect to a provided list of edge types. The sampling of negative (non-existing) edges may follow either a uniform or a scale-free distribution. Furthermore, the edge-prediction evaluation may be performed using varying imbalance ratios (between existing and non-existing edges) to better gauge the true-negative rate (specificity) and the false-positive rate (fall-out). Stratified K-fold and stratified Monte Carlo holdouts are also provided for node-label and edge-label prediction models.
For all tasks, an exhaustive set of evaluation metrics are computed, including AUROC, AUPRC, Balanced Accuracy, Miss-rate, Diagnostic odds ratio, Markedness, Matthews correlation coefficient and many others.
All the implemented pipelines have integrated support for differential caching, storing the results of every step of the specific experiment, and for "smoke tests", i.e. for running a lightweight version of the experimental setup with minimal requirements to ensure execution until completion before running the full experiment.
The pipelines can use any model implementing a standard interface we developed. The interface requires the model to implement methods for training (fit or fit_transform) and inference (predict and predict_proba), plus additional metadata methods (e.g., whether node types, edge types and other information are used), which help identify experimental flaws and biases. As an example, in an edge-label prediction task using node embeddings, GRAPE will use the provided metadata to check whether the selected node embedding method also uses edge labels. If so, the node embedding will be recomputed for each holdout. Conversely, if the edge labels are not used by the node embedding method, the embedding may be computed only once; in this latter case, the choice to recompute the node embedding for each holdout, which may help gauge how much different random seeds change the performance, is left to the user.
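A hypothetical sketch of what such an interface could look like (the names and signatures here are illustrative assumptions, not GRAPE's actual abstract classes):

```python
from abc import ABC, abstractmethod
import numpy as np

class EdgeLabelClassifier(ABC):
    """Illustrative sketch of the kind of interface the pipelines expect."""

    @abstractmethod
    def fit(self, X: np.ndarray, y: np.ndarray) -> None: ...

    @abstractmethod
    def predict(self, X: np.ndarray) -> np.ndarray: ...

    @abstractmethod
    def predict_proba(self, X: np.ndarray) -> np.ndarray: ...

    # Metadata of the kind used to detect experimental flaws and biases.
    def requires_edge_types(self) -> bool:
        return False

class MajorityClassifier(EdgeLabelClassifier):
    """Trivial baseline implementing the hypothetical interface."""

    def fit(self, X, y):
        vals, counts = np.unique(y, return_counts=True)
        self.majority_ = vals[np.argmax(counts)]

    def predict(self, X):
        return np.full(len(X), self.majority_)

    def predict_proba(self, X):
        return np.full((len(X), 1), 1.0)
```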
To configure one of the comparative pipelines, users import the desired pipeline from the GRAPE library and specify the following modular elements:
Graphs The graphs to evaluate, which can be either graph objects or strings matching the names from the graph retrieval.

Graph normalization callback
For some graphs it is necessary to execute normalization and filtering steps; for example, the STRING graphs can be filtered at a minimum edge weight of 700.
For this reason, users can provide this optional normalization callback.
Classifier models The classifier models to evaluate, which can either be a model implemented in GRAPE or custom models implementing the proper interface.
Node, node type, and edge features The features to be used to train the provided classifier models.These features can be node embedding models, either implemented in GRAPE or custom embedding models implementing the node embedding interface.
Evaluation schema The evaluation schema to follow for the evaluation.
Given any input graph, each pipeline starts by retrieving it (if the name of the graph was provided) and validating the provided features (checking for NaNs, constant columns, compatibility with the provided graphs); next, and if requested by the user, it computes all the node-embeddings to be used as additional features for the prediction task.Once this preliminary phase is completed, the pipeline starts to iterate and generate holdouts following the provided evaluation schema.
For each holdout, GRAPE then computes the node embeddings required to perform the prediction task (such as topological node embeddings for a node-label prediction task, or topological node embeddings combined through a user-defined edge-embedding operator (see Section 4.5) to obtain the edge embeddings in an edge-prediction task), so that a new instance of the provided classifier models can be fitted and evaluated, using both the required embeddings and, when available, the additional label-independent features computed in the preliminary phase. The classifier evaluation is finally performed by computing an exhaustive set of metrics, including AUROC, AUPRC, Balanced Accuracy, Miss-rate, Diagnostic odds ratio, Markedness, Matthews correlation coefficient and many others.
Interfaces are made available for embedding models, node-label prediction, edge-label prediction, and edge prediction.All models available in GRAPE implement these interfaces, and they can be used as starting points for custom integrations.

Figure 1 :
Figure 1: Schematic diagram of GRAPE (Ensmallen + Embiggen) functionalities. a. High-level structure of the GRAPE software resource. b. Pipelines for an easy, fair, and reproducible comparison of graph embedding techniques, graph-processing methods, and libraries. c. Visualization of the KG-COVID19 graph [17], obtained by displaying the first two components of the t-SNE decomposition of the embeddings computed using a Node2Vec SkipGram model that ignores node and edge types during the computation. The clusters' colors indicate: (left) the Biolink category of each node; (center) the Biolink category of each edge; (right) the predicted edge existence.

Figure 2 :
Figure 2: Experimental comparison of GRAPE with state-of-the-art graph-processing libraries across 44 graphs. Panels a and b, graph loading: a. Empirical execution time. b. Peak memory usage. The horizontal axis shows the number of edges, and the vertical axis shows peak memory usage. Panels c and d, first-order RW: c. Empirical execution time. d. Peak memory usage. Panels e and f, second-order RW: e. Empirical execution time. f. Peak memory usage. The horizontal axis shows the number of nodes, and the vertical axis, respectively, execution time (c, e) and memory usage (d, f). All axes are on a logarithmic scale. The × marks cases where a library crashes, exceeds 200 GB of memory or takes more than 4 hours to execute the task. Each line corresponds to a graph resource/library, and the points on the lines refer to the 44 graphs used in the experimental comparison.

Figure 3 :
Figure 3: Approximated RW. a. The RW starts at node src; its 15 neighbouring nodes are highlighted in cyan. b. We sample d_T = 5 destination nodes (d_T being the degree threshold) from the available 15 destinations, using our Sorted Unique Sub-Sampling algorithm (SUSS, Section 4.3.3), and perform a random step (edge highlighted with an arrow). c. A further step is performed from the successor node (which now becomes the new source node src), and the same process is repeated until the end of the walk. d. Edge prediction performance comparison (Accuracy, AUPRC, F1 Score, and AUROC computed over n = 10 holdouts; data are presented as mean values +/− SD) using SkipGram-based embeddings and RW samples obtained with exact and approximated RWs for both the training and the test set on the STRING-PPI dataset. Bar plots are zoomed in to the 0.9-1.0 range, with error bars representing the standard deviation computed over 30 holdouts. e. Empirical time comparison (in msec) of the approximated and exact second-order RW algorithms on the graph sk-2005 [39]: 100-step RWs are run on 100 randomly selected nodes. Error bars represent the standard deviation across n = 10 repetitions. Time is on a logarithmic scale. Data are presented as mean values +/− SD.

Figure 4 :
Figure 4: Comparison of embedding methods through the GRAPE pipelines: edge and node-label prediction results. Results represent the mean balanced accuracy computed across n = 10 holdouts +/− SD (results using other evaluation metrics are available in Supplementary Information Section 5). We sorted the embedding models by performance for each task; methods directly implemented in GRAPE are in purple, while integrated methods are in cyan. (a, b): Edge prediction results obtained through a Perceptron (a) and a Decision Tree (b). Bar plots from left to right show the balanced accuracy obtained with the Human Phenotype Ontology (left), STRING Homo sapiens (center) and STRING Mus musculus (right). (c, d): Node-label prediction results obtained through a Random Forest (c) and a Decision Tree (d). Bar plots from left to right show the balanced accuracy achieved with the CiteSeer (left), Cora (center) and PubMed Diabetes (right) datasets.

Figure 5 :
Figure 5: Performance comparison between GRAPE and state-of-the-art implementations of Node2Vec on real-world big graphs. The GRAPE implementations achieve substantially better empirical time complexity: a., b. and c. show the worst performance (maximum time and memory, denoised using a Savitzky-Golay filter) over 10 holdouts on CTD, PheKnowLator and Wikipedia, respectively. In a. and b. the rectangles in the left figure are magnified in the right figure to highlight GRAPE's performance. In the Wikipedia plot (c.) only GRAPE results are available, as the other libraries either go out of time or out of memory. d. Average memory and computational time across n = 10 holdouts; data are presented as mean values +/− SD. e. AUPRC and f. AUROC results of Decision Trees trained with different graph embedding libraries (data are presented as mean values +/− SD computed over n = 10 holdouts): the GRAPE embeddings achieve better edge prediction performance than those obtained with the other libraries. g. One-sided Wilcoxon signed-rank test results (p-values) between GRAPE and the other state-of-the-art libraries, where the win of a row against a column is in green, a tie in yellow, and a loss in red.
Many usage examples are available in the library tutorials: https://github.com/AnacletoLAB/grape/tree/main/tutorials. Further details are reported in Supplementary Information Section 6. The procedures for the construction of train and test graphs for edge prediction are detailed in Supplementary Information Section 10.2. Source Data for Figures 2, 3, 4 and 5 are available through this manuscript.