TumorMet: A repository of tumor metabolic networks derived from context-specific Genome-Scale Metabolic Models

Studies about the metabolic alterations during tumorigenesis have increased our knowledge of the underlying mechanisms and consequences, which are important for diagnostic and therapeutic investigations. In this scenario and in the era of systems biology, metabolic networks have become a powerful tool to unravel the complexity of the cancer metabolic machinery and the heterogeneity of this disease. Here, we present TumorMet, a repository of tumor metabolic networks extracted from context-specific Genome-Scale Metabolic Models, as a benchmark for graph machine learning algorithms and network analyses. This repository has an extended scope for use in graph classification, clustering, community detection, and graph embedding studies. Along with the data, we developed and provided Met2Graph, an R package for creating three different types of metabolic graphs, depending on the desired nodes and edges: Metabolites-, Enzymes-, and Reactions-based graphs. This package allows the easy generation of datasets for downstream analysis.

www.nature.com/scientificdata

while dealing with their structural and relational complexity 4. In the context of the findability, accessibility, interoperability, and reusability (FAIR) principles 8, providing benchmark datasets for comparing novel approaches and for advancing a specific research domain is extremely important. Graph-structured data coupled with machine learning approaches are receiving growing interest [9][10][11][12][13], and many benchmark datasets have been proposed in the context of biomedical graphs, especially those derived from protein-protein interaction, chemical, and imaging data [14][15][16][17][18]. To the best of our knowledge, metabolic networks based on context- and patient-specific metabolic models have not been provided so far. To fill this gap, we provide the TumorMet repository. TumorMet contains two main sets of networks, depending on the models from which they derive: Tissue-derived networks, generated from tissue-specific models, and PDGSMMs-derived networks, obtained using Patient-Derived Genome-Scale Metabolic Models (PDGSMMs). The implications of using metabolic networks are twofold, computational and biological. Their complexity in terms of nodes and connections, and the plasticity given by the multiple ways in which they can be generated, make them appealing for proposing and validating novel approaches in computational graph-based research. In this work, we present three alternatives, each focused on a specific set of metabolic players (i.e., metabolites, enzymes, and reactions). As demonstrated by 19, reconstruction algorithms used to generate context-specific models present a bug that causes an underestimation of the molecular context. Converting the model into a network allows further contextualization by integrating context-specific data.
Being aware that the networks we generated for TumorMet cover only a portion of the possibilities, we provide the Met2Graph package to give users the freedom to build networks according to their specific needs. Met2Graph implements a flexible process flow to build the metabolic graphs, can be easily integrated with user-customized functions, and provides several arguments to personalize the networks. Some of the networks in this dataset were used for assessing graph classification, clustering, and embedding [20][21][22][23], as well as for multimodal data analysis 24,25, demonstrating their benefits. An exciting field of biological network usage is the application of node classification approaches aimed at predicting essential genes, namely those genes crucial for an organism's viability. Usually, Protein-Protein Interaction (PPI) networks are exploited to this end, based on the assumption that topological centrality is correlated with functional centrality. As hypothesized in 26, one of the reasons why PPI networks are the most used for this purpose could be their abundance compared to other types, such as metabolic networks, which highlights the importance of providing network datasets. Still, physical interactions alone, which are moreover not contextualized, are insufficient to represent the complexity of genetic connections 27. Modern biology extensively uses networks to integrate and analyze data in a way in which organisms, tissues, or cells are considered systems. This perspective gives a crucial role to the connections among biological components, and network-based analyses are exploited for making relevant biological inferences. The central role of metabolism in different aspects of pathophysiological mechanisms and its fine-tuned regulation make these networks particularly interesting for extracting knowledge and making predictions.
For example, the analysis of hub nodes 28 and the comparison of topological properties between different context-specific networks 29 are valuable resources in the investigation of diagnostic and prognostic markers for precision medicine. Along with the data, we also provide an R package, Met2Graph, to create metabolic graphs starting from Genome-Scale Metabolic models (GSMs) and gene expression data. The package can generate three types of graphs, depending on the desired nodes and edges: Metabolites-based graphs, where metabolites are nodes connected by reactant-product relationships and the edges can be weighted by the expression values of the enzymes catalyzing the corresponding reactions; Enzymes-based graphs, where enzymes are nodes connected if they catalyze two reactions, one producing and the other consuming a given metabolite; and Reactions-based graphs, with reactions as nodes connected if the metabolite produced by one is consumed by the other. TumorMet is deposited in the figshare repository 30, and the Met2Graph package used to generate it is available at the Met2Graph GitHub repository (https://github.com/cds-group/Met2Graph).

Methods
Metabolism involves several players, and focusing on one or another influences the type of analysis and the knowledge that can be extracted. Metabolites and enzymes are the main molecular components. A biochemical reaction is a transformation process that consumes some metabolites (reactants) to produce new ones (products). Enzymes facilitate these transformations: they are proteins with catalytic activity that speed up a reaction by binding the substrate according to a lock-and-key or induced-fit model. Not all reactions are catalyzed by enzymes, as some can occur spontaneously. Enzymes are selective: each one specifically binds one or a few substrates and, consequently, can catalyze one or more reactions, while the same reaction can be catalyzed by several enzymes acting as a complex or as mutually exclusive catalysts. This information is crucial in defining the rules to design a metabolic network, since the connections between the metabolic players can be multiple and of different nature when enzymes are involved. To manage this issue, we defined simplification strategies for the case in which enzymes represent edges and give rise to multiple connections (as in Metabolites-based networks), and a distinct treatment of complex and mutually exclusive relationships for the case in which enzymes represent the nodes (as in Enzymes-based networks). Further details are provided below in the network construction sections. The repository we provide contains different types of metabolic networks, depending on the nodes and the rules behind the connections: Metabolites-, Enzymes-, and Reactions-based networks. A graphical overview of the metabolic networks construction is provided in Fig. 1.
Gene expression data. Gene expression data from 6 different tumor primary sites were used to create context-specific Metabolites-based metabolic networks. FPKM (fragments per kilobase per million reads mapped) normalized and log-transformed read counts from RNA sequencing experiments of breast (TCGA-BRCA), lung (TCGA-LUAD and TCGA-LUSC), kidney (TCGA-KIRC and TCGA-KIRP), brain (TCGA-GBM and TCGA-LGG), ovary (TCGA-OV), and prostate (TCGA-PRAD) cancers were obtained from the Genomic Data Commons (GDC) data portal (https://portal.gdc.cancer.gov). GDC includes several cancer projects; among them, we selected The Cancer Genome Atlas (TCGA) as the source of the data. Each primary site corresponds to a dataset of the repository. Clinical annotations of the samples were also extracted from the database and included in each dataset as sample sheets.
Fig. 1 Overview of the metabolic networks construction. The context-specific GSMs used in this study derive from the human generic GSM through the integration of tissue-specific multi-omics data (tissue-specific GSMs from the Human Metabolic Atlas) or of TCGA transcriptomics data (PDGSMMs from BioModels). The context-specific GSMs, which carry information about biochemical reactions, are the input for creating the context-specific metabolic networks of the TumorMet repository. Metabolites-based_tissue networks are generated by integrating TCGA gene/enzyme expression data into the tissue-specific GSMs to weight the edges, which are represented by the enzymes connecting two metabolites. Networks of different patients have the same structure with different edge weights, depending on the patient expression profile. Enzymes-, Reactions-, and Metabolites-based_PDGSMMs networks are created from PDGSMMs and have enzymes/reactions as nodes connected by metabolites, or metabolites as nodes connected by enzymes. Networks of different patients have different structures and no weights.

Metabolites-based_tissue networks construction. The metabolites are the nodes of the network, labeled by the corresponding ID, and are connected if they are involved in the same reaction, one as a reactant and one as a product. The connections have been created using the information from the corresponding context-specific metabolic model. Recurrent metabolites (e.g., ATP, CO2, H2O) have been removed to avoid redundant connections and unrealistic paths 34. Small molecules such as H2O, NH3, O2, CO2, phosphate, and cofactors are generally considered recurrent metabolites. The list of recurrent metabolites we used is provided as external data of the Met2Graph package; the argument rmMets can be set to FALSE to avoid removal, or the list can be customized by the user. The gene-protein-reaction (GPR) associations have been derived from the generic human GSM. Each edge is labeled by the Ensembl stable ID (in the form ENS[species prefix][feature type prefix][a unique eleven-digit number]) of the enzyme(s) catalyzing the reaction, when present, and weighted by the expression value(s) of the corresponding gene(s) obtained from the GDC portal. Each resulting graph corresponds to a specific sample of the GDC tumor dataset considered. These rules create graphs where a pair of nodes can have multiple edges, since multiple enzymes can be involved in the same reaction and/or the same node pair can be present in different reactions. Multiple edges have been simplified by averaging the expression values of the enzymes acting in the same reaction and then summing these averages over the different reactions sharing the same node pair. Thus, all the graphs resulting from the same metabolic model have the same number of nodes and edges but different edge weights. The networks are then personalized for each patient by using the expression values and, as a consequence, the gene context mentioned in 19 is accounted for. Based on the rules defining the edges, these networks are directed. The properties of these networks are summarized in Table 1.
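The edge simplification rule described above (average the expression of the enzymes within a reaction, then sum the averages over reactions sharing a node pair) can be sketched in a few lines. This is a minimal illustration in plain Python, not the Met2Graph implementation: the toy reactions, the placeholder gene IDs, the function name `metabolite_edges`, and the choice of giving a zero contribution to reactions without measured enzymes are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy model: each reaction lists reactants, products, and catalyzing genes.
REACTIONS = [
    {"reactants": ["glc", "atp"], "products": ["g6p", "adp"], "genes": ["ENSG_A", "ENSG_B"]},
    {"reactants": ["glc"], "products": ["g6p"], "genes": ["ENSG_C"]},
    {"reactants": ["g6p"], "products": ["f6p"], "genes": []},  # spontaneous reaction
]
EXPRESSION = {"ENSG_A": 2.0, "ENSG_B": 4.0, "ENSG_C": 1.0}  # one patient's values (made up)
RECURRENT = {"atp", "adp", "h2o"}  # illustrative subset of recurrent metabolites


def metabolite_edges(reactions, expression, recurrent=frozenset()):
    """Return {(reactant, product): weight} for a directed metabolite graph."""
    weights = defaultdict(float)
    for rxn in reactions:
        values = [expression[g] for g in rxn["genes"] if g in expression]
        # Average over the enzymes of one reaction.
        # Assumption: a reaction without measured enzymes contributes weight 0.
        rxn_weight = mean(values) if values else 0.0
        for reactant in rxn["reactants"]:
            if reactant in recurrent:
                continue
            for product in rxn["products"]:
                if product in recurrent:
                    continue
                # Sum the per-reaction averages over reactions sharing this node pair.
                weights[(reactant, product)] += rxn_weight
    return dict(weights)


EDGES = metabolite_edges(REACTIONS, EXPRESSION, RECURRENT)
```

Because the reaction list is fixed by the tissue model and only `EXPRESSION` changes between patients, every patient graph built this way has the same edges and differs only in the weights, which matches the property stated in the text.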

Metabolites-based_PDGSMMs networks construction. The logic behind the generation of Metabolites-based_PDGSMMs networks is the same as that of the networks derived from tissue models described in the previous paragraph, with the only difference that here each patient-specific network is derived from the corresponding PDGSMM downloaded from the BioModels repository. The edges are weighted using the patient's gene expression data from the GDC repository. Therefore, each patient-specific network has a different structure and different edge weights. These graphs are directed and weighted. The properties of these networks are summarized in Table 2a.
Enzymes-based_PDGSMMs networks construction. These networks have enzymes as nodes, connected if one catalyzes a reaction producing a metabolite that is consumed in a reaction catalyzed by the other. Recurring metabolites have been removed here as well. According to the GPR, the enzymes involved in each reaction are associated by an AND or OR logical relationship, indicating an enzymatic complex or an alternative activity, respectively. Based on this, enzymes related by AND have been considered as a single node, while enzymes related by OR have been split into different nodes. To create patient-specific networks, PDGSMMs downloaded from the BioModels repository have been used as starting models for the Metabolites-, Enzymes-, and Reactions-based_PDGSMMs datasets. Each sample graph therefore has a different structure, deriving from a different model. These graphs are directed and unweighted. The properties of these networks are summarized in Table 2b.
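The AND/OR rule above can be illustrated with a small sketch. It assumes, for simplicity, that each GPR rule is written as an OR of AND-groups (for example `(g1 and g2) or g3`); nested rules would need a proper parser. The function names, the `&` separator used to name complex nodes, and the toy reactions are hypothetical and are not taken from Met2Graph.

```python
import re


def gpr_to_nodes(rule):
    """Turn a GPR rule in 'OR of AND-groups' form into enzyme node names.

    Genes joined by AND form one node (a complex); OR alternatives become separate nodes.
    """
    nodes = []
    for alternative in re.split(r"\s+or\s+", rule.strip(), flags=re.IGNORECASE):
        genes = re.split(r"\s+and\s+", alternative.strip("() "), flags=re.IGNORECASE)
        genes = sorted(g.strip("() ") for g in genes if g.strip("() "))
        if genes:
            nodes.append("&".join(genes))
    return nodes


def enzyme_edges(reactions, recurrent=frozenset()):
    """Connect enzyme A -> enzyme B if A produces a metabolite that B consumes."""
    producers, consumers = {}, {}
    for rxn in reactions:
        nodes = gpr_to_nodes(rxn["gpr"])
        for met in rxn["products"]:
            if met not in recurrent:
                producers.setdefault(met, set()).update(nodes)
        for met in rxn["reactants"]:
            if met not in recurrent:
                consumers.setdefault(met, set()).update(nodes)
    edges = set()
    for met, sources in producers.items():
        for src in sources:
            for dst in consumers.get(met, ()):
                if src != dst:  # skip self-loops
                    edges.add((src, dst))
    return edges


# Hypothetical two-step pathway: glc -> g6p -> f6p
TOY = [
    {"reactants": ["glc"], "products": ["g6p"], "gpr": "(ENSG2 and ENSG1) or ENSG3"},
    {"reactants": ["g6p"], "products": ["f6p"], "gpr": "ENSG4"},
]
ENZYME_EDGES = enzyme_edges(TOY)
```

In this toy case the complex `ENSG1&ENSG2` and the alternative enzyme `ENSG3` both point to `ENSG4`, since each can produce the metabolite that `ENSG4` consumes.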

Reactions-based_PDGSMMs networks construction. The rules behind these networks are similar to those of Enzymes-based networks, with the difference of having reactions as nodes, connected if one produces a metabolite consumed by the other. Recurring metabolites have been removed as well. To obtain sample-specific graphs, also in this case we used the PDGSMMs from BioModels. The resulting graphs are unweighted and directed, and each sample has a different structure determined by the different starting models. The properties of these networks are summarized in Table 2c.
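As a rough illustration of the rule above, and of why recurring metabolites are removed, the following sketch builds reaction-to-reaction edges from a toy reaction list. The reaction IDs, metabolite names, and the function `reaction_edges` are hypothetical; the actual networks are generated with Met2Graph.

```python
def reaction_edges(reactions, recurrent=frozenset()):
    """Connect reaction A -> reaction B if A produces a metabolite that B consumes."""
    consumers = {}
    for rxn in reactions:
        for met in rxn["reactants"]:
            if met not in recurrent:
                consumers.setdefault(met, []).append(rxn["id"])
    edges = set()
    for rxn in reactions:
        for met in rxn["products"]:
            if met in recurrent:
                continue
            for target in consumers.get(met, []):
                if target != rxn["id"]:  # skip self-loops
                    edges.add((rxn["id"], target))
    return edges


# Hypothetical toy pathway; R4 only recycles the cofactor pair ADP/ATP.
TOY_REACTIONS = [
    {"id": "R1", "reactants": ["glc", "atp"], "products": ["g6p", "adp"]},
    {"id": "R2", "reactants": ["g6p"], "products": ["f6p"]},
    {"id": "R3", "reactants": ["f6p", "atp"], "products": ["fbp", "adp"]},
    {"id": "R4", "reactants": ["adp"], "products": ["atp"]},
]

WITH_FILTER = reaction_edges(TOY_REACTIONS, recurrent={"atp", "adp"})
WITHOUT_FILTER = reaction_edges(TOY_REACTIONS)
```

With the filter, only the pathway edges R1 to R2 and R2 to R3 remain; without it, the cofactor-recycling reaction R4 becomes connected to R1 and R3 in both directions, creating shortcuts that do not reflect a meaningful metabolic path.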
Simplified networks construction. Given the complexity and the size of these networks, we also provide a set of Metabolites-based sub-networks for a subset of kidney and lung samples, simplified according to the approach described in 21. Briefly, central nodes have been selected by the eigenvector centrality score, a measure of the importance of a node in a graph that depends on the importance of its neighbors. The classification tests performed to assess the reliability of these sub-networks gave accuracy results comparable to those obtained with the whole networks (see Tables 3 and 4 in 21). For each tissue, two sets of networks with a different number (#) of resulting nodes are provided. The properties of these networks, forming the Simpl-Kidney-# and Simpl-Lung-# datasets, are summarized in Tables 3 and 4.

Classification.
Metabolites-based_tissue datasets. In previous works, we have demonstrated the utility of the network datasets in classification and clustering tasks using subsets of some of the Metabolites-based graph datasets now included in the TumorMet repository 20,21,[35][36][37]. Here, we extend to the entire repository the usage validation introduced in 20, wherein we classify whole graphs sharing the same set of nodes. The basic idea is to 1) represent each graph of a dataset using probability distributions describing the topological properties of each node; 2) extract the distance matrix (Gram matrix), i.e., the symmetric square matrix containing the pairwise distances between the networks of the dataset; and 3) classify the networks based on the obtained distance vectors.
1. Based on the performance results achieved in 20,21,[35][36][37], here we selected the transition matrix of order one $T^r$ for representing each graph $G^r$, whose generic element $T^r_{i,j}$ is the probability of a node $i$ to be reached in one step by a random walker located in node $j$. Each row $T^r_i$ of this matrix includes local information on the connectivity of node $i$.
2. For computing the distance between two networks $G^p$ and $G^q$, we selected the network distance

$$D(G^p, G^q) = \frac{1}{l} \sum_{i=1}^{l} d_{JS}\left(T^p_i, T^q_i\right),$$

obtained by averaging over all the $l$ graph nodes the Jensen-Shannon distances $d_{JS}$ of the probability distributions of their nodes 38.
3. For classification, we considered the primary tumor classes described in Table 6. In particular, for Kidney, Lung, and Brain, the Primary-Tumor diagnoses indicated in the GDC sample metadata file, downloaded along with the gene expression files, have been used to label the samples and carry out the classification task. For Breast, the 5 subtypes have been derived from the PAM50 classification 39. As the Normal-like subtype has only 40 samples and is very similar to the Luminal A subtype, we performed the tests both including (Breast_5cl) and excluding (Breast_4cl) this class. For Prostate, which has only one diagnosis class, the Gleason pattern score, an indicator of different grades of malignancy, has been used. Among the four possible classes (Pattern 2 to 5), we excluded the Pattern 2 class (not shown in Table 6), as it contains only one sample. Moreover, we considered two different classification problems: the Prostate1 case, which aims at discriminating the Pattern 3 samples (199) from the Pattern 4 ones (249); and the Prostate2 case, which consists in discriminating the Pattern 3 samples from the samples assigned to either Pattern 4 or 5 (289). For Ovary, the subtype assignment of High-Grade Serous Ovarian Cancer (HGSOC) has been taken from 40.

Table 2. For each tissue dataset of the Metabolites- (a), Enzymes- (b), and Reactions-based_PDGSMMs (c) networks (along the columns), we report the number of graphs (first row) and the corresponding network topological properties, such as the number of vertices and edges, edge density, average network degree, possible presence of edge weights, assortativity degree, global transitivity, average local transitivity, and minimum and maximum diameter (second through eleventh rows). Observe that each network derived from PDGSMMs and corresponding to a patient sample has a different structure, since the starting models are patient-specific (see the paragraphs on Metabolites-, Enzymes-, and Reactions-based PDGSMM networks). Therefore, values for network properties are reported as average ± standard deviation across all the networks of each dataset.
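Steps 1 and 2 above can be sketched with the standard library alone. The sketch below makes several assumptions that are not stated in the text: each node's distribution is obtained by normalizing its outgoing edge weights, a node without outgoing weight is assigned a distribution concentrated on itself, and the Jensen-Shannon distance is taken as the square root of the base-2 Jensen-Shannon divergence. The reference implementation is the one in the GraphDistances repository; the function names here are illustrative only.

```python
from math import log2, sqrt


def transition_rows(nodes, weights):
    """Per-node one-step random-walk distributions over a shared node set."""
    rows = {}
    for i in nodes:
        out = [weights.get((i, j), 0.0) for j in nodes]
        total = sum(out)
        if total == 0:
            # Assumption: a node with no outgoing weight keeps the walker in place.
            rows[i] = [1.0 if j == i else 0.0 for j in nodes]
        else:
            rows[i] = [w / total for w in out]
    return rows


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def network_distance(nodes, weights_p, weights_q):
    """Average of the node-wise JS distances between two graphs on the same nodes."""
    rows_p = transition_rows(nodes, weights_p)
    rows_q = transition_rows(nodes, weights_q)
    return sum(js_distance(rows_p[i], rows_q[i]) for i in nodes) / len(nodes)


# Two toy graphs on the same three nodes: the walker leaving "a" goes to "b" or to "c".
NODES = ["a", "b", "c"]
G_P = {("a", "b"): 1.0}
G_Q = {("a", "c"): 1.0}
DIST = network_distance(NODES, G_P, G_Q)
```

Computing `network_distance` for every pair of graphs in a dataset yields the symmetric distance matrix whose rows are then used as feature vectors for classification.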

Metabolites-, Enzymes-, and Reactions-based_PDGSMMs datasets.
The graph2vec framework 41 is a neural method for learning graph-level embeddings in an unsupervised manner. It describes nodes through a recursive node relabeling algorithm that assigns to each node a label uniquely representing its rooted subgraph (neighborhood). These labels form a vocabulary of words, and graphs are represented in the form of documents. Then, the Distributed Bag of Words doc2vec approach 42 is used to learn the graph (document) embeddings. The performance has been evaluated by means of a stratified 10-fold Cross-Validation (CV), in which an SVM classifier with a linear kernel was trained and used to make predictions on 64-dimensional vector representations of the graphs (embeddings) produced by graph2vec with a recursive depth of 3 and a training duration of 200 epochs. The class labels used for the classification task are specified in Table 5.
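The relabeling step can be illustrated with a short sketch in the spirit of Weisfeiler-Lehman relabeling, which turns each graph into a "document" of subgraph labels. This is not the graph2vec implementation: the use of the node degree as the initial label, the truncated MD5 hash as a compact label, and the function name `wl_document` are assumptions for the example, and the doc2vec and SVM steps are omitted.

```python
import hashlib


def wl_document(adjacency, depth=3):
    """Return the list of 'words' (rooted-subgraph labels) describing a graph.

    adjacency maps each node to the list of its neighbors.
    """
    # Assumption: start from the node degree, since the toy nodes carry no labels.
    labels = {node: str(len(neigh)) for node, neigh in adjacency.items()}
    words = list(labels.values())
    for _ in range(depth):
        new_labels = {}
        for node, neigh in adjacency.items():
            # Combine the node's label with the sorted labels of its neighbors.
            signature = labels[node] + "|" + ",".join(sorted(labels[m] for m in neigh))
            new_labels[node] = hashlib.md5(signature.encode()).hexdigest()[:8]
        labels = new_labels
        words.extend(labels.values())
    return words


# Two paths with different node names, and a triangle.
PATH_1 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
PATH_2 = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
TRIANGLE = {"u": ["v", "w"], "v": ["u", "w"], "w": ["u", "v"]}

DOC_1 = wl_document(PATH_1)
DOC_2 = wl_document(PATH_2)
DOC_3 = wl_document(TRIANGLE)
```

Graphs with the same structure produce the same bag of words regardless of node names, which is what allows a document-embedding model to place structurally similar graphs close together.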

Data Records
The network files and associated metadata composing the TumorMet repository are available at the figshare repository 30. The file TumorMet-repository.pdf summarizes the content of the repository. For easy access to the files, the repository is organized into seven datasets, each in a separate folder, representing the six tumor tissues and the simplified networks (i.e., Prostate, Lung, Kidney, Breast, Ovary, Brain, and Simplified networks). Each main tissue dataset folder provides the sample-sheet file reporting the sample metadata as downloaded from GDC (i.e., Sample sheet.tsv) and an Excel file reporting the correspondences between PDGSMM ids and TCGA ids (Dictionary_ids.xlsx). Each tissue dataset folder contains subfolders for the different types of networks, namely Metabolites-, Enzymes-, and Reactions-based, compressed in .zip format. The Metabolites-based folder is further subdivided into folders containing the Metabolites-based networks deriving from tissue models (Metabolites-based_tissue) and from BioModels PDGSMMs (Metabolites-based_PDGSMMs). Enzymes- and Reactions-based networks are only derived from PDGSMMs. Simplified networks are provided for Kidney and Lung tissues. Each of these tissue folders contains the sample-sheet file reporting the sample metadata as downloaded from GDC (i.e., Sample sheet.tsv) and two subfolders for the network files, based on the number of nodes retained after the simplification process (for Kidney, eigen_simplified_441_nodes and eigen_simplified_1034_nodes; for Lung, eigen_simplified_312_nodes and eigen_simplified_1017_nodes). All the network files are provided in GraphML format. GraphML is a flexible and convenient XML format for storing network information. It supports unweighted, weighted, undirected, and directed networks and allows for the definition of node and edge attributes (http://graphml.graphdrawing.org/). A scheme of the repository content is illustrated in Fig. 2, while a summary of the network features in terms of starting material and number of networks is provided in Table 6.
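Since GraphML is plain XML, a network file can be inspected even without a graph library. The following minimal sketch parses an inline GraphML document with Python's standard library; the sample content, the node IDs, and the attribute name `weight` are hypothetical and may differ from the attribute names actually used in the TumorMet files, which should be checked in the `<key>` declarations of each file.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical, hand-written sample; not taken from the repository.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="m1"/>
    <node id="m2"/>
    <edge source="m1" target="m2"><data key="d0">3.5</data></edge>
  </graph>
</graphml>"""


def read_graphml(text):
    """Return (directed, nodes, edges) from a GraphML string with a single graph."""
    root = ET.fromstring(text)
    # Map key ids (e.g. "d0") to their human-readable attribute names.
    keys = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    graph = root.find("g:graph", NS)
    directed = graph.get("edgedefault") == "directed"
    nodes = [n.get("id") for n in graph.findall("g:node", NS)]
    edges = []
    for edge in graph.findall("g:edge", NS):
        attrs = {keys.get(d.get("key"), d.get("key")): d.text
                 for d in edge.findall("g:data", NS)}
        edges.append((edge.get("source"), edge.get("target"), attrs))
    return directed, nodes, edges


DIRECTED, NODE_IDS, EDGE_LIST = read_graphml(SAMPLE)
```

For real analyses, a dedicated library that reads GraphML directly is usually more convenient; the sketch is only meant to show how nodes, edges, and edge attributes are laid out in the format.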

Technical Validation
Our validation process consisted of data-type and structural validation, as well as usage validation through downstream applications.
Data-type and structural validation. The quality of the original data used to generate the networks rests on the reliability of the source repositories, i.e., GDC, Human Metabolic Atlas, and BioModels. Node IDs were verified to be of the same type. All edges were verified to connect nodes in the node list. All attribute data were verified to correspond to an existing node or edge. The structural integrity of the networks has been ensured by removing self-loops. Any duplicate edges were also removed. We further checked that nodes with no edges were not present in the networks.
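Checks of this kind are straightforward to reproduce on any edge list. The sketch below is an illustrative stand-in rather than the validation code used for TumorMet: the function name `validate_graph`, its return format, and the toy input are assumptions.

```python
def validate_graph(nodes, edges):
    """Clean a directed edge list and report structural problems.

    Returns (cleaned_edges, isolated_nodes, edges_with_unknown_nodes).
    """
    node_set = set(nodes)
    cleaned, seen, unknown = [], set(), []
    for src, dst in edges:
        if src not in node_set or dst not in node_set:
            unknown.append((src, dst))  # edge endpoint missing from the node list
            continue
        if src == dst:
            continue  # drop self-loops
        if (src, dst) in seen:
            continue  # drop duplicate edges
        seen.add((src, dst))
        cleaned.append((src, dst))
    connected = {node for edge in cleaned for node in edge}
    isolated = sorted(node_set - connected)  # nodes left without any edge
    return cleaned, isolated, unknown


# Hypothetical toy input containing one instance of each problem.
TOY_NODES = ["m1", "m2", "m3", "m4"]
TOY_EDGES = [("m1", "m2"), ("m1", "m2"), ("m2", "m2"), ("m2", "m3"), ("m3", "m9")]
CLEANED, ISOLATED, UNKNOWN = validate_graph(TOY_NODES, TOY_EDGES)
```

Here the duplicate edge and the self-loop are dropped, `m4` is reported as isolated, and the edge pointing to the undeclared node `m9` is flagged.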
Usage validation. The tumor metabolic networks can be exploited in several downstream applications, ranging from pure network analysis to multi-level integration with other biological networks or data, to machine and deep learning approaches for unraveling the complex metabolic machinery and its role in precision medicine. In this section, we show the usage of TumorMet networks in classification of tumor samples, thus giving an idea of one of their potential applications. To furnish a baseline for comparing methods and approaches, we give several details of the two different workflows used for Metabolites-based networks derived from tissue models and Metabolites-, Enzymes-, Reactions-based networks derived from PDGSMMs.

Metabolites-based_tissue datasets.
For the evaluation of classification performance, i) each of the Metabolites-based datasets was subdivided into a training and a test set; ii) a statistical validation was obtained on the training sets using a 10-fold CV, to ensure that the results were not biased to a specific training subset; iii) finally, the classification performance on the test datasets was evaluated using the models built on the training datasets.
i) In the case of the Kidney, Lung, Breast, and Brain tissue datasets, the choice of the training sets was driven by our previous work 36, where subsets of these datasets were already adopted for classification. Therefore, those subsets have been adopted here as training sets, while the newly added samples were assigned to the test sets. For the tissues not used previously (Ovary and Prostate), we obtained the training and test sets by using a 70:30 split ratio. The sample partitioning for each tissue is reported in Supplementary Table 1, while Figs. 3-4 provide the t-distributed Stochastic Neighbor Embedding (t-SNE) plots for the test sets.
ii) For the statistical validation on the training sets, the data were min-max normalized and a Support Vector Machine (SVM) classifier with a linear kernel was adopted, using the libsvm implementation 43 available in scikit-learn 44. The one-vs.-rest strategy was used to classify the multi-class datasets. To account for unbalanced datasets, the "balanced" mode in sklearn was used to set the class weights; this parameter penalizes more heavily the wrong prediction of classes having fewer instances than the others. The 10-fold CV on the training datasets was repeated 10 times, and the averages of the CV scores are reported in Table 9 (top); these scores are also shown in the form of box plots in Fig. 5.
iii) The classification performance on the test sets was computed using the same SVM classifier learned on the training sets. The obtained results are reported in Table 9 (bottom).

Table 6. Networks provided in the TumorMet repository. For each tumor tissue: the type of networks, the data used to generate the networks in terms of metabolic models and Gene Expression (GE) data from TCGA projects, and the number of networks, subdivided by TCGA project ID where applicable. Observe that in the case of PDGSMMs-derived networks, the GE data have been used to weight the edges only for Metabolites-based_PDGSMM networks.

Kidney, Lung, and Brain graphs are well classified, as shown by the accuracy scores both in CV on the training sets and when using new samples as testing data (Table 9 and Figs. 3, 5). More challenging tasks are instead given by the classification of Breast, Ovary, and Prostate samples.
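Two ingredients of step ii), the "balanced" class weights and the stratified fold assignment, can be made concrete with a short standard-library sketch. The weight formula n_samples / (n_classes × class count) is the heuristic documented for scikit-learn's "balanced" mode; the fold assignment shown here is a simple deterministic round-robin written for illustration and is not scikit-learn's implementation. The Prostate1 class sizes (199 and 249) are used only as illustrative counts, and the function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds while preserving class proportions."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        # Deal the samples of each class to the folds in round-robin order.
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    return folds


# Illustrative labels with the Prostate1 class sizes.
LABELS = ["Pattern3"] * 199 + ["Pattern4"] * 249
WEIGHTS = balanced_class_weights(LABELS)
FOLDS = stratified_folds(LABELS, k=10)
```

The minority class receives the larger weight, so misclassifying one of its samples costs more during training, and each fold contains both classes in roughly the same proportion as the full set.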
Regarding Breast, the inclusion of the Normal-like subtype in the classification does not dramatically change the results; however, compared to the tissues mentioned above, the results are worse, with an accuracy of around 80%. Looking at the t-SNE plots (Fig. 4a,b), Basal is clearly the best discriminated and most homogeneous subtype, while some samples of Luminal A, Luminal B, and Her2 overlap, especially the latter two. Normal-like samples, as expected, are difficult to separate from Luminal A ones. Ovary samples are completely overlapping (Fig. 3d) and lead to a poor accuracy (around 70%, as reported in Table 9). Finally, the CV scores reported in Table 9 (top) and plotted in Fig. 5c, as well as the test sample validation results reported in Table 9 (bottom), indicate that Prostate samples are generally poorly discriminated, with slightly better results for the Prostate2 classification task (when Gleason Pattern 5 is merged with Pattern 4). Prostate cancer is characterized by a high molecular heterogeneity 45, which is evidently not captured by considering only the Gleason score, as also highlighted by the t-SNE plots reported in Fig. 4c,d.

Metabolites-, Enzymes-, Reactions-based_PDGSMMs datasets. As detailed in the section on metabolic networks construction, these PDGSMMs-derived graphs differ from the Metabolites-based graphs in that they do not share a common set of nodes across all patients. Therefore, we decided to accomplish the classification task on these datasets through a whole-graph embedding framework. Classification results based on these embeddings, using the class labels specified in Table 5 for the Kidney and Lung PDGSMMs-derived network datasets, are reported in Table 8.
The performance for these types of networks is not as good as the one obtained with Metabolites-based graphs, but it is worth pointing out that the two approaches to the classification task are completely different, owing to the different nature of the networks. Enzymes- and Reactions-based networks are not weighted and have different structures, being generated from different models. The complexity and density of these networks require a deeper investigation of the most suitable approach and of parameter tuning to discriminate the differences among the samples, which is not the aim of this paper. As mentioned previously, one of the interesting aspects of metabolic networks is their plasticity, since different types of graphs can be generated depending on the desired nodes and connections. In future work, we will consider generating a unique tripartite graph for each patient to investigate the possibility of reducing the differences in classification performance. As for the networks extracted from tissue-specific models, the Metabolites-based_PDGSMMs networks are weighted by gene expression values. Comparing weighted vs. non-weighted networks in terms of classification performance, the weights do not appear to add any crucial information for discriminating the classes (Table 9). These networks derive from PDGSMMs reconstructed through the tINIT algorithm integrating TCGA gene expression data. Adding expression values to the edges is therefore redundant, and the models are likely already well contextualized. Instead, the weights have a different role in Metabolites-based_tissue networks, where they are crucial for personalizing the networks in terms of patients.
Furthermore, even if tested with different methods, the patient-specific Metabolites-based networks derived from tissue models seem to contextualize the tissue models well in terms of patients, resulting in networks that are more representative of the tumor classes and have a higher discriminative power, as highlighted by the classification performances (Table 7).

Usage Notes
The networks presented here have been generated using the Met2Graph R package we developed (see the paragraph on "Code availability"). The model in SBML format is imported and read by the Met2Graph package through the function readSBMLmod from the sybilSBML 46 package. Several checkpoints are included in the function to validate the model object before importing it, such as check of upper and lower bounds, GPR mapping, reactions' ids, and presence of list of reactants and products. The code snippets of Listings 1-4 show Met2Graph functions and arguments used to obtain the different networks: Listing 1 Metabolites-based_tissue networks.

Enzymes-based_PDGSMMs
There are several open-source network libraries that can be used to analyze and visualize the networks provided in GraphML format. Examples of network analysis and visualization software include NetworkX, igraph, Cytoscape, yEd and Gephi.

Code availability
The R package Met2Graph, developed and used to generate the TumorMet datasets, is publicly available at the Met2Graph GitHub repository (https://github.com/cds-group/Met2Graph). The package has a detailed tutorial to generate the networks. Met2Graph implements a flexible process flow to build graphs starting from a GSM and can be easily integrated with user-customized functions. It allows the creation of the three different types of graphs described, based on the selection of nodes, edges, and attributes: Metabolites-, Enzymes-, and Reactions-based graphs. It allows integrating gene expression data into Metabolites-based graphs. It provides several options and parameters to customize the resulting graphs. To name a few: creating multiple or simplified edges (simplification is possible using three different methods), removing recurring metabolites, considering both directions in the case of reversible reactions, generating directed or undirected graphs, and plotting the networks. All the details and the different arguments are described in the package manual and in the "help" section of the related functions.
The code to compute the distribution-based distance measures and to obtain the simplified networks is also available at the GraphDistances GitHub repository (https://github.com/cds-group/GraphDistances).