Evaluation of single-sample network inference methods for precision oncology

Deschildre, Joke; Vandemoortele, Boris; Loers, Jens Uwe; De Preter, Katleen; Vermeirssen, Vanessa

doi:10.1038/s41540-024-00340-w

Article
Open access
Published: 15 February 2024

Joke Deschildre^1,2,3^na1,
Boris Vandemoortele^1,2,3^na1,
Jens Uwe Loers^1,2,3,
Katleen De Preter^3,4 &
…
Vanessa Vermeirssen ORCID: orcid.org/0000-0002-1975-0712^1,2,3

npj Systems Biology and Applications volume 10, Article number: 18 (2024)

Abstract

A major challenge in precision oncology is to detect targetable cancer vulnerabilities in individual patients. Modeling high-throughput omics data in biological networks allows identifying key molecules and processes of tumorigenesis. Traditionally, network inference methods rely on many samples to contain sufficient information for learning, resulting in aggregate networks. However, to implement patient-tailored approaches in precision oncology, we need to interpret omics data at the level of individual patients. Several single-sample network inference methods have been developed that infer biological networks for an individual sample from bulk RNA-seq data. However, only a limited comparison of these methods has been made and many methods rely on ‘normal tissue’ samples as reference, which are not always available. Here, we conducted an evaluation of the single-sample network inference methods SSN, LIONESS, SWEET, iENA, CSN and SSPGI using transcriptomic profiles of lung and brain cancer cell lines from the CCLE database. The methods constructed functional gene networks with distinct network characteristics. Hub gene analyses revealed different degrees of subtype-specificity across methods. Single-sample networks were able to distinguish between tumor subtypes, as exemplified by node strength clustering, enrichment of known subtype-specific driver genes among hubs and differential node strength. We also showed that single-sample networks correlated better to other omics data from the same cell line as compared to aggregate networks. We conclude that single-sample network inference methods can reflect sample-specific biology when ‘normal tissue’ samples are absent and we point out peculiarities of each method.

Introduction

In order to understand the complex molecular interactions at play in tumor pathogenesis, high-throughput omics data have been generated at an increasing pace¹. Modeling these data in biological networks allows for determining the key molecules and processes that drive tumorigenesis^2,3. Traditionally, network inference methods rely on many samples to contribute sufficient information to the learning process and to counteract the curse of dimensionality in the omics data, i.e. the number of genes by far outnumbering the number of samples. Methods to accomplish this on tissue level from bulk omics data are already well-established, and rely on varying underlying statistical and mathematical principles such as correlation, mutual information, Bayesian networks and regression^4,5,6. We and others have shown that different computational methods reveal complementary aspects of the ‘true’ underlying networks^4,7,8,9. However, these methods infer networks based on numerous samples and therefore determine a general estimate of gene interactions largely shared by that group of samples. Hence, they result in population-level networks, averaging the phenotypic effects of individual patients or samples. For clinical applications, we need to be able to interpret and extract meaningful information from omics data of a single individual to be able to direct individualized treatment in precision medicine¹⁰.

Currently, several approaches are being explored to analyze omics data from a single sample or patient, and are referred to in literature as single-sample, single-subject, sample-specific, patient-specific and personalized methodologies. Deep n-of-1 phenotyping, where multiple omics are profiled in a single individual at different locations in the body longitudinally, is envisioned to be essential for the early detection and personalized treatment of cancers¹¹. Obtaining multiple samples from one patient is nonetheless not straightforward due to an increased cost, increased surgical risk, or limited tumor size. Moreover, single-sample or patient-specific networks can be built from single cell RNA-seq data of a single subject, where the profiling of many cells inherently contains the variability required to infer the statistical dependencies between genes¹². However, single cell omics data have specific limitations such as high-dimensionality, sparsity and overdispersion and network inference methods are still being optimized to deal with these issues. Also, single cell technologies are currently more expensive and hardly implemented in the clinic as compared to bulk protocols. On bulk transcriptome data, several methods extract relevant biological knowledge from individual samples without requiring a large disease cohort, as reviewed in¹³. They either provide a gene-centric view on differentially expressed (DE) genes or a pathway-centric view on deregulated pathways, comparing a single sample against a reference cohort or a control sample^13,14,15,16. In addition, VIPER can predict protein activity from regulon enrichment on single-sample gene expression signatures obtained using a reference set¹⁷. The single-sample Network Perturbation Assessment (ssNPA) is a method for subtyping samples based on single-sample deregulation of their gene networks¹⁸. 
While these methods allow for biological interpretation of omics data at the individual level, they do not generate biological networks or gene interactions for single samples or patients.

To address this, single-sample network inference methods have been developed that can infer a biological network for a single sample from bulk RNA-seq data. Several of these methods make use of an aggregate network constructed from all samples and a statistical wrapper to infer single-sample features within these networks. Others devise a specific statistic to directly obtain single-sample networks. Optionally, networks can be pruned by a background network, such as a protein-protein interaction network. The Single-Sample Network (SSN) algorithm calculates the significant differential network between the Pearson Correlation Coefficient (PCC) networks of a set of reference samples on the one hand and that same reference set plus the sample of interest on the other hand, both using the STRING database as background network¹⁹. The authors experimentally validated that SSN identified functional driver genes contributing to resistance in non-small cell lung cancer cell lines. Subsequently, SSN has been applied to breast and colon cancer to study stage- and subtype-related networks and to identify diagnostic and prognostic biomarkers^20,21. LIONESS also uses a leave-one-out approach in aggregate network inference to come to a single-sample network, and through linear interpolation incorporates information on both the similarities and the differences between the networks with and without the sample of interest²². LIONESS has the major advantage that any network inference method of choice can be used to construct the aggregate networks, and has been applied e.g. to study sex-linked differences in colon cancer drug metabolism²³. The Individual-specific Edge-Network Analysis (iENA) algorithm constructs single-sample PCC node-networks and single-sample higher-order PCC edge-networks by altered PCC calculations of the expression data of the sample of interest and a set of reference samples²⁴. 
On the other hand, Sample Specific Perturbation of Gene Interactions (SSPGI) computes individual edge-perturbations based on differences between the rank of genes within the expression matrix of normal samples and individual samples of interest²⁵. The Cell-Specific Network construction (CSN) method transforms the expression data into more stable, statistical gene associations, rendering a binary network output at single cell or single-sample resolution, for single or bulk RNA-seq data respectively²⁶. The recent method SWEET also consists of linear interpolation like LIONESS, but integrates genome-wide sample-to-sample correlations to weigh subpopulation sample sizes that can cause network size bias²⁷. Whereas SSN, SSPGI and CSN only apply a differential approach, LIONESS, iENA and SWEET also take into account commonalities between single-sample and aggregate networks.

The above-mentioned single-sample network inference methods have mainly been applied by the research groups that developed them and a systematic, neutral comparison is still missing. Only limited comparisons have been performed, which either focused on a limited number of methods, focused on downstream network control methods or made use of metabolomics data with a limited number of features^22,28,29,30. Furthermore, many of the compared methods rely on ‘normal tissue’ reference samples to contrast the tumor samples, which might not be available for all tumor types or in all precision oncology cases. The CCLE database offers multiple omics, including transcriptomics, on a large panel of comprehensively characterized human cancer cell lines and thus represents an ideal playground to apply and compare single-sample network inference methods^31,32. In this study we constructed single-sample coexpression networks using SSN, LIONESS, SWEET, iENA, CSN and SSPGI for lung cancer and brain cancer samples. We found that each method constructed networks with distinct topologies at the level of edge weight distributions and network characteristics. The node strengths of the different single-sample networks tended to cluster together according to tumor subtype. For both lung and brain samples, we identified the largest proportion of subtype-specific hubs in SSN, followed by LIONESS and iENA networks. Hubs in these single-sample networks also differed the most from hubs in the aggregate network. However, for all methods, hubs displayed enrichment for subtype-specific IntOGen/COSMIC drivers for NSCLC and glioblastoma, the two largest sample groups in lung and brain samples, respectively. Differential node strengths between tumor subtypes were mainly detected in SSN, LIONESS and SSPGI networks. Yet, differentially strong nodes were not enriched for known subtype-specific driver genes. 
In SSN, LIONESS and iENA, we noticed a tendency for lower node strengths for the bigger subtype sample group in both lung and brain samples, while this potential bias was absent in SWEET, CSN and SSPGI. Finally, we showed that single-sample networks correlated better to other omics data from the same cell line as compared to aggregate networks. Single-sample networks from SSN, LIONESS and SWEET resulted in the largest average correlation coefficients, for both lung and brain samples, and for proteomics and copy number variation data. Overall, we conclude that single-sample networks in the absence of ‘normal tissue’ samples were able to reflect sample-specific information better than aggregate networks and that different tools have their peculiarities that should be taken into account.

Results

Subtype-specific gene expression in lung and brain CCLE cell lines

In order to evaluate single-sample network inference methods in the absence of healthy control reference samples, we set out to compare SSN, LIONESS, SWEET, iENA, CSN and SSPGI on gene expression profiles from CCLE lung and brain cancer cell lines³¹. We identified cell lines that closely matched their corresponding tumor tissue with regard to gene expression, and retained 86 lung and 67 brain cancer cell lines (Methods)³³. These are further split into subtypes including 73 non-small cell lung carcinoma (NSCLC), 12 small cell lung carcinoma (SCLC), 1 lung carcinoid, 36 glioblastoma, 9 astrocytoma, 8 glioma, 9 medulloblastoma, 3 meningioma, 3 oligodendroglioma and 2 primitive neuroectodermal tumor (PNET) cell lines (Methods). An initial clustering of lung expression profiles showed that all but one of the SCLC samples clustered separately from NSCLC samples (Fig. 1a). We further compared gene expression in both cancer subtypes and identified 1510 up- and 1553 downregulated genes in NSCLC versus SCLC samples (absolute log fold change (abs(LFC)) >= 1, adjusted p-value (padj) <= 0.05) (Fig. 1b). A clustering of brain expression profiles revealed one subcluster containing all but one of the medulloblastoma samples and all PNET samples (Fig. 1c). Due to limited sample sizes for meningioma, oligodendroglioma, PNET and glioma, we chose to perform subsequent differential analyses in brain between glioblastoma and medulloblastoma samples. In total, 1354 and 1043 genes were up- and downregulated, respectively, in glioblastoma versus medulloblastoma samples (Fig. 1d). In the supplementary information, we extended some analyses to other tumor subtypes (brain) or sub-subtypes (lung) (see further). Hence, we detected substantial transcriptional differences between tumor subtypes for both lung and brain samples.

**Fig. 1: Lung and brain cancer cell lines exhibit extensive transcriptional differences between subtypes.**

Construction of single-sample networks

For both tumor types, we selected highly-variable genes (HVG) for functional gene network construction. First, we inferred an aggregate, undirected coexpression network using PCC, representing all samples. Next, single-sample networks were inferred using LIONESS, SSN, SWEET, iENA, CSN and SSPGI. We slightly modified several tools to run with PCC as the underlying network inference method and in the absence of ‘normal tissue’ reference samples (Methods, GitHub). The choice of PCC as underlying network inference approach allowed for a consistent comparison between single-sample networks, as some methods exclusively function with PCC, and between the single-sample and the aggregate networks. We further pruned the aggregate and single-sample networks by selecting edges present in the HumanNet network, an integrated human functional gene network that was used as background network (Methods, Fig. 2)³⁴. The resulting lung aggregate and single-sample functional gene networks consisted of 5454 nodes and 53 296 edges, covering respectively 30.50% of proteins and 10.14% of HumanNet interactions. Due to lack of scalability, lung SSPGI networks were slightly smaller and comprised 4814 nodes connected by 43 193 edges. Due to gene ID conversion based on a more recent genome annotation, lung SWEET networks were slightly larger, comprising 5706 nodes connected by 55 806 edges (Methods, Supplementary Table 1). The aggregate brain network and single-sample brain networks constructed using SSN, LIONESS, CSN and iENA comprised 4741 nodes and 42 948 edges after pruning for HumanNet interactions, while 4686 nodes and 42 206 edges remained in the SSPGI networks. Again, SWEET networks for brain samples were slightly larger, comprising 4936 nodes and 45 724 edges.
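
As an illustration of the leave-one-out idea shared by LIONESS and SSN, the following minimal Python sketch computes one edge weight for one sample with PCC as the underlying measure. It assumes that, in the absence of normal samples, the reference set is simply all other samples; the function names are hypothetical, the significance test that SSN applies to the differential PCC is omitted, and this is not the authors' implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def leave_one_out_edge(gene_i, gene_j, q):
    """Return (LIONESS-style weight, SSN-style delta PCC) for edge i-j in sample q.

    gene_i, gene_j: lists with the expression of two genes across all N samples.
    Assumption: the reference set is all samples except q.
    """
    n = len(gene_i)
    pcc_all = pearson(gene_i, gene_j)                      # aggregate network edge
    pcc_rest = pearson(gene_i[:q] + gene_i[q + 1:],
                       gene_j[:q] + gene_j[q + 1:])        # network without sample q
    delta = pcc_all - pcc_rest                             # SSN-style differential PCC
    lioness = n * delta + pcc_rest                         # LIONESS linear interpolation
    return lioness, delta
```

Because the LIONESS weight multiplies the difference by the number of samples, it is not bounded by [-1, 1], unlike the differential PCC itself.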

**Fig. 2: Overview of single-sample network construction and network pruning.**

Different single-sample network inference methods generate distinct network topologies

First, we aimed to explore the network topology of the aggregate and single-sample networks³⁵. Supplementary Fig. 1 depicts the distribution of edge weights in the aggregate networks, as well as across all edges in single-sample networks. Also, it shows the distribution of the average weight of each edge individually across all samples. For lung samples, edge weights in SSN networks ranged between [-0.3, 0.35], while edge weights in SWEET and LIONESS networks ranged between [-1.5, 1.5] and [-25, 30] respectively. iENA lung networks had an edge weight distribution similar to those constructed by LIONESS, with weights ranging between [-25, 32]. CSN produced networks with binary weights of either zero or one, such that all edges present in the network of a specific sample had a weight of exactly one. Finally, networks constructed by SSPGI had edge weights in the interval [-15000, 15000], with non-continuous values since it is a rank-based method. We observed similar edge weight distributions in networks constructed for brain samples. Interestingly, edge weights in SWEET networks followed a distribution highly similar to their respective aggregate network, while for other methods there is a clear deviation. In both tissues, networks constructed by SSN, LIONESS and iENA were predominantly characterized by edge weights close to zero. Due to the binary nature of edge weights in CSN networks, either zero or one, a significant proportion of edges within each single-sample CSN network was associated with weight zero and thus absent in the network. On average, these networks contained 27 814 and 24 399 edges for lung and brain samples, respectively. We therefore selected the top 25 000 edges in SSN, LIONESS, SWEET, iENA and SSPGI networks, rendering networks comparable in size across all methods. These networks, which had previously already been pruned by HumanNet, are further referred to as top 25k networks. 
The signs of edge weights in these networks were mostly consistent between methods for SSN, LIONESS and iENA. SSPGI and SWEET edges showed some inconsistencies with other methods, while CSN edge weights are binary and thus incomparable (Supplementary Fig. 2).
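
The size-matching step can be sketched as follows. This is a minimal example that assumes edges are ranked by absolute weight, which is not stated explicitly in this section; the function name and the dictionary representation of a network are hypothetical.

```python
def top_k_edges(edges, k=25000):
    """Keep the k edges with the largest absolute weight.

    edges: dict mapping (gene_a, gene_b) -> edge weight for one sample.
    Assumption: ranking uses the absolute value of the weight.
    """
    ranked = sorted(edges.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:k])
```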

The top 25k networks varied in edge weight distributions as well as network topology (Supplementary Tables 1–3)³⁶. The aggregate networks had an order of magnitude more connected components than single-sample networks, and also displayed higher clustering coefficients and lower node and edge betweenness. Thus, aggregate networks were more tightly connected with shorter paths between nodes. Overall, topological differences between single-sample networks themselves were rather small, with SSPGI having a lower clustering coefficient than the rest for both tumor types. Thus, although constructed on the same data as the aggregate networks and subjected to similar edge selection procedures, each single-sample network inference method built distinct single-sample networks that were mostly different from the aggregate network, both at the level of edge weight distribution and network topology.

Exploration of single-sample networks

Next, we inquired to what extent LIONESS, SSN, SWEET, iENA, CSN and SSPGI provide relevant biological insights at the sample-specific and subtype-specific level. Therefore, we first calculated the node strengths³⁷, i.e. the sum of absolute edge weights for each node in the single-sample networks, and projected these onto their first two principal components using Principal Component Analysis (PCA). For both lung and brain top 25k networks, the different single-sample networks tended to cluster together according to subtype, although the first two PCs explained at most 18% (lung) and 28% (brain) of the total variance, with the explained variance decreasing from iENA, SSN, LIONESS, CSN and SSPGI to SWEET (Fig. 3, Supplementary Fig. 2, Supplementary Fig. 3).
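
The node strength defined above can be computed directly from a weighted edge list. The sketch below assumes a hypothetical dictionary representation of one undirected single-sample network; stacking the resulting vectors for all samples would give the samples-by-genes matrix that is passed to PCA.

```python
from collections import defaultdict

def node_strengths(edges):
    """Sum of absolute edge weights per node in an undirected network.

    edges: dict mapping (gene_a, gene_b) -> edge weight for one sample.
    """
    strength = defaultdict(float)
    for (gene_a, gene_b), weight in edges.items():
        strength[gene_a] += abs(weight)   # each edge contributes to both endpoints
        strength[gene_b] += abs(weight)
    return dict(strength)
```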

**Fig. 3: Visualization of lung samples after projecting the node strengths i.e. the sum of absolute edge weights of the single-sample networks onto their first two principal components.**

Analysis of hubs in single-sample networks

We further identified hubs by selecting the top 200 most connected nodes in each single-sample network and aggregate networks (Methods). As single-sample network inference methods are designed to capture heterogeneity between samples of a tumor type, we expect to some extent different hubs in different samples, and ideally these hub genes are related to the cancer subtype of a given sample. To test this, we first assessed the recurrence of hub genes (i.e. the number of times a given gene is identified as hub across a group of samples) in networks constructed using a given method for lung (Fig. 4a) and brain samples (Fig. 4b). All methods constructed single-sample networks with the majority of hubs being unique to only one or a few samples, for both brain and lung. However, SWEET, CSN and SSPGI networks showed hubs recurring in all samples: respectively 110, 96 and 18 hub genes overlapped between all lung samples and 131, 76 and 21 hub genes overlapped between all brain samples. Some hub genes were regularly recurring within SSN, LIONESS and iENA networks, but none overlapped across all samples. Furthermore, the top 200 hub genes of the aggregate networks of lung and brain were consistently recurring among the hub gene sets identified in single-sample networks, and this was most obvious in CSN and SWEET networks (Fig. 4a, b). Together, these observations suggest that SSN, LIONESS and iENA produced networks that were inherently more different from each other than SWEET, CSN or SSPGI networks. Also, hubs in the aggregate networks tended to be hubs in the single-sample networks.
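
Hub selection and hub recurrence, as described above, can be sketched in a few lines. The connectivity score used for ranking (for example degree or node strength) is left to the caller here, since its exact definition is given in the Methods rather than in this section; all names are hypothetical.

```python
from collections import Counter

def top_hubs(connectivity, n_hubs=200):
    """Return the n_hubs genes with the highest connectivity score.

    connectivity: dict mapping gene -> connectivity score in one network.
    """
    ranked = sorted(connectivity, key=connectivity.get, reverse=True)
    return set(ranked[:n_hubs])

def hub_recurrence(hubs_per_sample):
    """Count in how many samples each gene is identified as a hub.

    hubs_per_sample: dict mapping sample id -> set of hub genes.
    """
    counts = Counter()
    for hubs in hubs_per_sample.values():
        counts.update(hubs)
    return counts
```

A gene whose count equals the number of samples is a hub in every single-sample network, which corresponds to the fully recurring hubs reported for SWEET, CSN and SSPGI.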

**Fig. 4: Hub recurrence in single-sample networks.**

Hub genes should ideally be related to the cancer subtype of a given sample, and thus similar hubs should be found within sample groups. We thus grouped all NSCLC, SCLC, glioblastoma and medulloblastoma single-sample networks and evaluated the union and intersection of hub gene sets within these groups (Supplementary Tables 4 and 5). CSN and SWEET showed the smallest unions of hub gene sets within sample groups, indicating a poorer hub diversity over single-sample networks within a tumor-specific or tumor subtype-specific sample group. Especially for SSN, iENA and LIONESS there was limited overlap in hubs, e.g. for NSCLC, the largest sample group, there were zero hubs in common across all samples. In CSN and SWEET networks on the other hand, close to or more than 100 hubs were overlapping in any given sample group, indicating a highly similar network topology. Nonetheless, all single-sample networks did have regularly recurring hubs per cancer subtype group (occurring in at least 75% of the samples in a given group). We next identified subtype-specific recurring hubs as those hubs that regularly recur within one sample group, and do not overlap with regularly recurring hub genes in other sample groups (Fig. 4c–f). In Fig. 4, each dot represents one gene that was identified as a hub, and the y-axis represents the number of times that given hub is found across a given sample group. The highest proportion of subtype-specific versus non-subtype-specific hubs among highly-recurring hubs was observed for SSN networks, followed by iENA and LIONESS. Moreover, these methods generated a lower number of highly recurring hubs. SWEET, CSN and SSPGI had more recurring hub genes that were less specific to the cancer subtype of a given sample group (Fig. 4c–f). Similar results were obtained upon investigating more subtypes (brain) or sub-subtypes (lung) (Supplementary Fig. 5).
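
The definition of subtype-specific recurring hubs given above translates into simple set operations. The following sketch uses the 75% recurrence threshold from the text; the function names and the input layout are hypothetical.

```python
from collections import Counter

def recurring_hubs(hub_sets, min_fraction=0.75):
    """Genes that are hubs in at least min_fraction of the samples of one group.

    hub_sets: list with one set of hub genes per sample.
    """
    counts = Counter(gene for hubs in hub_sets for gene in hubs)
    cutoff = min_fraction * len(hub_sets)
    return {gene for gene, count in counts.items() if count >= cutoff}

def subtype_specific_hubs(hubs_by_group, min_fraction=0.75):
    """Recurring hubs of each group that are not recurring hubs in any other group.

    hubs_by_group: dict mapping group name -> list of per-sample hub sets.
    """
    recurring = {group: recurring_hubs(sets, min_fraction)
                 for group, sets in hubs_by_group.items()}
    specific = {}
    for group, genes in recurring.items():
        others = set().union(*(g for name, g in recurring.items() if name != group))
        specific[group] = genes - others
    return specific
```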

Next, we assessed whether these hub lists were enriched for known cancer driver genes. We downloaded a list of known drivers from IntOGen and COSMIC/Cancer Gene Census for NSCLC, SCLC, glioblastoma and medulloblastoma, and additional cancer subtypes (analyses in supplementary information), as well as CCLE cell line specific cancer drivers from the Cell Model Passports database, and assessed their presence in the sets of hubs per sample group. For both databases, we observed in addition to some overlap, many tumor subtype-specific and sub-subtype-specific cancer driver genes (Supplementary Fig. 6). Out of 69 and 59 driver genes for NSCLC and SCLC from IntOGen/COSMIC respectively, 23 were present in the aggregate HumanNet lung network. On the other hand, NSCLC and SCLC samples were characterized by 213 driver genes in total according to Cell Model Passports, of which 74 were present in the aggregate HumanNet lung network. Known IntOGen/COSMIC drivers for medulloblastoma and glioblastoma respectively comprised 57 and 45 genes, of which 9 and 16 respectively were present in the aggregate HumanNet brain network. These samples were further characterized by 72 driver genes according to Cell Model Passports, while 19 of these were present in the aggregate brain network. After concatenating hubs per sample group, we found that each method constructed networks in which hub genes were enriched for subtype-specific IntOGen/COSMIC drivers for NSCLC and glioblastoma, the two largest sample groups. Cell Model Passport drivers on the other hand were enriched in hub genes identified in NSCLC, glioblastoma and medulloblastoma samples (Fig. 5a, b). An analysis of additional sample groups, defined by cancer sub-subtype (lung) or subtype (brain), confirmed that each tool is capable of prioritizing subtype-specific genes as hubs, although with decreasing sample numbers it became more difficult to observe this (Supplementary Fig. 7, Table 1). 
There was no single tool that clearly outperformed the others.
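
Enrichment of driver genes among hubs is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation; whether the authors used exactly this test is not stated in this section, so it should be read as an assumption, and the function name is hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_drivers, n_hubs, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes in the network (background)
    n_drivers:  number of known driver genes present in the network
    n_hubs:     number of hub genes drawn from the network
    n_overlap:  number of hub genes that are known drivers
    """
    total = comb(n_universe, n_hubs)
    upper = min(n_drivers, n_hubs)
    tail = sum(comb(n_drivers, k) * comb(n_universe - n_drivers, n_hubs - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

Restricting the background to genes present in the pruned network, rather than all annotated genes, avoids inflating the significance of the overlap.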

**Fig. 5: Enrichment of known subtype-specific cancer driver genes in hub gene sets.**

Table 1 Overlap between hubs identified per sample group and known subtype-specific cancer driver genes from IntOGen/COSMIC and Cell Model Passports

Overall, we found that SSN, LIONESS, SWEET, iENA, CSN and SSPGI construct single-sample networks, in which different genes were identified as hubs for different samples. For a given sample, the average overlap of hub genes across methods was 25 genes in lung networks, and 42 genes in brain networks. Further, we observed varying degrees of subtype-specificity within hub genes, with CSN and SWEET networks having the lowest diversity among hub genes. Furthermore, we found a significant enrichment of NSCLC and glioblastoma driver genes from both IntOGen/COSMIC and Cell Model Passports for all methods.

Differential node strength in single-sample networks

The node strength quantifies how strongly a node is directly connected to other nodes in the network³⁷, and is obtained by summing the absolute weights of all edges connected to the given node. In the undirected single-sample networks we calculated the node strength of a given node as the sum of absolute edge weights of that node, after scaling weights to values between -1 and 1. Using linear modeling and an empirical Bayes procedure³⁸, we identified differentially strong nodes (padj < 0.05 and |LFC| > 1) between NSCLC and SCLC samples: 59 in LIONESS, 192 in SSN and 363 in SSPGI networks. Only one node was significantly differentially strong in CSN and SWEET lung networks, and zero in iENA networks (Fig. 6). However, none of these gene sets were enriched for NSCLC- and SCLC-specific known driver genes, either from IntOGen/COSMIC or from Cell Model Passports. For brain networks, we found 113, 116, and 178 differentially strong nodes between glioblastoma and medulloblastoma samples in SSN, LIONESS and SSPGI networks respectively (Supplementary Fig. 8). Again, there was no significant enrichment for known subtype-specific drivers. There was a strong tendency towards negative LFCs for both lung and brain analyses in SSN and LIONESS networks, a phenomenon not observed during DE analysis. This observed preference is likely caused by an unbalanced group size of tumor subtype-specific samples used to construct the aggregate network, i.e., 73 NSCLC versus 12 SCLC samples and 36 glioblastoma versus 9 medulloblastoma samples, resulting in aggregate networks which are more representative of NSCLC and glioblastoma samples respectively. In SSN, LIONESS and iENA, we noticed a tendency for lower node strengths for the bigger subtype group in both lung and brain samples, while this potential bias was absent in SWEET, CSN and SSPGI (Supplementary Figs. 10–19). 
SWEET aims to minimize subtype group size bias through incorporation of a weighing factor reflecting genome-wide correlations across samples, resulting in similar edge weight distributions across sample groups (Supplementary Fig. 11, 17). However, some bias seemed to remain present, since also in SWEET networks, there was a slight tendency towards negative LFCs (Fig. 6c and Supplementary Fig. 8c).
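
Because edge weight ranges differ widely between methods, weights are scaled to values between -1 and 1 before node strengths are compared. One simple way to do this is max-abs scaling, sketched below; the exact scaling procedure is not specified in this section, so this choice and the function name are assumptions.

```python
def scale_weights(edges):
    """Scale edge weights to [-1, 1] by dividing by the largest absolute weight.

    edges: dict mapping (gene_a, gene_b) -> edge weight for one sample.
    Assumption: max-abs scaling, which preserves the sign of every weight.
    """
    max_abs = max(abs(weight) for weight in edges.values())
    if max_abs == 0:
        return dict(edges)   # all weights are zero; nothing to scale
    return {edge: weight / max_abs for edge, weight in edges.items()}
```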

**Fig. 6: Single-sample networks display distinct differential node strength between non-small cell lung carcinoma (NSCLC) vs small cell lung carcinoma (SCLC) across network types.**

Relating single-sample networks to sample-specific molecular features

Finally, we assessed the biological relevance of single-sample functional gene networks by comparing them to additional CCLE omics measured on the same samples. Ideally, these single-sample networks constructed from transcriptional gene expression profiles have a higher resemblance to other sample-specific omics than the aggregate network has. We downloaded proteomics and copy number variation (CNV) data from CCLE (Methods) and assessed the correlation between node strength, i.e. the sum of absolute edge weights of a node, and protein abundance, and between node strength and CNV (Fig. 7a–d). For the aggregate networks, we assessed correlations between node strength in the aggregate HumanNet network and proteomics/CNV measurements in each individual sample. On average, node strength in the aggregate network did not correlate well with protein abundance or CNV data, displaying correlation coefficients <0.1 for proteomics data and <0.05 for CNV data. On the other hand, for all methods, single-sample networks significantly outperformed the aggregate network for correlation of node strength to both proteomics and CNV data. Only for brain single-sample networks constructed by CSN did we detect no significant difference with the aggregate network in the average correlation coefficient between node strength and protein abundance. Overall, single-sample networks from SSN, LIONESS and SWEET resulted in the largest average correlation coefficients, for both lung and brain samples, and for proteomics and copy number variation data. Together, these findings suggest that single-sample network inference methods were better at capturing sample-specific molecular features than aggregate networks, and that SSN, LIONESS and SWEET single-sample networks correlated comparably with one another and more strongly with sample-specific omics than networks from the other methods or the aggregate network.

**Fig. 7: Feature-wise correlation between node strength in single-sample networks and other sample-specific omics data.**

Discussion

The fight against highly complex and heterogeneous diseases such as cancer necessitates an in-depth understanding of disease pathobiology, at population level, but especially at the level of individual patients. Investigation of biological networks and their rewiring in disease can therefore greatly benefit the development of individualized therapeutic strategies. Although single cell technologies make it possible to construct networks for individual patients, there are still limitations associated with this approach, especially in the clinical setting. Bulk molecular profiling techniques on the other hand are well established and cheaper, a plethora of data is already available, and bulk network inference algorithms have been extensively benchmarked¹². However, bulk network inference methods construct population-level networks, representing interactions shared by most patients. Single-sample network inference methods have thus been developed to prioritize biologically meaningful information of a single individual from bulk omics data, bridging the gap towards personalized medicine, a major goal in present-day cancer research¹⁰.

In this study, we compared six single-sample network inference algorithms, LIONESS, SSN, SWEET, iENA, CSN and SSPGI in their construction of single-sample functional gene networks from tumor cell line transcriptomics data in the absence of normal samples. Specifically for these single-sample networks, we investigated graph properties, sample-specificity and cancer driver properties of hubs, the ability to distinguish samples of different cancer subtypes from each other, as well as their concordance with other sample-specific omics data. Although each method functions as intended in its original publication and research context, these studies understandably lack neutrality, and so far, only limited benchmarking has been performed^22,28,29,30. First, we discovered that each method had different characteristics and requirements concerning in- and output data structures. The SSPGI algorithm was not scalable above 7800 genes. Also, as CSN returns binary networks comprising either zero or one as edge weights, the average number of edges within each single-sample network was lower compared to other methods. Therefore, we first pruned networks by selecting for edges present in the HumanNet reference network, and then selected the top 25 000 edges within each single-sample network to compare networks of similar size. Furthermore, edge weights ranged between highly different outer bounds in networks from very low in SSN to very high in SSPGI, so after exploring edge weight distributions we scaled them to values between [-1,1] for all methods.

Existing benchmarks focused on a limited number of features per sample, on the performance of further downstream structural control (SSC) methods, or only evaluated a limited number of network inference tools^28,29,30. A comparison between LIONESS and SSN revealed that when both methods depend on the exact same aggregate network, there is an almost perfect linear relationship between edge weights of LIONESS and SSN networks for a given sample³⁰. Both methods rely heavily on PCC for network construction and construct highly similar networks. Indeed, we found that SSN and LIONESS networks had similar network topological characteristics, hub gene sets and correlation to sample-specific omics. However, it must be noted that the mathematical framework employed by LIONESS allows the use of more advanced network inference tools than PCC²².

Hub genes in SSN networks, identified based on node degree, have been shown to be strongly related to cancer driver mutations¹⁹. However, there is no consensus regarding the number of nodes to select as hubs. While the SSN study suggests using the top 5, 10 or 20 most connected nodes, we selected the top 200 most connected nodes and found hub gene sets to be significantly enriched for known subtype-specific driver genes. Within methods, there was a variable number of overlapping hubs between different single-sample networks and the aggregate network, with hub genes identified in CSN and SWEET networks displaying the lowest diversity across samples. As a result, hubs from these networks also had the lowest cancer subtype specificity, as most hubs regularly recurred across both subtypes. Since all single-sample networks were undirected coexpression networks, we calculated node strength as the sum of absolute edge weights. We identified the most differentially strong nodes in single-sample networks of SSN, LIONESS and SSPGI, although these were not enriched for known subtype-specific cancer driver genes.

One critical remark is that the original applications of LIONESS, SSN, iENA and SSPGI used a group of healthy samples to create the aggregate network or build the edge perturbation matrix, which was not the case in this study. Paired healthy and disease samples are not always available in a clinical setting and not for all tumor types, thus we aimed to investigate the performance of these methods in the absence of control reference samples. When the aggregate network is constructed from a healthy or homogeneous group of samples, each sample of interest will be compared to this aggregate network representing a healthy state. One can thus argue that the construction of an aggregate network from a heterogeneous group of samples will eventually result in less explicit differences between the aggregate and the single-sample networks. Due to the unbalanced tumor subtype sample numbers during the construction of aggregate lung (73 NSCLC versus 12 SCLC samples) and brain (36 glioblastoma versus 9 medulloblastoma samples) networks, the final aggregate networks were dominated by the larger sample group. As a result, we observed a tendency for higher average node strengths for samples belonging to the underrepresented sample group for SSN, LIONESS and iENA. We also noticed a strong tendency towards negative LFCs in comparisons of the node strengths between subtype sample groups for LIONESS and SSN networks. The higher proportions of subtype-specific hubs in SSN, LIONESS and iENA networks could also be attributed to this bias. In the recent study of the single-sample network inference method SWEET, the authors also noticed that sample size differences between intrinsic subpopulations may cause a network size bias in the statistical perturbation model for the SSN method, in the statistical dependency model for the CSN method and in the model of removing a single sample from an aggregate network for the LIONESS method²⁷.

In contrast, SWEET includes a weighting factor during edge weight calculation that reflects genome-wide sample-to-sample correlations. However, in our study this resulted in highly similar single-sample networks, as we observed high similarity of hub gene sets, low hub subtype-specificity, and only a single differentially strong node in NSCLC vs SCLC or medulloblastoma vs glioblastoma single-sample networks. Nevertheless, SWEET, together with SSN and LIONESS, was one of the methods where node strengths correlated the best with single-sample proteomics and CNV data. It must be noted that whereas SWEET employs a Z-test on the fully connected network to select edges and build the final single-sample networks, we opted to use SWEET with a selection of the top 25 000 edges, as we did for the other methods. SWEET will weight edges of overrepresented subtype samples more heavily, because the difference between these edge weights and the edge weights of the aggregate network becomes smaller. Upon selecting the edges with the top 25 000 highest weights, SWEET single-sample networks together will therefore consist of more balanced edge weights reflecting all subtypes, as opposed to LIONESS, which will favor the edge weights of underrepresented subtypes.

Hence, we advocate for the careful assembly of aggregate networks with subgroups of similar size, especially when using SSN, LIONESS or iENA. Ideally, covariates are also taken into account, although this is not possible in the correlation framework employed by SSN, LIONESS, SWEET or iENA, or in the frameworks employed by CSN and SSPGI³⁹. The recent single-sample network inference method DysRegNet, which is based on an aggregate network of normal control samples, employs linear models using TF expression as an explanatory variable for target gene expression, which also allows the incorporation of known covariates such as sex, age, or origin of the sample³⁹.

Overall, there is a lack of ground truth data, which makes a true benchmark study difficult. Instead, we explored the relationship between single-sample networks and other omics data modalities, namely proteomics and CNV, at the sample level. We found that single-sample networks inferred by all methods outperformed the aggregate network regarding correlation to sample-specific omics. Furthermore, we demonstrated this correlation difference with two independent omics data types, proteomics and CNV, reinforcing that single-sample networks provide sample-specific information that is not present in the aggregate network, hence their added value. SSN, LIONESS and SWEET showed a higher correlation to sample-specific omics than the other methods. Correlation to gene expression data was even higher. However, gene expression data in itself cannot provide the additional level of systems biological insights provided by network analysis through e.g. hub gene analysis or differential node strength. Moreover, clusters of nodes in coexpression networks often represent biological entities that function together in the same process⁹. We also relied heavily on cancer subtype annotations of CCLE cell lines during our hub gene and differential strength analyses. Yet, clustering of samples based on expression data showed that these annotations might not be ideal, as several samples clustered together with other subtypes. These issues have previously been addressed³³, and we used a specific approach to include the most relevant cell lines in our study.

In conclusion, we have constructed single-sample networks for 86 lung and 69 brain cancer cell lines from CCLE, using six different single-sample network inference methods. Several network pruning steps were required to make networks comparable. For all methods, we found that hub genes were enriched for known cancer subtype-specific driver genes and that node strengths of single-sample networks correlated better to sample-specific omics than the traditional bulk aggregate network, suggesting that single-sample networks are a valuable tool for personalized medicine (Fig. 8). Overall, CSN and SWEET performed worse than other methods in hub analyses such as hub specificity and enrichment of known drivers. SWEET single-sample networks seemed to be highly similar to each other regarding hub gene sets and showed a lack of differentially strong nodes, which suggests that the inclusion of a weighting factor during single-sample network inference removes a significant portion of variability. CSN networks might also suffer from high similarity across samples due to the binary nature of edge weights. Based on correlation of the node strengths of single-sample networks to sample-specific proteomics and CNV data, SSN, LIONESS and SWEET performed best in providing sample-specific information (Fig. 8). For SSN, LIONESS and iENA, it is important to balance different sample groups within the samples under study, since these methods seemed to have a bias for sample group size.

**Fig. 8: Summary of single-sample networks evaluation.**

Hence, from our study, we conclude that most of the single-sample network inference methods are able to reflect sample-specific biology better than aggregate networks for use cases where ‘normal tissue’ samples are absent. However, single-sample network inference remains a very challenging task since information gain depends on only one datapoint per gene, and we showed that algorithmic choices can have strong influences on the outcome. While these methods represent a valuable resource for personalized medicine and precision oncology, we recommend that any generated hypothesis should be carefully interpreted and experimentally validated.

Methods

Data and cell line selection

Expression read counts and metadata were downloaded from the DepMap (Cancer Dependency Map) website (20Q4 version 2 release)^31,32,40. Expression data were available for 84 primary brain cancer and 189 primary lung cancer cell lines. For both tumor types, outlier cell lines were excluded in two ways. First, only cell lines with a Spearman correlation of expression profiles greater than 0.55 to real tumor tissue, as outlined in a pan-cancer comparison of CCLE cell lines and TCGA tumor samples³³, were kept to ensure biological interpretability. Second, cell lines strongly differing from the other cell lines of the same tumor type were removed by clustering to exclude unrepresentative samples for a given tissue. Clustering was done in R (version 3.6) with the hclust function using average agglomeration, with tree cutting at 95% of the maximum height of the tree. Clusters with fewer than 3 samples were removed. After filtering, 67 brain and 86 lung cancer cell lines were retained for single-sample network inference.

Expression data preprocessing

The RNA-seq count data were processed using edgeR⁴¹ in R (version 3.6). Raw counts were filtered to keep only genes with counts per million (cpm) greater than 1 in at least one sample. Next, raw counts were normalized by library size, converted into cpm and log-transformed. We selected 7942 and 9252 highly variable genes (HVG; variance >2.75 over all samples per tumor type), respectively, for brain and lung as input for single-sample network inference (HVG selection). Due to lack of scalability, only 7800 highly variable genes were retained for SSPGI for both tumor types (see SSPGI method section). Finally, scaling and centering were performed per gene. Heatmaps were constructed after calculating Spearman correlations between samples and applying Ward linkage using ComplexHeatmap⁴².
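
The preprocessing steps described above can be illustrated with a minimal Python sketch; the study itself used edgeR in R. The sketch assumes that library sizes are the column sums of the unfiltered counts and that the log transform is log2(cpm + 1). Both are simplifications for illustration, and the function name `preprocess` and its parameters are hypothetical.

```python
import math
from statistics import mean, variance

def preprocess(counts, var_threshold=2.75, pseudocount=1.0):
    """Filter, normalize and scale a raw count table.

    counts: dict mapping gene -> list of raw counts, one per sample.
    Returns dict gene -> list of centered, unit-variance log-cpm values.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size per sample: column sum over all genes (assumption).
    lib_sizes = [sum(counts[g][s] for g in genes) for s in range(n_samples)]
    processed = {}
    for g in genes:
        cpm = [counts[g][s] / lib_sizes[s] * 1e6 for s in range(n_samples)]
        if max(cpm) <= 1:  # keep genes with cpm > 1 in at least one sample
            continue
        # log2(cpm + pseudocount) is an assumed form of the log transform.
        log_cpm = [math.log2(v + pseudocount) for v in cpm]
        var = variance(log_cpm)
        if var <= var_threshold:  # keep highly variable genes only
            continue
        mu, sd = mean(log_cpm), math.sqrt(var)
        processed[g] = [(v - mu) / sd for v in log_cpm]  # center and scale
    return processed
```

Applying the variance threshold before scaling matters, because after per-gene scaling every retained gene has unit variance.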

Aggregate networks and network visualization

Aggregate networks were constructed separately for brain and lung samples using PCC after HVG selection. These fully connected coexpression networks were subjected to pruning for edges in the HumanNet background network, an integrated functional gene network³⁴. We chose to work with the HumanNet-XN v2 network (https://www.inetbio.org/humannetv2/), which contains 17 929 genes and 525 537 edges representing physical protein-protein interactions, functional associations substantiated by different omics data, interologs from other species and co-citation links. We then selected the top 25 000 edges based on edge weights in these networks.
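
As an illustration of this construction, the following Python sketch computes PCC edge weights, restricts them to a background edge set and keeps the strongest edges. It is a simplified stand-in for the R scripts used in the study. Ranking by absolute edge weight is an assumption, and `aggregate_network`, `background` and `top_k` are hypothetical names.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def aggregate_network(expr, background, top_k):
    """expr: dict gene -> expression values across samples.
    background: set of frozenset({gene1, gene2}) edges (e.g. from HumanNet).
    Returns dict (gene1, gene2) -> PCC for the top_k retained edges."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        # Computing PCC only for background pairs gives the same result
        # as pruning a fully connected network afterwards.
        if frozenset((g1, g2)) in background:
            edges[(g1, g2)] = pearson(expr[g1], expr[g2])
    # Assumption: edges are ranked by absolute weight.
    ranked = sorted(edges.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:top_k])
```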

Cancer driver genes

Lists of known cancer drivers, i.e. genes which contain mutations that have been causally implicated in cancer, for different lung and brain cancer subtypes were downloaded from the IntOGen website (19/06/2023, https://www.intogen.org/search) and the COSMIC/Cancer Gene Census website (19/06/2023, https://cancer.sanger.ac.uk/cosmic/census). Also, for each CCLE cell line included in this study, specific known cancer drivers were downloaded from Cell Model Passports (19/10/2023, https://cellmodelpassports.sanger.ac.uk/). These cancer drivers were then grouped per sample group and considered subtype-specific drivers.

Single-sample network inference methods

Table 2 provides an overview of the different single-sample network inference methods used in this study. We slightly modified several methods to run them with PCC as underlying network inference method, without ‘normal tissue’ reference samples, as well as with different or without background networks.

Table 2 Overview of the different single-sample network inference methods used in this study

In SSN (Single-Sample Network), a reference PCC network is first generated based on transcriptome data of several reference samples, usually normal tissue samples. Here we used all selected cell lines for a specific tumor type as the reference, each time omitting the sample of interest. Then, a PCC network is generated in the same way for all the reference samples plus the sample of interest, yielding the so-called perturbed network. Finally, these two networks are subtracted from each other. The significance of the resulting edge weights (p-values) is not considered, as we prune all the networks in the same way using HumanNet and 25k HumanNet (Fig. 2)¹⁹. Although there is an SSN Python implementation online, we made our own R implementation for ease of use with the above-mentioned modifications.
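
For a single gene pair, the SSN edge weight as described here can be sketched in Python as follows. This is an illustrative sketch, not the R implementation used in the study. The subtraction order (perturbed minus reference) is assumed, and `ssn_edge` is a hypothetical name.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ssn_edge(x, y, k):
    """Edge weight of sample k for one gene pair.

    x, y: expression of the two genes across all selected cell lines.
    The reference network uses all samples except k; the perturbed
    network uses the reference samples plus k, i.e. all samples.
    """
    ref_x = x[:k] + x[k + 1:]
    ref_y = y[:k] + y[k + 1:]
    reference = pearson(ref_x, ref_y)
    perturbed = pearson(x, y)
    return perturbed - reference  # assumed order: perturbed minus reference
```

A sample that disrupts an otherwise strong coexpression relationship receives a large absolute edge weight, whereas a sample that follows the group trend receives a weight close to zero.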

In LIONESS, linear interpolation is performed on the edge weights of two networks, here constructed by PCC: the first contains all samples, the second contains all samples except the sample of interest²². We used the LIONESS function from the LIONESS R package (https://github.com/mararie/LIONESSR). This function creates one edge list file for all samples in the input expression dataset.

For iENA, we used the single-sample PCC calculation in the iENA node-network²⁴, where the PCC between two genes in a single sample is calculated using the mean and variance in a reference group, usually normal tissue samples, but here all selected cell lines for a specific tumor type. A customized implementation of the algorithm in R was made because no source code was provided with the original publication.
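
Both calculations can be sketched for a single gene pair in Python. The LIONESS function follows the linear interpolation between the network on all samples and the network without the sample of interest. The iENA function is one reading of the description above, namely the product of the two expression values after standardization with the reference mean and (population) variance. That exact form is an assumption, and both function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lioness_edge(x, y, q):
    """LIONESS edge weight of sample q: n * (e_all - e_without_q) + e_without_q."""
    n = len(x)
    e_all = pearson(x, y)
    e_without_q = pearson(x[:q] + x[q + 1:], y[:q] + y[q + 1:])
    return n * (e_all - e_without_q) + e_without_q

def iena_spcc(x, y, k):
    """Single-sample PCC of sample k (assumed form): product of z-scores,
    with mean and population variance taken from the reference group,
    here all samples."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    sd_x = sqrt(sum((v - mu_x) ** 2 for v in x) / n)
    sd_y = sqrt(sum((v - mu_y) ** 2 for v in y) / n)
    return (x[k] - mu_x) * (y[k] - mu_y) / (sd_x * sd_y)
```

Under these assumptions, averaging `iena_spcc` over all samples of the reference group returns the ordinary PCC of the gene pair, which makes the single-sample values interpretable as per-sample contributions to the aggregate correlation.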

The Cell-Specific Network (CSN) method was originally developed to infer gene association networks for single cells, but it can also be applied to bulk data to infer single-sample networks²⁶. It generates a binary output, where gene-gene interactions are considered present (1) or not present (0). CSN is based on statistical dependency. For each gene pair, an expression scatter diagram is made in which each dot represents one cell or sample. Next, within each plot, the neighborhood of each sample is identified using a predefined rectangular distance threshold along both axes. The numbers of neighboring cells or samples in these neighborhoods (n_x, n_y and n_xy), divided by the total number of cells or samples n, are estimates of the marginal density functions of x and y and of the joint density function of x and y, respectively. These are used to define a statistic from -1 to 1, which follows a normal distribution given that gene x and gene y are independent. Because the mean and the standard deviation of this normal distribution are known, this statistic can be used to calculate a p-value. A significant p-value rejects the null hypothesis that genes x and y are independent in sample k, in which case the two genes are connected by an edge. The MATLAB code for this method is provided on the paper's GitHub page (https://github.com/wys8c764/CSN).
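
The dependency test for one gene pair and one sample can be sketched in Python as below. This is a simplified illustration, not the authors' MATLAB code. It uses fixed half-widths for the neighborhood box, and the standardized form of the statistic is written as recalled from the CSN publication²⁶, so it should be treated as an assumption. The names `csn_edge`, `half_width_x`, `half_width_y` and `alpha` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def csn_edge(x, y, k, half_width_x, half_width_y, alpha=0.01):
    """Binary edge (1 or 0) between genes x and y in sample k, plus the statistic.

    x, y: expression of the two genes across all n samples.
    The neighborhood of sample k contains all samples within the given
    half-width of sample k along the respective axis.
    """
    n = len(x)
    in_x = [abs(v - x[k]) <= half_width_x for v in x]
    in_y = [abs(v - y[k]) <= half_width_y for v in y]
    n_x, n_y = sum(in_x), sum(in_y)
    n_xy = sum(a and b for a, b in zip(in_x, in_y))
    denom = n_x * n_y * (n - n_x) * (n - n_y)
    if denom == 0:  # a neighborhood covers every sample: no test possible
        return 0, 0.0
    # Standardized statistic, approximately N(0, 1) under independence
    # (assumed form).
    stat = sqrt(n - 1) * (n * n_xy - n_x * n_y) / sqrt(denom)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided threshold
    return int(stat > z_crit), stat
```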

SSPGI calculates edge perturbation values²⁵. First, the gene expression matrix is converted into a rank matrix by ranking the genes according to their expression value in each sample. Second, a delta rank matrix is calculated by subtracting the ranks of any two genes connected by a given edge in the background network. The original publication created a background network based on all gene interactions in the Reactome pathway database⁴³. In theory, the required background network could contain all possible edges between all genes in all samples, but in practice this is not feasible due to the lack of scalability of SSPGI. We could run SSPGI only on 7800 genes and with HumanNet as the background network, so that only interactions present in HumanNet were calculated. Including more genes, or all possible edges as background, caused the method to terminate with an error. For all methods we worked on a high-performance computing infrastructure, using one node of a dual Intel Xeon Gold 6420 CPU cluster with 700 GB of usable memory and 2×18 cores. As within-sample delta ranks of gene pairs are stable under normal conditions, a benchmark delta rank vector is calculated using the mean rank of all genes across the reference group of normal samples. However, we built this benchmark delta rank vector using all selected cell lines for a specific tumor type. Finally, the edge perturbation matrix is created by subtracting the benchmark delta rank vector from the delta rank vector of every sample. The authors provide the SSPGI implementation written in R on their GitHub page (https://github.com/Marscolono/SSPGI).
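
The rank-based steps of SSPGI can be sketched in Python as follows. This is an illustrative sketch, not the authors' R implementation. Ties are broken by input order, which is an assumption, and the names `rank_within_sample`, `sspgi_perturbation` and `reference_samples` are hypothetical.

```python
from statistics import mean

def rank_within_sample(expr):
    """expr: dict gene -> expression values across samples.
    Returns dict gene -> ranks (1 = lowest expression) within each sample."""
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    ranks = {g: [0] * n_samples for g in genes}
    for s in range(n_samples):
        ordered = sorted(genes, key=lambda g: expr[g][s])  # ties: input order
        for r, g in enumerate(ordered, start=1):
            ranks[g][s] = r
    return ranks

def sspgi_perturbation(expr, background_edges, reference_samples):
    """Edge perturbation values per background edge and sample.

    background_edges: iterable of (gene1, gene2) pairs.
    reference_samples: sample indices used for the benchmark delta rank.
    """
    ranks = rank_within_sample(expr)
    n_samples = len(next(iter(ranks.values())))
    perturbation = {}
    for g1, g2 in background_edges:
        # Delta rank of the edge in every sample.
        delta = [ranks[g1][s] - ranks[g2][s] for s in range(n_samples)]
        # Benchmark delta rank from mean ranks across the reference group.
        benchmark = (mean(ranks[g1][s] for s in reference_samples)
                     - mean(ranks[g2][s] for s in reference_samples))
        perturbation[(g1, g2)] = [d - benchmark for d in delta]
    return perturbation
```

In the setting of this study, `reference_samples` would contain all selected cell lines of a tumor type rather than normal samples.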

SWEET constructs a single-sample network for each sample S_p based on the gene expression of n case samples. First, SWEET calculates a genome-wide sample-to-sample correlation matrix. The average PCC for each sample S is then used to calculate a genome-wide sample weight W_S. Second, an aggregate network is constructed using PCCs as edge weights. A perturbed network is then inferred by creating a copy of the expression profile of sample S, and calculating PCCs between all genes for n + 1 samples. Finally, the difference between the aggregate and perturbed network is calculated and integrated with genome wide sample weights W to construct n single-sample networks. The significance level of each edge is later evaluated using a z-test, and all edges with a score larger than the significance level are removed. However, in our implementation we omitted this last step, and selected the top 25 000 edges in each single-sample network instead, so that networks constructed by different methods were comparable in size. Due to a more recent genome annotation used to convert Entrez IDs in the HumanNet network to gene symbols, SWEET networks were slightly larger than SSN, LIONESS, iENA, CSN and SSPGI before selection of the top 25 000 edges.
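
The following Python sketch illustrates the two ingredients named above: a genome-wide sample weight and the difference between the perturbed and aggregate PCC. It is a deliberately simplified illustration and not the published SWEET formula. The min-max scaling of the mean sample correlation and the plain product of weight and PCC difference are assumptions, and any additional constants or offsets of the original method are omitted. The function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sample_weights(expr):
    """expr: dict gene -> expression values across samples.
    Returns one weight per sample from its mean PCC to all other samples,
    min-max scaled to [0, 1] (assumed scaling)."""
    genes = sorted(expr)
    n = len(expr[genes[0]])
    profiles = [[expr[g][s] for g in genes] for s in range(n)]
    mean_corr = []
    for s in range(n):
        others = [pearson(profiles[s], profiles[t]) for t in range(n) if t != s]
        mean_corr.append(sum(others) / len(others))
    lo, hi = min(mean_corr), max(mean_corr)
    if hi == lo:
        return [1.0] * n
    return [(m - lo) / (hi - lo) for m in mean_corr]

def sweet_edge(x, y, s, weight):
    """Weighted difference between the perturbed network (sample s duplicated,
    n + 1 samples) and the aggregate network (n samples) for one gene pair."""
    aggregate = pearson(x, y)
    perturbed = pearson(x + [x[s]], y + [y[s]])
    return weight * (perturbed - aggregate)
```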

Outputs from each single-sample network inference method were converted into a dataframe with edges as rows and samples as columns, generating a uniform format as provided by the LIONESS algorithm. In a subsequent step, we filtered the edges of the other single-sample networks obtained by SSN, LIONESS, SWEET, iENA and CSN based on the HumanNet background network, as explained for the aggregate network. Finally, we selected the top 25 000 edges in each single-sample network constructed by SSN, LIONESS, SWEET, iENA and SSPGI, in order to make them comparable in size to the average CSN network.

Analysis of network topology

The average edge weight distribution was plotted for each method using ggplot2’s geom_density function in R⁴⁴. The average weight of an edge was determined by calculating the average weight of a given edge across all samples, ignoring entries for which the given edge was not present in the single-sample network. For plotting the weight density distribution of all edges, the weights of all non-zero edges were concatenated into a single weight vector. We used the igraph package in R to determine network characteristics⁴⁵. Clustering coefficients were calculated using the transitivity function with type average, network densities using the edge_density function without considering loops. Node betweenness was calculated using the estimate_betweenness function while treating the network as an undirected graph. As this is a node-specific characteristic, we calculated the mean value across all nodes within each sample. Edge betweenness was determined using the betweenness function with the same parameters as for node betweenness. Finally, the diameter and count_components functions were used to calculate network diameter and the number of connected components, respectively.

Principal component analysis

The node strength was calculated as the sum of absolute edge weights for each node³⁷. Node strength matrices were transposed so rows represented samples and columns represented nodes, after which R’s prcomp function was applied. Plots were drawn using the autoplot function in ggplot2, and dots were colored according to the sample cancer subtype.

Hub gene analysis

Hub genes were identified as the top 200 most connected nodes in each top 25k single-sample network. Enrichment for known cancer driver genes was assessed under a hypergeometric distribution using all genes present in the network as background. Violin plots were made visualizing the recurrence of hubs in all samples as well as all samples within a disease subtype. Regularly recurring hubs were defined as hubs recurring in at least 75% of networks in a sample group per method.
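
Hub selection and the enrichment test can be sketched in Python as follows. The sketch assumes that the enrichment p-value is the upper tail of the hypergeometric distribution and that ties in degree are broken alphabetically. The function names are hypothetical.

```python
from collections import Counter
from math import comb

def top_hubs(edges, n_hubs=200):
    """edges: iterable of (gene1, gene2) pairs of one single-sample network.
    Returns the n_hubs most connected genes (ties broken alphabetically)."""
    degree = Counter()
    for g1, g2 in edges:
        degree[g1] += 1
        degree[g2] += 1
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:n_hubs]

def driver_enrichment(hubs, drivers, background):
    """Hypergeometric upper-tail p-value for drivers among hubs.

    background: set of all genes present in the network.
    Returns (number of hub drivers, P(X >= that number))."""
    n_bg = len(background)
    n_drivers = len(set(drivers) & set(background))
    n_hubs = len(hubs)
    k = len(set(hubs) & set(drivers))
    total = comb(n_bg, n_hubs)
    tail = sum(comb(n_drivers, i) * comb(n_bg - n_drivers, n_hubs - i)
               for i in range(k, min(n_drivers, n_hubs) + 1))
    return k, tail / total
```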

Differential node strength

Since LIONESS, SSN, iENA, CSN, SSPGI and SWEET produce single-sample networks with different ranges of edge weights, we performed a within-sample normalization to scale edge weights to the range [-1, 1]. The differential node strength between sample subgroups was evaluated by calculating the sum of absolute edge weights for each node and applying linear modeling with an empirical Bayes procedure, as implemented in the limma package⁴⁶. Differentially strong nodes were defined as nodes with an absolute log-fold change (LFC) > 1 and an adjusted p-value < 0.05 (Benjamini & Hochberg correction). Enrichment for known cancer driver genes was assessed by a hypergeometric distribution using a combined list of all known driver genes per tissue and all genes present in the network as background.
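
The scaling and node strength calculation can be sketched in Python as below. The scaling rule is not specified in detail here, so division by the largest absolute edge weight within the sample is an assumption. The subsequent limma modeling is not reproduced, and the function names are hypothetical.

```python
def scale_within_sample(edges):
    """edges: dict (gene1, gene2) -> edge weight of one single-sample network.
    Scales weights to [-1, 1] by the largest absolute weight (assumed rule)."""
    max_abs = max(abs(w) for w in edges.values())
    if max_abs == 0:
        return dict(edges)
    return {pair: w / max_abs for pair, w in edges.items()}

def node_strength(edges):
    """Sum of absolute edge weights per node."""
    strength = {}
    for (g1, g2), w in edges.items():
        strength[g1] = strength.get(g1, 0.0) + abs(w)
        strength[g2] = strength.get(g2, 0.0) + abs(w)
    return strength
```

The resulting node-by-sample strength matrix would then serve as input for the linear modeling step.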

Comparison to other omics data

Normalized proteomics datasets were downloaded from the CCLE website (protein_quant_current_normalized.csv, version 20Q4), and cell lines were matched to samples present in single-sample networks for lung and brain, separately. Rows with duplicate gene symbols were removed. As these data were already normalized, no further preprocessing was applied. Next, we selected samples with matching proteomics data from the preprocessed expression dataset. The node strength, i.e. the sum of absolute edge weights per node, was calculated in all single-sample networks and the aggregate networks. We then calculated PCCs between proteomics data and the node strength of all nodes in the single-sample networks for each matching sample, and between the node strengths of the aggregate network and each individual proteomics sample. Copy number variation data were also downloaded from the CCLE website (20Q4_v2_CCLE_gene_cn.csv), and the same procedure was applied to these data. Results were plotted using ggplot2⁴⁴.
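
For one sample, the comparison amounts to a correlation across the genes shared between the network and the omics dataset, as in this minimal Python sketch. Restricting the calculation to shared genes is an assumption about how missing measurements were handled, and `strength_omics_correlation` is a hypothetical name.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strength_omics_correlation(strength, omics):
    """strength: dict gene -> node strength in one single-sample network.
    omics: dict gene -> protein abundance or copy number in the same sample.
    Returns the PCC across genes present in both (assumed handling of gaps)."""
    shared = sorted(set(strength) & set(omics))
    return pearson([strength[g] for g in shared], [omics[g] for g in shared])
```

For the aggregate network, the same function would be called with the aggregate node strengths and the omics values of each individual sample.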

Funkyheatmap

The funky heatmap was plotted using the funkyheatmap package in R⁴⁷.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

No new data was generated for this study.

Code availability

All scripts used to construct and analyze the aggregate and single-sample functional gene networks in this study can be found at https://github.com/CBIGR/single_sample_networks.

References

Singer, J. et al. Bioinformatics for precision oncology. Brief. Bioinform. 20, 778–788 (2019).
Erbe, R., Gore, J., Gemmill, K., Gaykalova, D. A. & Fertig, E. J. The use of machine learning to discover regulatory networks controlling biological systems. Mol. Cell 82, S109727652101073X (2022).
Ozturk, K., Dow, M., Carlin, D. E., Bejar, R. & Carter, H. The emerging potential for network analysis to inform precision cancer medicine. J. Mol. Biol. 430, 2875–2899 (2018).
The DREAM5 Consortium. et al. Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
Mercatelli, D., Scalambra, L., Triboli, L., Ray, F. & Giorgi, F. M. Gene regulatory network inference resources: a practical overview. Biochim. Biophys. Acta BBA-Gene Regul. Mech. 1863, 194430 (2020).
Delgado, F. M. & Gómez-Vela, F. Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif. Intell. Med. 95, 133–145 (2019).
Vermeirssen, V. et al. Transcription regulatory networks in Caenorhabditis elegans inferred through reverse-engineering of gene expression profiles constitute biological hypotheses for metazoan development. Mol. Biosyst. 5, 1817–1830 (2009).
Vermeirssen, V., De Clercq, I., Van Parys, T., Van Breusegem, F. & Van de Peer, Y. Arabidopsis ensemble reverse-engineered gene regulatory network discloses interconnected transcription factors in oxidative stress. Plant Cell 26, 4656–4679 (2014).
Loers, J. U. & Vermeirssen, V. SUBATOMIC: a SUbgraph BAsed mulTi-OMIcs clustering framework to analyze integrated multi-edge networks. BMC Bioinforma. 23, 363 (2022).
van der Wijst, M. G. P., de Vries, D. H., Brugge, H., Westra, H.-J. & Franke, L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Med. 10, 96 (2018).
Yurkovich, J. T., Tian, Q., Price, N. D. & Hood, L. A systems approach to clinical oncology uses deep phenotyping to deliver personalized care. Nat. Rev. Clin. Oncol. 17, 183–194 (2020).
Nguyen, H., Tran, D., Tran, B., Pehlivan, B. & Nguyen, T. A comprehensive survey of regulatory network inference methods using single cell RNA sequencing data. Brief. Bioinform. 22, bbaa190 (2020).
Vitali, F. et al. Developing a ‘personalome’ for precision medicine: emerging methods that compute interpretable effect sizes from single-subject transcriptomes. Brief. Bioinform. 20, 789–805 (2017).
Gardeux, V. et al. ‘N-of-1-pathways’ unveils personal deregulated mechanisms from a single pair of RNA-Seq samples: towards precision medicine. J. Am. Med. Inform. Assoc. 21, 1015–1025 (2014).
Wang, H. et al. Individual-level analysis of differential expression of genes and pathways for personalized medicine. Bioinformatics 31, 62–68 (2015).
Xie, J. et al. Identification of population-level differentially expressed genes in one-phenotype data. Bioinformatics 36, 4283–4290 (2020).
Alvarez, M. J. et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet. 48, 838–847 (2016).
Buschur, K. L., Chikina, M. & Benos, P. V. Causal network perturbations for instance-specific analysis of single cell and disease samples. Bioinformatics 36, 2515–2521 (2020).
Liu, X., Wang, Y., Ji, H., Aihara, K. & Chen, L. Personalized characterization of diseases using sample-specific networks. Nucleic Acids Res. 44, e164–e164 (2016).
Zhu, K., Pian, C., Xiang, Q., Liu, X. & Chen, Y. Personalized analysis of breast cancer using sample-specific networks. PeerJ 8, e9161 (2020).
Hu, F., Wang, Q., Yang, Z., Zhang, Z. & Liu, X. Network-based identification of biomarkers for colon adenocarcinoma. BMC Cancer 20, 668 (2020).
Kuijjer, M. L., Tung, M. G., Yuan, G., Quackenbush, J. & Glass, K. Estimating Sample-Specific Regulatory Networks. iScience 14, 226–240 (2019).
Lopes-Ramos, C. M. et al. Gene Regulatory Network Analysis Identifies Sex-Linked Differences in Colon Cancer Drug Metabolism. Cancer Res. 78, 5538–5547 (2018).
Yu, X. et al. Individual-specific edge-network analysis for disease prediction. Nucleic Acids Res. 45, e170–e170 (2017).
Chen, Y., Gu, Y., Hu, Z. & Sun, X. Sample-specific perturbation of gene interactions identifies breast cancer subtypes. Brief. Bioinform. 22, bbaa268 (2021).
Dai, H., Li, L., Zeng, T. & Chen, L. Cell-specific network constructed by single-cell RNA sequencing data. Nucleic Acids Res. 47, e62–e62 (2019).
Chen, H.-H. et al. SWEET: a single-sample network inference method for deciphering individual features in disease. Brief. Bioinform. 24, bbad032 (2023).
Guo, W.-F. et al. A novel network control model for identifying personalized driver genes in cancer. PLOS Comput. Biol. 15, e1007520 (2019).
Guo, W.-F. et al. Performance assessment of sample-specific network control methods for bulk and single-cell biological data analysis. PLOS Comput. Biol. 17, e1008962 (2021).
Jahagirdar, S. & Saccenti, E. Evaluation of Single Sample Network Inference Methods for Metabolomics-Based Systems Medicine. J. Proteome Res. 20, 932–949 (2021).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
Yu, K. et al. Comprehensive transcriptomic analysis of cell lines as models of primary tumors across 22 tumor types. Nat. Commun. 10, 3574 (2019).
Hwang, S. et al. HumanNet v2: human gene networks for disease research. Nucleic Acids Res. 47, D573–D580 (2019).
Kuijjer, M. L. & Glass, K. Reconstructing Sample-Specific Networks using LIONESS. Preprint at https://doi.org/10.1101/2021.09.27.461954 (2021).
Davis, J. D. & Voit, E. O. Metrics for regulated biochemical pathway systems. Bioinformatics 35, 2118–2124 (2019).
Wang, M., Wang, H. & Zheng, H. A Mini Review of Node Centrality Metrics in Biological Networks. Int. J. Netw. Dyn. Intell. 99–110 https://doi.org/10.53941/ijndi0101009 (2022).
Lopes-Ramos, C. M. et al. Sex Differences in Gene Expression and Regulatory Networks across 29 Human Tissues. Cell Rep. 31, 107795 (2020).
Lazareva, O. et al. DysRegNet: Patient-specific and confounder-aware dysregulated network inference. bioRxiv 2022–04 (2022).
Dempster, J. M. et al. Agreement between two large pan-cancer CRISPR-Cas9 gene dependency data sets. Nat. Commun. 10, 5817 (2019).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York, 2016).
Csardi, G. & Nepusz, T. The igraph software package for complex network research. Inter. J. Complex Syst. 1695, 1–9 (2006).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).

Acknowledgements

This study was funded by an FWO PhD fellowship for fundamental research [11N5922N] to J.D., a BOF PhD fellowship [BOF20/DOC/285] to J.L. and a BOF Starting Grant [BOF/STA/201909/030]. The funders played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.

Author information

These authors contributed equally: Joke Deschildre, Boris Vandemoortele.

Authors and Affiliations

Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
Joke Deschildre, Boris Vandemoortele, Jens Uwe Loers & Vanessa Vermeirssen
Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
Joke Deschildre, Boris Vandemoortele, Jens Uwe Loers & Vanessa Vermeirssen
Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
Joke Deschildre, Boris Vandemoortele, Jens Uwe Loers, Katleen De Preter & Vanessa Vermeirssen
Lab of Translational Onco-genomics and Bio-informatics, Center for Medical Biotechnology (VIB-UGent), Cancer Research Institute Ghent (CRIG), Ghent, Belgium
Katleen De Preter

Authors

Joke Deschildre
Boris Vandemoortele
Jens Uwe Loers
Katleen De Preter
Vanessa Vermeirssen

Contributions

J.D. and B.V. contributed equally and should be considered shared first authors. J.D. implemented the single-sample network inference methods; J.D., B.V. and J.L. conducted the data analysis and wrote programming code; J.D. and B.V. made all figures and tables; J.D., B.V. and V.V. wrote the initial draft; J.D., B.V., J.L., K.D.P. and V.V. edited the final manuscript; V.V. conceptualized, designed and supervised the study; J.D., B.V., J.L. and K.D.P. contributed to the conceptualization of the study. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Vanessa Vermeirssen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material

Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Deschildre, J., Vandemoortele, B., Loers, J.U. et al. Evaluation of single-sample network inference methods for precision oncology. npj Syst Biol Appl 10, 18 (2024). https://doi.org/10.1038/s41540-024-00340-w

Received: 11 July 2023
Accepted: 17 January 2024
Published: 15 February 2024
DOI: https://doi.org/10.1038/s41540-024-00340-w