Introduction

Cancer can be seen as an altered state of the regulatory systems that control cell proliferation and cell death. Such systems are generally not sensitive to individual gene malfunctions, but an aggregation of aberrations can lead to sufficient dysregulation to cause cancer. Reliable models of these regulatory interactions would offer insight into key mechanistic alterations for therapeutic targeting. Cancer subtype-specific gene regulatory networks (GRN) encode intracellular dynamics1, and offer understanding into the functional changes driving disease development. Inference of such models generally exploits certain aspects of the experimental setup, such as pooling among replicates to amplify signal, or makes use of prior knowledge2,3. The experimental techniques, setup, and data collection quality determine the quality of an inferred GRN model. However, practical limitations of experimentation, such as high noise levels and few experiments relative to the vast combinatorial landscape of possible regulatory interactions, often prevent any GRN inference methods from inferring a correct GRN4. Methods using data from known and systematic perturbations have shown greater accuracy among inference techniques since more information is available to determine regulatory causal mechanisms in the system5.

GRN inference has proven its value in unravelling novel regulatory links of biological significance. For instance, ARACNe was applied to gene expression profiles to predict a glioma-specific GRN, revealing that C/EBPbeta and STAT3 are master regulators of mesenchymal transformation, which was validated experimentally6. In another study, eight key genes were knocked down by siRNA, and the gene expression data together with prior knowledge were used to infer a GRN in the RAS pathway with good validation performance7.

A large number of GRN inference algorithms exist. In a survey by the DREAM5 project8 it was shown on simulated and E. coli data that some methods performed better than random predictions. However, many methods did not outperform random prediction, and on yeast data no method performed much better than random selection. Since this study, the community has developed methods for integrating various priors: literature/database, ATAC-seq, DNase I hypersensitive sites, ChIP-Seq, or proteomics data to increase information about the system9,10,11. A few trends can be seen from the three GRN benchmarks DREAM58, GeneNetWeaver12 and NetBenchmark13. In these studies, which benchmarked 35, 6, and 10 inference methods, respectively, the methods were found to produce AUPR values ranging from 0 to 0.3, with high values being rare. This variability is also seen for individual methods across different benchmark studies. For instance, while Genie3 was the best performing method in DREAM5, it performed relatively poorly in the other two. Such disparities may be caused by differences in the specific conditions under which the benchmarks were run or in parameters of the synthetic data creation such as size, noise, and network properties. While some methods employ relatively simple computational techniques and therefore can scale to thousands of genes, they tend to produce low accuracy in benchmarks. DREAM5 grouped inference methods into the categories Regression, Mutual Information, Correlation, Bayesian networks, Other approaches, and Meta predictors. Overall, no category clearly outperformed all the others, and within each category there is a mix of well and poorly performing methods. Even methods such as neural networks, which in other research settings have performed exceedingly well14, performed poorly here.

These benchmarks and surveys testify to the fact that network inference remains a very challenging task when considering expression data alone. Integrative approaches can improve performance but depend on the availability of different types of omics data, and face challenges such as varying experimental setups, heterogeneity, and quality of input data. In many cases, however, only expression data are available, and here the quality of data is paramount for accurate GRN inference15,16.

In this study, we deployed perturbations through siRNA gene knockdown of each gene in our literature-curated set of cancer-related regulator genes, each followed by transcriptomics measurements of all genes, in order to measure the global influence of each individual gene. Knockdown experiments are more informative about the system than irreversible and complete knockout which may cause drastic rewiring of the underlying network into another system entirely. Assuming a linear time invariant (LTI) system17, once the system has reached a steady-state, a GRN can be inferred by solving a set of first order ordinary differential equations (ODEs)18 in the form of our linear model (Eq. 1). Importantly, our linear model is reliant on a known perturbation design, which adds valuable information to the inference. In this way a selected set of 40 genes relevant to cancer were perturbed, and the transcriptomic response data were used to construct a model of underlying regulatory interactions. We inferred these interactions by relating the effect of the gene perturbations to the expression of the readout genes, using three GRN inference algorithms well suited for employing our linear model and perturbation design: LASSO, LSCO, and TLSCO. The regulatory interactions inferred by these methods are not limited to direct physical interactions, but should be seen as regulatory influences, which may be indirect via genes that are not modeled because they were not measured.

A drawback of all GRN inference algorithms is that they generally produce erroneous GRNs if the noise level is high15,16. To ensure inference of reliable GRNs, we employed NestBoot19, a recent framework implementing nested bootstrapping, wrapped around any individual GRN inference method to better account for sample variation and noise. Contained within the GeneSPIDER package16, NestBoot generates bootstrap support distributions for links inferred from measured as well as shuffled data20, and minimizes false links by comparing them. In this way NestBoot is able to discard links that have high bootstrap support in the measured data if they have similarly high support in the null distribution. NestBoot has been shown to give substantially increased inference accuracy across both synthetic and experimental datasets when compared to the methods in their native implementation.

In order to measure the accuracy of an inferred GRN, a true GRN is required. Because this is generally not available in the case of real data, we here introduce a framework to assess the predictiveness of an inferred GRN in the absence of a true GRN. Note that we are not presenting a GRN inference method on its own but rather a way to assess the quality of a given GRN. We first use it to measure a GRN’s ability to predict the data compared to a distribution of GRNs with the same topology as the inferred one but whose links have been shuffled. We complemented this performance evaluation by measuring the GRN’s ability to predict the data compared to a distribution of shuffled data. Finally, we present the best performing GRN in detail, although the other inferred GRNs are largely subsets of each other and mostly perform well too. Two of the novel links of the best GRN were experimentally validated. The presented GRN captures regulatory interactions central to cancer-relevant processes and we foresee that it can provide mechanistic insights that can help to guide future cancer research. For instance, many cancers are caused by dysregulation of the MYC oncogene, hence our finding of a new regulator of MYC may potentially lead to new therapies.

Methods

Knockdown data collection

A set of genes was assembled from different pathways and complexes, each interacting to some degree with the oncogene MYC21 (Tables S2 and S6). Each readout gene was perturbed in the human squamous carcinoma cell line A431 via transfection with short interfering RNAs (siRNAs). We then harvested, purified and prepared libraries using the Ambion Library Construction Kit22. A precise record of perturbations is key to modeling (next section). In order to minimize siRNA off-target effects, two to three siRNAs were used per target (Table S4), which were then averaged to purify the effects of the targeted siRNA perturbation. Cells were collected 72 h after siRNA knockdown, washed with phosphate-buffered saline (PBS), and lysed using CelluLyser23. Cell counts were calculated using the resazurin fluorescence assay. Since no endogenous gene can be assumed to be free of regulation by MYC, which is thought to be a universal transcriptional amplifier24, a spike-in RNA transcript was added to each sample, in proportion to the cell count before RNA isolation, to act as a reference gene for the quantitative polymerase chain reaction (qPCR) analysis. It consisted of a 1,000-base sequence with a 5′ cap and a polyA tail. This was only used for normalization of mRNA level across samples25. Negative controls consisted of siRNA not mapping to human genes, as well as an untreated control without any siRNA. The cDNA was prepared from the RNA and preamplified in preparation for the high-throughput qPCR screening. Finally, the transcript profiles with respect to the 40 genes were determined with TaqMan qPCR assays (Table S5) using Fluidigm Biomark 96 × 96 Dynamic Array integrated fluidic circuits. Raw qPCR output was processed with the ddct R package26 into log transformed fold changes relative to the experimental controls.
Three experimental replicates were made per targeted perturbation and five outlying replicates were discarded due to clear machine read error, thus the dataset is composed of 40 genes (N) and 115 samples. Including all controls, a total of 18,432 qPCRs were performed on 192 samples. Two technical replicates were performed to ensure minimal machine error. They generated very similar values up to 25 qPCR cycles (Fig. S1).
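The sample and assay counts above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch; the figure of 96 assay wells per sample is an assumption based on the 96 × 96 array format mentioned above, not a number stated explicitly in the text.

```python
# Bookkeeping check of the counts reported in the text.
n_genes = 40
replicates_per_gene = 3
discarded_replicates = 5
samples_in_dataset = n_genes * replicates_per_gene - discarded_replicates

# Assumption: every sample is read out in 96 assay wells (96 x 96 array format).
assays_per_sample = 96
total_samples_run = 192
total_qpcr = total_samples_run * assays_per_sample

print(samples_in_dataset, total_qpcr)
```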

Experimental validation of individual interactions was performed on GTML2 brain tumor cells, which were cultured in serum-free stem cell medium as previously described27 and treated for 2 h with DMSO or JQ1 (500 nM). RNA was purified using the RNeasy Kit (Qiagen). RNA sequencing was performed using the Ion Proton System for Next-Generation Sequencing at NGI, SciLifeLab, Uppsala Biomedical Center (BMC), Sweden. All treatment conditions were performed in triplicates. All RNA sequence reads were processed and the differentially expressed genes were analyzed as previously described27. An additional validation was based on a gene expression data set comprising DMSO- and JQ1-treated glioma cell lines, which we had previously published28. Specifically, in this study we were able to distinguish between JQ1-resistant and JQ1-sensitive human adult high-grade glioma cell lines. From the four cell lines with available AmpliSeq expression data, only one (U3056) was characterized as JQ1-sensitive and expressing high MYC levels, and was accordingly selected to investigate the interaction between BRD4 and CCNB1. Expression data for the U3056 cell line was downloaded from the Gene Expression Omnibus (GSE138942) and comprised 6 h DMSO and JQ1 treatments in triplicates each.

GRN inference

The fold change is calculated in comparison to the spike-in for all knockdown experiments. It is used in combination with the collective experimental design matrix (describing the location of perturbed and readout genes) to determine the GRN, i.e. the interaction matrix A, whose element \(a_{ij}\) describes the regulatory effect of gene j on gene i. We use a linear ODE model, similar to17,29, which simplifies to

$$Y=-{A}^{\dag}(P-F)+E \tag{1}$$

where Y is an expression matrix of calculated fold changes, with N genes (rows) and M experiments (columns), and P is the perturbation design matrix. The Moore–Penrose generalized inverse, denoted \(\dag\), is used throughout in place of the ordinary inverse because sparse GRNs might be rank deficient, in which case the ordinary inverse does not exist. However, as we want to solve for \(A\) and not \({A}^{\dag}\), we reformulate Eq. 1 as a traditional regression problem in errors-in-variables form, \(-(P-F) =A(Y-E)\). The errors in Y and P are represented as measurement error E and process error F, respectively, as defined in Table S1. F is used as an estimate of the variation in the perturbation, e.g. siRNA efficiency or environment, while E is used as an estimate of the variation inherent to the cells’ expression as well as error in plate reading26.
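To make the relation between Eq. 1 and the regression form concrete, the following minimal sketch simulates a toy steady-state response and checks that \(AY=-P\) holds in the noise-free case. The five-gene network, the knockdown strength of −1, and the noise level are all hypothetical values chosen for illustration, and the process error F is set to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# Hypothetical sparse network: negative self-links plus three regulatory links.
# Element A[i, j] is the effect of gene j on gene i.
A = -np.eye(N)
A[1, 0] = 0.5
A[2, 1] = -0.4
A[4, 2] = 0.3

# Design matrix: one experiment per column, each knocking down one gene.
P = -np.eye(N)

# Eq. 1 with F = 0: noise-free response, then additive measurement error E.
Y_clean = -np.linalg.pinv(A) @ P
E = 0.01 * rng.standard_normal((N, N))
Y = Y_clean + E

# Regression form: A Y + P vanishes without noise and is small with noise.
residual_clean = A @ Y_clean + P
residual_noisy = A @ Y + P
print(np.abs(residual_clean).max(), np.abs(residual_noisy).max())
```

With this sign convention, knocking down a gene that only has a negative self-link lowers its own expression, as expected.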

Three methods are employed to perform model selection and parameter estimation simultaneously. LSCO (least squares with a cut off to produce variably sparse networks)30 was chosen for its resemblance to the standard ordinary least squares method, LASSO (least absolute shrinkage and selection operator)31 was chosen for its proven ability to find the sparse solution with minimum errors, and TLSCO (total least squares32 with the same sparsity inducing cut off as LSCO) for its ability to model error in both the dependent and independent variables15. Each method, in its native configuration, is encapsulated within the nested bootstrapping framework to estimate the linear model in an accurate and reproducible manner by limiting the false discovery rate (FDR).
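As an illustration of the simplest of the three methods, the sketch below implements a least squares estimate followed by a hard cutoff, in the spirit of LSCO. It is a simplified stand-in under stated assumptions rather than the GeneSPIDER implementation: the function name `lsco`, the toy network, the noise level, and the cutoff value are all hypothetical.

```python
import numpy as np

def lsco(Y, P, cutoff):
    """Least squares estimate of A from A Y = -P, followed by a hard cutoff."""
    A_ls = -P @ np.linalg.pinv(Y)
    # Entries with magnitude below the cutoff are set to zero to induce sparsity.
    return np.where(np.abs(A_ls) >= cutoff, A_ls, 0.0)

rng = np.random.default_rng(1)

# Hypothetical five-gene network used to generate toy data.
A_true = -np.eye(5)
A_true[1, 0], A_true[2, 1], A_true[4, 2] = 0.5, -0.4, 0.3
P = -np.eye(5)
Y = -np.linalg.pinv(A_true) @ P + 0.005 * rng.standard_normal((5, 5))

A_hat = lsco(Y, P, cutoff=0.1)
same_support = np.array_equal(A_hat != 0, A_true != 0)
print(same_support)
```

Varying `cutoff` from small to large moves the estimate from a full network towards an empty one, which is how a range of sparsities can be produced from a single least squares fit.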

GRN validation without gold standard

To evaluate the goodness-of-fit of the inferred network in a prediction error framework one needs to balance the measurement and process errors. This optimization occurs during the leave one out procedure (Algorithm 1), using the CVX convex optimization package (v1.22)33 for MATLAB, where the left out gene (g) is expressed as a linear combination of the other experiments (cross-validation). The aim of this procedure is to equally balance the measurement and process errors when predicting the left out gene under cross-validation.

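Algorithm 1. BalanceFitError: balancing of measurement and process errors under leave-one-out cross-validation.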

In the BalanceFitError algorithm, A contains the inferred GRN structure (topology), with each non-zero value representing a regulatory interaction, i.e. a pseudo-direct influence, and each zero a lack of interaction. The algorithm estimates each gene’s perturbation and response based on the balanced measurement and process errors of all other genes and compares it to the intended perturbation and observed response. Since error is a function of the degrees of freedom of the given matrix, relative error (Erel, Frel) is used to more equally balance these errors. From step (i) to (iii) all matrices have the perturbation experiments of the left out gene g removed and are thus denoted !g. However, A maintains all genes, remaining square throughout, and later the left out experiments can be predicted from the remaining data. This method to evaluate the goodness-of-fit is used on all inferred networks. All inference methods used here have a regularization parameter that determines the number of nonzero parameters in the models, which is varied to span the complete range from empty to full network.

Because our leave out procedure assesses individual gene prediction errors, we assembled null GRN performance distributions by shuffling GRN links and fitting these new networks to the data to create both a standardized and fairly conservative link weight using a constrained least squares (CLS) algorithm30,34. To this end we implemented a Monte Carlo sampling method that samples links while maintaining each node's in-degree and preserving hubs, thereby approximating a link null distribution based on the inferred GRN; we judged this to generate conservative and fair null GRNs. For a fair comparison, both the inferred and shuffled GRNs are fit to the original data. Only the link weights are refit; the topology and sign of each GRN are preserved. To obtain a measure of the goodness of fit of both inferred and shuffled GRNs, cross-validation was used to calculate the weighted residual sum of squares (wRSS) of the original training data while balancing the measurement and process errors as described in Algorithm 1. We are able to predict a left out gene in step c and d by expressing it as a linear combination of the other genes. The same goodness-of-fit measure was used to compare the inferred GRN’s ability to predict the original data under cross-validation with the distribution of prediction errors obtained using shuffled data. The relative error metric comparing measured and shuffled wRSS (Figs. 2, 3, S3, S4) is complemented by R2 values (Fig. S5). Before calculating these, the parameters of each GRN were modified to ensure that the predicted response remained bounded similarly to the observed gene expression. This was done by performing singular value decomposition of the GRN, setting singular values below a cutoff to zero, and then reconstructing the GRN without the smallest singular values. This GRN was then fit to the training data under cross-validation to generate predicted expression responses in the same way as described above.
The cutoff on the minimum singular value was set independently for each GRN to ensure that the predicted expression values were within the range of the measured values. The small singular values generally represent noise if the data is ill-conditioned and removing them reduces the effect of noise.
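The singular value truncation described above can be sketched as follows. The function name `truncate_singular_values`, the toy matrix, and the cutoff of 0.1 are hypothetical; in the study the cutoff was chosen per GRN so that predictions stayed within the measured range, which this sketch does not attempt to reproduce.

```python
import numpy as np

def truncate_singular_values(A, cutoff):
    """Zero all singular values of A below `cutoff` and reconstruct the matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_kept = np.where(s >= cutoff, s, 0.0)
    # Scale the columns of U by the kept singular values, then recombine.
    return (U * s_kept) @ Vt

# Toy matrix with one very small singular value (hypothetical values).
A_toy = np.diag([3.0, 1.0, 0.01])
A_trunc = truncate_singular_values(A_toy, cutoff=0.1)
print(np.linalg.matrix_rank(A_trunc))
```

Because the response in Eq. 1 depends on the generalized inverse of A, very small singular values of A translate into very large predicted responses, which is why removing them keeps predictions bounded.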

To further verify both predictiveness and generalizability, these GRNs are also applied to a second, independent validation dataset based on the same genes knocked down in pairs, in single replicates. While this data is not used to infer GRNs, we apply the same cross-validation strategy as for the original data to validate the GRNs. This is necessary for parameter fitting, since the process error is different from the single knockdown data. Furthermore, by running the same pipeline we obtain a comparable measure of how well the independent data fits our inferred GRNs, and build null distributions of expected error from shuffled GRNs to examine an inferred GRN’s ability to predict the data.

Results

A four step procedure was implemented to generate a cancer-centric GRN oriented towards the MYC oncogene (Fig. 1). First, a list of 303 cancer-associated genes, gathered from the NCI Pathway Interaction Database35, the myccancergene.org website36, FunCoup output37, and 29 other sources (supplemental Table S6)38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66, was ranked heuristically by what was known of each, giving preference to genes with known associations to both cancer and MYC. The criteria were as follows, in decreasing order of rank: (i) members of a complex with MYC affecting or not affecting transcripts, (ii) genes directly affecting MYC or MYC transcripts (activating/repressing), (iii) genes affecting MYC targeted transcripts, and (iv) genes indirectly affecting MYC transcripts. Only genes expressed in the used cell line were further considered. The 40 top ranked genes were perturbed by siRNA in the well characterized human A431 squamous carcinoma cell line (Table S2). Of the selected genes, 31 are transcriptional regulators, 10 are oncogenes, and 7 are tumor suppressors.

Figure 1

Workflow of project. siRNA perturbation experiments were carried out systematically per gene, resulting in a change in mRNA level which elicits a regulatory response over time before reaching steady state, when gene expression was measured. GRN inference: nested bootstrapping was applied to three inference methods, producing GRNs with an FDR set to 5%. GRN Selection: Estimate goodness of fit with the Balanced Fitting of Errors framework under cross-validation (BFECV) and compare to shuffled topologies (Algorithm 1). Independent Validation: Each inferred GRN’s ability to predict an independent dataset was evaluated in comparison to a distribution of shuffled topologies. Finally, the overall most predictive GRN was selected. Two novel links were experimentally validated.

RNA silencing experiments were carried out to knock down the expression of each individual gene, whereafter the expression of all genes in response to the perturbation was measured. The experiments were carried out with three biological replicates per gene. At steady-state, gene expression was measured using high-throughput RT-qPCR, totalling 18,432 quantifications. Most targeted genes were seen to have dramatic reduction in expression, generally a stronger effect than for the other genes (Fig. S2). In fact 31 targets were significantly downregulated (p < 0.1 and log2 fold change < − 2).

The perturbation response gene expression data were used for network inference with three methods, LASSO, LSCO, and TLSCO, each run in conjunction with nested bootstrapping. Each GRN inference method was run with varying parameters to produce GRNs in a range of different sparsities. Nested bootstrapping was then used to select the significantly supported links in each GRN, resulting in final sparsities that rarely exceed 3–5 links/gene, even if the natively inferred GRN had, for example, 40 links/gene.

In order to select the GRN that has the links most likely to exist in reality, we compared how well each inferred GRN’s topology fits the data compared to shuffled topologies of the same GRN. The reason for validating the topology rather than the complete GRN including the actual parameter values is that those parameters are optimized to fit the data and therefore no suitable null model exists. Note that by topology we mean the structure of the GRN, i.e. the inferred links and their sign. For each topology, the parameters were fit to the observed data using cross-validation, and the error was measured as the difference between predicted and observed values of the hold-out samples, after assembling the individual gene predictions into the full predicted matrix. This was done with the novel Balanced Fitting of Errors with cross validation (BFECV) algorithm (Algorithm 1), which ensures that the error is balanced between sources, i.e. that the error is not merely pushed from the measurement to the process estimation or vice versa. Note that this algorithm is not a GRN inference method on its own but is rather a way to assess the quality of an inferred GRN.
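The idea of fitting parameters to a fixed topology can be illustrated with a plain restricted least squares fit, in which only the links present in the topology are allowed to be nonzero. This is a deliberately simplified sketch, not Algorithm 1: it omits cross-validation, the balancing of measurement and process errors, and the sign constraints, and the function name `fit_weights_on_topology` and the toy network are hypothetical.

```python
import numpy as np

def fit_weights_on_topology(topology, Y, P):
    """Fit link weights row by row, using only the regulators allowed by `topology`.

    For target gene i with allowed regulators S, solve
    -P[i, :] = A[i, S] @ Y[S, :] in the least squares sense.
    """
    N = topology.shape[0]
    A = np.zeros((N, N))
    for i in range(N):
        S = np.flatnonzero(topology[i])
        if S.size == 0:
            continue
        coef, *_ = np.linalg.lstsq(Y[S, :].T, -P[i, :], rcond=None)
        A[i, S] = coef
    return A

# Toy noise-free data generated from a hypothetical four-gene network.
A_true = -np.eye(4)
A_true[1, 0], A_true[3, 2] = 0.6, -0.5
P = -np.eye(4)
Y = -np.linalg.pinv(A_true) @ P

A_fit = fit_weights_on_topology(A_true != 0, Y, P)
print(np.round(A_fit, 3))
```

A shuffled topology can be passed to the same function, which is what makes the comparison stringent: both the inferred and the shuffled structure get the best weights the data allow.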

Each inferred GRN was shuffled a hundred times, and the data was fit the same way to these topologies to estimate a null distribution of expected inference error. Note that since the parameters of the shuffled topologies are fit to minimize the error, this is a very stringent test of the inferred topology, and for a suboptimal GRN one expects some of its shuffled topologies to have lower error by chance. Yet, several of the inferred GRNs greatly outperformed their null model, both when using the original training data (Fig. 2) and the independent validation data with double knockdown design in the same cell line (Fig. S3). We also calculated R2 values to show the proportion of the variation that our GRNs explain (Fig. S5). All GRNs are available at https://dcolin.shinyapps.io/CancerGRN/. Five of the inferred topologies had 1,000 times lower error than the median of the shuffled null model. The most accurate GRNs were inferred by LASSO, in terms of outperforming their null distributions. All but one GRN outperformed the median of their null distributions, and eight of the nineteen GRNs across all three methods outperformed all shuffled GRNs in their respective null distribution.
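A minimal sketch of such a shuffling test is given below. It permutes the off-diagonal entries within each row, which preserves each target gene's in-degree and its link signs; keeping self-links fixed is an assumption made here, and the sketch does not attempt to preserve hubs as the sampling scheme in Methods does. The function names and the toy numbers are hypothetical.

```python
import random

def shuffle_links_keep_in_degree(A, rng):
    """Permute off-diagonal entries within each row of A (a list of lists).

    Row i holds the regulators of gene i, so the number of nonzero entries
    per row (the in-degree) and their signs are unchanged.
    Assumption: diagonal self-links stay in place.
    """
    shuffled = []
    for i, row in enumerate(A):
        off_diag = [v for j, v in enumerate(row) if j != i]
        rng.shuffle(off_diag)
        values = iter(off_diag)
        shuffled.append([row[j] if j == i else next(values) for j in range(len(row))])
    return shuffled

def fraction_of_null_beaten(inferred_error, null_errors):
    """Fraction of shuffled GRNs whose error exceeds that of the inferred GRN."""
    return sum(inferred_error < e for e in null_errors) / len(null_errors)

rng = random.Random(0)
A_toy = [[-1.0, 0.5, 0.0, 0.0],
         [0.0, -1.0, -0.4, 0.3],
         [0.0, 0.0, -1.0, 0.0],
         [0.2, 0.0, 0.0, -1.0]]
A_null = shuffle_links_keep_in_degree(A_toy, rng)
print(A_null)
print(fraction_of_null_beaten(1.0, [2.0, 3.0, 0.5, 4.0]))
```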

Figure 2

Validation of inferred GRNs’ topologies. Each x-axis tick mark shows the prediction performance in terms of the wRSS error of each inferred GRN topology (circles) fit to training data under cross-validation, compared to its shuffled topologies. The box displays the median and interquartile range, and whiskers bound points maximally extending 1.5 times this range. Beyond this, outlier points are shown.

We also applied another null model to test how well the data fits the inferred GRN. Here we shuffled the original data one hundred times and fit these datasets to the inferred GRN in order to generate a null distribution. For many inferred GRNs the error was significantly lower than the median of the null distribution, both for the original training data (Fig. 3) and for independent validation data (Figs. 3 and S4).

Figure 3

Validation of the inferred GRNs’ fit to the measured data. Each x-axis tick mark shows the prediction performance in terms of the wRSS of an inferred GRN topology fit to training data under cross-validation, compared to its ability to fit shuffled data. X marks represent the inferred GRNs. The filled color box displays the median and interquartile range, and whiskers bound points maximally extending 1.5 times this range. Beyond this, outlier points are shown.

The GRN that outperformed its null distribution by the largest margin was Bolasso_network_L1145_M115_support97.5_1.52e-03, which we will refer to as the best GRN, with 125 links, including 39 self-links, between 39 genes (Fig. 4) and a sparsity of 3.2 links/gene. The full name indicates certain properties, namely that 1,145 links were natively inferred before NestBoot, 115 experiments were used, 97.5% bootstrap support was attained at FDR = 0.05, and 1.52e-03 is the sparsity penalty parameter used. In this GRN’s overlap plot (Fig. 5) one can see the distributions of bootstrap values for both measured and shuffled data. The frequency of bootstrap support for measured data increases sharply at the right end above 98%, suggesting that this part of the distribution represents real and therefore highly reproducible links. In contrast, the shuffled data decreases towards support = 1. The fact that some links inferred from shuffled data can attain such high bootstrap values can be attributed to the fact that the inference is done at a sparsity that yields very dense GRNs which may result in spurious links with high bootstrap support. However, the nested bootstrap framework monitors the distribution of spurious links and takes them into account when calculating FDR. The plot shows how FDR varies for different bootstrap support cutoffs.
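The way a bootstrap support cutoff follows from a target FDR can be illustrated with a simplified estimate: at each candidate cutoff, the number of links from shuffled data that pass is divided by the number of links from measured data that pass. This is only an illustration of the principle and not the NestBoot procedure itself; the function name `support_cutoff_for_fdr` and the toy support values are hypothetical.

```python
def support_cutoff_for_fdr(measured, shuffled, target_fdr=0.05):
    """Return the lowest support cutoff whose estimated FDR is at most `target_fdr`.

    FDR at a cutoff is estimated as (# shuffled-data links with support >= cutoff)
    divided by (# measured-data links with support >= cutoff).
    Returns None if no cutoff reaches the target.
    """
    for cutoff in sorted(set(measured)):
        n_measured = sum(s >= cutoff for s in measured)
        n_null = sum(s >= cutoff for s in shuffled)
        if n_null / n_measured <= target_fdr:
            return cutoff
    return None

# Toy bootstrap support values (fractions of bootstrap runs containing a link).
measured_support = [0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99, 1.0]
shuffled_support = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.85]
print(support_cutoff_for_fdr(measured_support, shuffled_support))
```

In this toy example a shuffled-data link with support 0.85 forces the cutoff above that value, mirroring how spurious links with high support are taken into account rather than ignored.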

Figure 4

The overall best performing GRN. Each of the 125 links has at least 97.5% bootstrap support, and the sparsity is 3.2 links/gene among its 39 genes. The 39 self-links are not shown. The genes involved in the external validation experiment, BRD4, CCNB1, and MYC, are highlighted in pink. Blue links reflect upregulation while red links reflect downregulation. The visualization was made with the provided Shiny app.

Figure 5

NestBoot output for the overall best performing GRN. (A) Shows the entire bootstrap support range from 0 to 1, as well as overlap between all bootstrap GRNs for measured (blue) and shuffled (red) data. The FDR is estimated via a null background model based on networks inferred from shuffled data. This is done to restrict inclusion of false links by setting FDR e.g. to 5%. The dashed orange line represents the cutoff where this is reached, here at 97.5% bootstrap support. The dashed grey line shows how the FDR behaves as a function of the bootstrap support. (B) Shows the fine detail of the curves in the support levels between 0.9 and 1. These visualizations were made by the provided shiny app.

One can also see the level of overlap between one hundred nested bootstrap runs in Fig. 5. Each run yields a bootstrap support for every link, which can be converted to a GRN for a given cutoff. For the measured data, the overlap (Jaccard) between runs stays relatively high (0.6) all the way to links with 100% bootstrap support, indicating that the reproducibility is high. In contrast, for the shuffled data not a single link with bootstrap support above 70% overlaps with another nested run, indicating poor reproducibility despite relatively high bootstrap support.
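The overlap measure used here is the Jaccard index between the link sets obtained from two runs at a given support cutoff. The sketch below shows the computation on toy data; the gene labels, support values, and function names are hypothetical, and returning 1.0 for two empty link sets is a convention chosen for this sketch.

```python
def links_above(support, cutoff):
    """Set of links whose bootstrap support is at least `cutoff`."""
    return {link for link, s in support.items() if s >= cutoff}

def jaccard(a, b):
    """Jaccard index of two link sets (assumed to be 1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy bootstrap support from two nested bootstrap runs; keys are (regulator, target).
run1 = {("g1", "g2"): 1.0, ("g2", "g3"): 0.98, ("g3", "g1"): 0.60}
run2 = {("g1", "g2"): 0.99, ("g2", "g3"): 0.70, ("g4", "g1"): 0.97}

overlap = jaccard(links_above(run1, 0.95), links_above(run2, 0.95))
print(overlap)
```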

Validation of the best GRN

Of the 125 links inferred in the top performing GRN, two novel MYC-related links were experimentally validated. The novel regulatory relationships BRD4 → CCNB1 and CCNB1 → MYC (Fig. 4) were examined in an independent study27 in which JQ1 was used to inhibit BRD4 in the GTML2 cell line, a mouse brain tumor cell line that overexpresses human MYCN. The inferred activation of CCNB1 by BRD4 was supported by a significant reduction of CCNB1 expression when BRD4 was inhibited, from 7.12 to 7.04 average log(CPM) after 6 h (Fig. S6). However, in order to study immediate effects of BRD4 inhibition we here performed a new analysis in which GTML2 cells were treated with JQ1 for just 2 h. Again, CCNB1 expression was significantly reduced, from 7.48 to 7.04 average log(CPM). Furthermore, in the MYC-expressing human adult high-grade glioma cell line U305628 we again observed a significant downregulation of CCNB1 after 6 h of BRD4 inhibition via JQ1 (Fig. S7). Longer JQ1 treatment (24 h) further decreases CCNB1 in high-grade glioma cells67. The same was observed 24 h after JQ1 inhibition in ovarian cancer cells68. Additional support for this link is provided by co-expression between BRD4 and CCNB1 in the GEO dataset GSE7307 (Spearman correlation 0.473, \(p = 4 \times 10^{-39}\)).

Support for the inferred activation of MYC by CCNB1 comes from the fact that CCNB1 expression changed from normal newborn mouse brain to adult mouse brain (from FPKM 20.3 to 0.2), which agrees with the change of MYC (from FPKM 9.2 to 0.8)27. Additional support for this link is provided by co-expression between CCNB1 and MYC in the GEO datasets GSE2503 (Spearman correlation 0.99, \(p = 1.4 \times 10^{-24}\) for squamous cell carcinomas), GSE69925 (Spearman correlation 0.25, \(p = 1.4 \times 10^{-5}\) for esophageal squamous cell carcinomas), and GSE7307 (Spearman correlation 0.456, \(p = 1.2 \times 10^{-37}\)). Furthermore, this link is found in the STRING69, GeneMania70, and FunCoup71 databases.

These validations support a novel mechanism for MYC regulation inferred in the best GRN. While it is well known that BRD4 can activate MYC in some cancer types72, the best GRN presents a regulatory route that goes via CCNB1 (Cyclin B1). Bound with cyclin-dependent protein kinases, CCNB1 is involved in controlling the cell cycle at mitosis. The findings here suggest that CCNB1's role in regulating biological processes such as proliferation and oncogenesis can proceed via the activation of MYC.

Another type of validation is comparison to known links in public network resources. The links in the best GRN were searched for in the databases TRRUST73, FunCoup71, HumanNet74, and STRING69 as well as in our prior network from data mining. Where these reference networks contained undirected links, we compared them to an undirected version of our GRN. Many known interactions were found in the best GRN (21 recovered from STRING), speaking to its ability to accurately infer what is known. The overlap with the other GRNs was significant (p < 0.1) in a hypergeometric test in all cases but one (Table S3).
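The hypergeometric test for overlap asks how likely it is to recover at least the observed number of known links when the same number of links is drawn at random from all possible links. A minimal sketch using only the standard library is shown below; the function name `hypergeom_tail` and all numbers in the example are toy values, not the counts behind Table S3.

```python
from math import comb

def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population: number of possible links
    successes:  number of links present in the reference network
    draws:      number of links in the inferred GRN
    observed:   number of inferred links found in the reference network
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total

# Toy example: 10 possible links, 3 in the reference, 3 inferred, all 3 recovered.
p_value = hypergeom_tail(population=10, successes=3, draws=3, observed=3)
print(p_value)
```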

Discussion

This study carries out a complete workflow for inferring reliable GRNs, from the selection of genes, experimental perturbation, data collection, and GRN inference, to validation of the inferred GRNs. It was applied to 40 cancer-related genes whose GRN was studied in a human squamous carcinoma cell line. The collected dataset was used to infer variously sparse GRNs using three inference techniques within the NestBoot framework. The predictiveness of the inferred GRNs was estimated using the novel BalanceFitError algorithm under cross-validation. This is not a GRN inference method on its own but can be applied to GRNs inferred with any method. Almost all inferred GRNs were more predictive than expected by chance, and some were vastly more predictive. These top performing GRNs were also able to predict an independent pairwise-gene perturbation validation dataset significantly better than expected by chance. The best GRN contains many known links and also proposes many novel links, two of which were verified experimentally.

The performed gene perturbations caused a range of fold changes in both targeted and readout genes (Fig. S2). Knockdown is advantageous for GRN inference compared to complete inhibition through knockout, as that could alter the gene functioning within the cells to such an extent as to potentially drive the cell to any number of non-native states by activating alternative pathways to cope with the loss of the knocked out one. This would result in measurement of an altogether different cellular GRN which lacks the knocked out gene. With knockdown, a gene's effect is lowered in the hope of measuring an otherwise wild-type GRN from the perspective of the single gene perturbation, across the gene repertoire.

The knockdown efficiency of each siRNA is unknown and varies between genes. It may seem desirable to know the siRNA efficiency, since it is a parameter in the perturbation design matrix used in the mathematical modelling. However, its value does not affect the inferred GRN’s topology, and since the topology is the main outcome of the inference and is what we compare to the null, this lack of knowledge is inconsequential. Prior information, whether literature-curated or derived from ChIP-seq or ATAC-seq, has been shown to be of value in modern GRN investigations and may also be helpful here. Such integration could be built into the model as a way of constraining spurious link additions, much as NestBoot restricts links based on shuffled link distributions. The NestBoot algorithm produces a substantial accuracy improvement, and we would anticipate further improvements from the addition of priors. However, such experimental information is not available for this study, and we therefore pursued a strictly data-driven approach.

Despite our efforts to measure absolute mRNA levels using spiked-in RNA as qPCR reference, MYC was not found to be a universal amplifier as previously claimed24,75,76. Our observation agrees with the results of ref. 77. In both their study and ours, measurements were made after 72 h. It is possible that MYC knockdown activates a response leading to rapid restoration of MYC expression, so that cells return to their original state within that time span instead of reaching a new steady state68. In our study, the targeted MYC transcript was not significantly repressed by the MYC siRNA. This may be caused by its unusually high turnover rate78, which can make it difficult to knock down with siRNA79. Another possibility is that the introduced siRNAs outcompete miRNAs for available RISC and thereby relieve repression of endogenous miRNA targets80. The same lack of observed knockdown of the target was noted for three other transcription factors: SP1, LMYC, and JUN. This suggests a need to optimize the experimental protocol to obtain perturbed steady-state conditions when knocking down certain transcription factors.

During the NestBoot procedure the sparsity of the native GRNs is varied from an almost full to an almost empty network. However, because NestBoot selects only the most strongly supported links, the sparsity of the GRNs output by NestBoot does not vary much for the denser GRNs, and their gene makeup varies even less. We observe consistency across different sparsities, i.e. the smaller GRNs are mostly a subset of the larger ones. This consistency among sparsities adds further confidence beyond the GRNs’ predictiveness relative to a null distribution of shuffled topologies. Selecting the GRN with the optimal sparsity can be done in several ways. Here we followed the strategy of selecting the GRN with the best combination of coverage and predictiveness. Another criterion for selecting GRNs is the biological rationale that natural systems usually contain 3–5 links per gene8,29.
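The two quantities discussed above, how nested the sparser GRNs are within the denser ones and how many links per gene a GRN contains, can be computed as in the following minimal sketch. It assumes each GRN is represented as a collection of (regulator, target) pairs; the function names and the gene labels in the usage note are hypothetical.

```python
def fraction_contained(sparser_links, denser_links):
    """Fraction of the sparser GRN's links that also occur in the denser GRN."""
    sparser = set(sparser_links)
    if not sparser:
        # An empty network is trivially contained in any other network.
        return 1.0
    return len(sparser & set(denser_links)) / len(sparser)


def mean_links_per_gene(links, n_genes):
    """Average number of distinct links per gene in the GRN."""
    return len(set(links)) / n_genes
```

For example, `fraction_contained([("g1", "g2")], [("g1", "g2"), ("g2", "g3")])` evaluates to 1.0 because the single link of the sparser network is present in the denser one. A value close to 1 across successive sparsity levels corresponds to the consistency described above.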

In this study we face the problem of how to measure accuracy in the absence of a true network. Lacking such a gold standard, it is impossible to determine whether an inferred link is true or false. Instead, we compared each inferred network to a null distribution of GRNs with the same sparsity and indegree distribution. Since the prediction error depends on the weights of the links, it is crucial to fit each shuffled-link GRN to the data to give it reasonable weight estimates. To make the comparison fair, both the inferred GRN and the shuffled-link GRNs are refit to the data. By showing that the inferred GRNs outperform their shuffled counterparts in their ability to explain the data, measured by both the wRSS and R², we know that they have a topology closer to the unknown real GRN. The exact same procedure can then be applied to other data, such as the independent validation dataset. With enough shuffled-link GRNs to produce a sufficient null distribution, this results in an unbiased estimate of how predictive a given GRN is compared to what is expected, despite the lack of a known gold standard network. Benchmarking on data with a known gold standard shows that increased predictiveness measured this way generally agrees with higher accuracy.
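The shuffling step of this comparison can be sketched as follows. This is a minimal illustration, not the BalanceFitError implementation: it assumes a binary adjacency matrix in which row i lists the regulators of target gene i, it permits self-links in the shuffled networks, and it leaves out the refitting of link weights. The function names and the +1 correction in the empirical p-value are choices made for this sketch.

```python
import random


def shuffle_preserving_indegree(adj, rng):
    """Return a shuffled 0/1 adjacency matrix with unchanged in-degrees.

    adj[i][j] is truthy if gene j regulates gene i (assumed convention).
    Each target keeps its number of regulators, so total sparsity and
    the indegree distribution are both preserved.
    """
    shuffled = []
    for row in adj:
        k = sum(1 for x in row if x)
        # Draw k regulators uniformly at random for this target.
        chosen = set(rng.sample(range(len(row)), k))
        shuffled.append([1 if j in chosen else 0 for j in range(len(row))])
    return shuffled


def empirical_p_value(observed_error, null_errors):
    """Share of null GRNs whose error is no larger than the observed error.

    The +1 terms keep the estimate above zero for a finite null sample.
    """
    at_least_as_good = sum(1 for e in null_errors if e <= observed_error)
    return (at_least_as_good + 1) / (len(null_errors) + 1)
```

In use, each shuffled topology would be refit to the data and scored with the same error measure as the inferred GRN (for instance the wRSS), and the resulting list of null errors would be passed to `empirical_p_value` together with the error of the inferred GRN.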