NETISCE: a network-based tool for cell fate reprogramming

The search for effective therapeutic targets in fields like regenerative medicine and cancer research has generated interest in cell fate reprogramming. This cellular reprogramming paradigm can drive cells to a desired target state from any initial state. However, methods for identifying reprogramming targets remain limited for biological systems that lack large sets of experimental data or a dynamical characterization. We present NETISCE, a novel computational tool for identifying cell fate reprogramming targets in static networks. In combination with machine learning algorithms, NETISCE estimates the attractor landscape and predicts reprogramming targets using signal flow analysis and feedback vertex set control, respectively. Through validations in studies of cell fate reprogramming from developmental, stem cell, and cancer biology, we show that NETISCE can predict previously identified cell fate reprogramming targets and identify potentially novel combinations of targets. NETISCE extends cell fate reprogramming studies to larger-scale biological networks without the need for full model parameterization and can be implemented by experimental and computational biologists to identify parts of a biological system relevant to the desired reprogramming task.

The raw data files where the tables and figures were generated, and the code used to produce them, can be found in the GitHub repository (https://github.com/veraliconaresearchgroup/netisce) and the NETISCE manual website (https://veraliconaresearchgroup.github.io/Netisce/).

Overview of Network-Based and Dynamical Systems-Based Approaches to Cell Fate reprogramming
The growth of computational methods for constructing biological interaction networks (e.g., gene regulatory and signaling networks) from genome-wide expression data 1-3 has led to the development of cell reprogramming methods within the framework of network biology. Networkbased approaches such as CellNet and ANANSE use gene expression profiles to construct gene regulatory networks (GRNs) and evaluate reprogramming experiments 4,5 . These algorithms score cellular reprogramming experiments by analyzing the extent to which a reprogrammed cell establishes the desired phenotype's GRN and suggest transcription factors to induce the desired reprogramming. For example, CellNet was used to determine the reprogramming efficiency of fibroblasts to hepatocyte-like cells (iHeps) by comparing GRNs of fibroblasts, hepatocytes, and the reprogrammed fibroblasts 6 . CellNet analysis revealed that the reprogrammed cells did not exhibit the hepatocyte cell identity and suggested that downregulation of the transcription factor Cdx2 could drive these cells towards the hepatocyte-like state.
However, a significant limitation of static network-based approaches is that cell reprogramming does not necessarily arise only from the network topology. A systems dynamics contain essential information for cellular reprogramming that is impossible to capture in a purely static view of cell phenotypes 7 . The dynamical systems framework can be used to derive a quantitative understanding of cellular reprogramming [8][9][10] , where each cell fate is a stable state (attractor) shaped by the architecture and the dynamics of its regulatory network 10-13 (Supplementary Figure   1a). Analyzing a mathematical model's phase space of a system is an established dynamical systems-based framework for studying cell fate differentiation, tumorigenesis, and ultimately, the reversion of these processes [14][15][16][17][18][19] . For example, phase space analysis of an Ordinary Differential Equations model (ODE) of pancreatic cell fate regulation revealed branching points in pancreatic cell differentiation 18 . Then, transcription factor perturbations were simulated to study their effect on pancreatic cell differentiation trajectories in the system. These simulations uncovered that the mechanism behind experimentally-observed exocrine pancreatic cell to beta-cell reprogramming required triggering the cell to pass through one of the branching points. Additional analysis revealed that sequentially perturbating the relevant transcription factors could improve reprogramming efficiency.
In discrete modeling frameworks, such as Boolean networks modeling, the phase space can be studied by analyzing the attractor landscape (Supplementary Figure 1b) Alluding to the Boolean network's framework, the attractor landscape of a system represents all potential states of a system and its attractors. For a given attractor, its basin of attraction is the set of initial conditions leading to that attractor, as represented by the three different sections in the figure.
The attractors and their basins of attraction can be associated with cell phenotypes.

Control in ODE and Boolean Models
In these application examples, we compare the ability of NETISCE to correctly simulate perturbations on FVS control nodes to guide a system from an undesired initial state to the desired state for two in silico studies of cell fate reprogramming using the FVS control in Boolean and ODE models of Drosophila embryonic patterning.
To assess the FVS control approach, Zanudo et al. 22  characterized by continuous concentrations, whose rate of change is described by ordinary differential equations (ODE) involving Hill functions for gene regulation and mass action kinetics for protein-level processes and using 48 kinetic parameters. From Zanudo et al. 22 , we obtained the initial conditions that produce the wild-type attractor, the unpatterned attractor, and the precise combination of perturbations on the 48 FVS nodes that shifts the unpatterned initial state from the trajectory of the unpatterned attractor towards the wild-type patterned attractor (Supplementary Table 1). In addition, we considered as internal-marker nodes those used by Zanudo et al. to characterize the wild-type, unpatterned, and the controlled unpatterned-to-wild-type attractors: wg1(representing the wingless gene in the first cell) and en2 (engrailed in the second cell). We tested the ability of NETISCE to correctly identify that the perturbations on FVS control nodes can shift the unpatterned initial state away from the unpatterned attractor and towards the wildtype attractor. Therefore, we modified NETISCE to only consider the perturbation on FVS control nodes specified by Zanudo et al. Otherwise, NETISCE was run with default settings.

Supplementary
K-means clustering was performed on the attractors generated from the wild-type and unpatterned initial states and the 100,000 randomly generated initial states. According to the elbow metric, the k-means optimality metrics returned an optimal k of k=3, and k=2 by the silhouette metric. NETISCE proceeded with k-means clustering with k=2. One cluster contained the attractors generated from the wild-type initial states, and one cluster contained the attractors generated from the unpatterned initial states. The specified perturbation on FVS control nodes passed both filtering criteria and thus successfully shifted from the unpatterned to wild-type cell fate.
The Boolean model of Albert   We again tested the ability of NETISCE to correctly identify that the perturbations on FVS control nodes can shift the unpatterned initial state away from the unpatterned attractor and towards the wild-type attractor. Like the ODE model, the k-means optimality metrics returned an optimal k of k=3, according to the elbow metric, and k=2 by the silhouette metric. NETISCE proceeded with k-means clustering with k=2, where one cluster contained the attractors generated from the wildtype initial state, and one cluster contained the attractors generated from the unpatterned initial state. The specified perturbation on FVS control nodes passed both filtering criteria and thus successfully shifted from the unpatterned to wild-type cell fate. These studies show that our attractor landscape estimation and SFA simulation framework in NETISCE successfully reproduces the in silico cell fate reprogramming studies using the FVS control in ODE and Boolean modeling frameworks. Table 3: Steady-state values for the study in Ascidian embryos' cell fate specification. The steady-state values produced from SFA simulations for the seven internalmarker nodes in the attractor from an unperturbed simulation and the attractors generated from the seven perturbations on the FVS control nodes that were experimentally verified in Kobayashi et al. 25 Table of the fifteen perturbations on FVS control nodes that passed both filtering criteria for induced pluripotent stem cell reprogramming from primed to naïve pluripotency simulations when using seven internal-marker nodes. The first column contains the perturbation ID for the fifteen perturbations out of the 729 total possible perturbations. Columns two to seven contain the specified perturbation to each FVS control node: "up" denotes overexpression, "down" denotes knockouts, and blank cells represent no change to the node's activity. The last three columns contain information about the number of overexpression and knockout perturbations and the total number of nodes perturbed in the considered FVS set. Supplementary Figure 3: Bar plots for each FVS control node in the fifteen perturbations that passed both filtering criteria for induced pluripotent stem cell reprogramming from primed to naïve pluripotency simulations when seven internal-marker nodes are used for Criterion 2. These plots describe the percentages of overexpression knockout perturbations or no change to activity for an FVS control node across the fifteen perturbation sets. The x-axis contains the denoted perturbation (down=knockout, up=overexpression, or no change). The y-axis denotes the percentage of knockout, overexpression perturbations, or no change to that node across all fifteen sets of perturbations on FVS control nodes.

Feature Importance Analysis in iPSC Reprogramming Machine Learning Classification
NETISCE employs three machine learning classification algorithms to assess whether a given set of perturbations on an FVS set does shift the system from a given initial state to the desired cluster of attractors associated with the desired phenotype. We investigated how each algorithm classifies these perturbation sets on FVS control nodes by performing a Feature Importance

Validation of Colorectal Cancer Network and SFA Simulation
In this study of adaptive resistance to MAPK inhibitor therapy in CRC, the ultimate therapeutic goal is to decrease proliferation and increase apoptosis in tumor cells. In a method adapted from Beal et al. 28  First, we used the Molecular Signatures Database 29 gene sets for "Apoptosis" and "G2M Checkpoint" (proliferation-related genes) to identify 12 genes related to apoptosis and ten genes related to proliferation within the CRC signaling network to be used for the apoptosis and proliferation signatures. Normalized gene expression data for 592 CRC tumors were obtained from The Cancer Genome Atlas 30 . The attractors were estimated with SFA for each tumor sample, initializing the network with the tumor's data profile. Gene Set Variation Analysis 31 was used for each patient tumor to compute apoptosis and proliferation signature scores for the normalized gene expression data and the attractor estimated from the expression and mutational data.
Finally, the Spearman Correlation is calculated between the patient data signature score and the attractor signature score for apoptosis and proliferation signatures. The correlation coefficient for the proliferation and apoptosis signature scores was used to evaluate if the network and SFA simulations maintained the appropriate proliferation and apoptotic signatures for each CRC tumor.
A strong correlation coefficient indicated that the network and simulations preserved the appropriate signatures and could be used to perform perturbations on FVS control nodes. The signature scores from the patient data and the associated estimated attractors were strongly correlated for apoptosis signature nodes (R=0.76) and proliferation signature nodes (R=0.81) (Supplementary Figure 4b). We concluded that the network and simulation framework could simulate control node perturbations in the system.

Supplementary Figure 4: Validation of Colorectal Cancer (CRC) Signaling Network and SFA
simulation. a Overview of the network and simulation validation method, adapted from PROFILE 28 . Using CRC patient tumor data, the underlying concept is that each CRC tumor has specific values for genes related to apoptosis and proliferation based on their normalized expression data and mutational profiles. Our goal is to verify that the proliferation and apoptosis signatures of CRC tumors are preserved under the SFA simulation of the generic CRC network for individual cell lines or tumors. We used the Molecular Signatures Database 29 to identify apoptosis and proliferation-related genes within the CRC signaling network. The corresponding values of these signature genes are used as a score for the apoptosis and proliferation signatures. Starting from patient tumor gene expression data from The Cancer Genome Atlas (TCGA), signature scores for the proliferation and apoptosis-related genes within the network are computed for each patient using Gene Set Variation Analysis 31 (bottom branch). Next, using the network and SFA, an attractor is estimated for each patient, and the apoptosis and proliferation signature scores at the attractor state are computed (top branch). The spearman correlation is calculated between the patient data signature score and the attractor signature score for apoptosis and proliferation signatures. b Spearman correlation plots for the apoptosis and proliferation signature scores. The correlation coefficients indicate that the network and simulations preserved the signatures of apoptosis and proliferation.

Supplementary Figure 5:
Bar plots for each FVS control node in the 1,266 perturbations that passed both filtering criteria in the colorectal cancer adaptive resistance reversion simulations when twenty internal-marker nodes are used for Criterion 2. These plots describe the percentages of overexpression and knockout perturbations or no changes to activity for an FVS control node across the 1,266 perturbation sets. The x-axis contains the denoted perturbation (down=knockout, up=overexpression, or no change). The y-axis denotes the % of knockout, overexpression, or no change to that node across all fifteen sets of perturbations on FVS control nodes.

Feature Importance Analysis in CRC MAPKi Resistance Reprogramming Machine Learning Classification
We analyzed the top 10 percent of ranked features for the three machine learning classification algorithms (Supplementary Table 7 Figure 7: Reprogramming Pancreatic Exocrine Cells to Beta-Cells. a In this study, we use NETISCE to identify PFVS that shift pancreatic exocrine cells to beta-cells. We initialize the system with gene expression data from exocrine cells (x 0 ) and simulate combinations of perturbations on the FVS control nodes to shift the system to a state (yellow arrow leading to x′ 0 ) that leads to beta-cells (magenta circle). b Pancreatic cell fate differentiation GRN introduced in Zhou et al. 18 . Blue nodes are control nodes, and magenta nodes are internal-marker nodes. c Key nodes for the cell fate reprogramming. Presented here is the FVS control node set within the network. Four internal-marker nodes are used to identify targets to reprogram the system towards the beta-cell fate. The phenotypes associated with the internal-marker nodes are denoted in the second column.

Reprogramming Pancreatic Exocrine Cells to Beta-Cells in NETISCE
The