Artificial intelligence guided discovery of a barrier-protective therapy in inflammatory bowel disease

Modeling human diseases as networks simplify complex multi-cellular processes, helps understand patterns in noisy data that humans cannot find, and thereby improves precision in prediction. Using Inflammatory Bowel Disease (IBD) as an example, here we outline an unbiased AI-assisted approach for target identification and validation. A network was built in which clusters of genes are connected by directed edges that highlight asymmetric Boolean relationships. Using machine-learning, a path of continuum states was pinpointed, which most effectively predicted disease outcome. This path was enriched in gene-clusters that maintain the integrity of the gut epithelial barrier. We exploit this insight to prioritize one target, choose appropriate pre-clinical murine models for target validation and design patient-derived organoid models. Potential for treatment efficacy is confirmed in patient-derived organoids using multivariate analyses. This AI-assisted approach identifies a first-in-class gut barrier-protective agent in IBD and predicted Phase-III success of candidate agents.

and low). In this algorithm, first the expression values are sorted from low to high and a rising step function is fitted to the series to identify the threshold. Middle of the step is used as the StepMiner threshold. This threshold is used to convert gene expression values into Boolean values. A noise margin of 2-fold change is applied around the threshold to determine intermediate values, and these values are ignored during Boolean analysis. In a scatter plot, there are four possible quadrants based on Boolean values: (low, low), (low, high), (high, low), (high, high).

Invariant Boolean implication relationships
A Boolean implication relationship is observed if any one of the four possible quadrants or two diagonally opposite quadrants are sparsely populated. Based on this rule, there are six different kinds of Boolean implication relationships. Two of them are symmetric: equivalent (corresponding to the highly positively correlated genes), opposite (corresponding to the highly negatively correlated genes). Four of the Boolean relationships are asymmetric and each corresponds to one sparse quadrant: (low => low), (high => low), (low => high), (high => high). BooleanNet statistics (Equations listed below) is used to assess the sparsity of a quadrant and the significance of the Boolean implication relationships (9,10). Given a pair of genes A and B, four quadrants are ((a00 + a01)/total) multiplied by probability of B low ((a00 + a10)/total) multiplied by total number of samples.
Following equation is used to compute the expected number of samples.
To check whether a quadrant is sparse, a statistical test for (e00 > a00) or ( > ) is performed by computing S00 and p00 using following equations. A quadrant is considered sparse if S00 is high ( > ) and p00 is small.
A threshold of S00 > sthr and p00 < pthr to check sparse quadrant. A Boolean implication relationship is identified when a sparse quadrant is discovered using following equation. A high => B high = ( > ℎ , < ℎ ) Boolean implication A high => B low is found if top-right (a11) quadrant is sparse using following equation.
A high => B low = ( > ℎ , < ℎ ) For each quadrant, a statistic Sij and an error rate pij is computed. Sij > 2.5 and pij < 0.1 are the thresholds used on the BooleanNet statistics to identify Boolean implication relationships (BIRs). False discovery rate is computed by randomly shuffling each gene and computing the ratio of the number of Boolean implication relationship discovered in the randomized dataset and original dataset. For IBD dataset the false discovery rate was less than 0.001.
Boolean Implication analysis looks for invariant relationship across all the different types of samples regardless of the conditions and treatment protocols. Therefore, it does not distinguish the sample types when discovering Boolean implication relationships. We assume that there are fundamental invariant Boolean implication formula that are satisfied by every sample regardless of their type (in this context it is limited to healthy and IBD colonic biopsies including both UC and CD). This means normal, UC and CD samples share the same fundamental relationships.

Preparation and validation of IBD datasets for analysis
Both Peters-2017 and Arijs-2018 dataset were indepndently prepared for Boolean analysis by filtering genes that have reasonable dynamic range of expression values by analyzing the fraction of high and low values identified by the StepMiner algorithm (11). Any probeset or genes that contain less than 5% of high or low values are dropped from the analysis.  (11). Any probe set or genes which contained less than 5% of high or low values were dropped from the analysis.

Generation of Clustered Boolean Implication network
Clustering was performed in the Boolean implication network to dramatically reduce the complexity of the network (Fig S1B). A clustered Boolean implication network (CBIN) was created by clustering nodes in the original BIN by following the equivalent BIRs. One approach is to build connected components in an undirected graph of Boolean equivalences. However, because of noise, the connected components become internally inconsistent e.g., two genes opposite to each other become part of the same connected component. In addition, the size of clusters became unusually big with almost everything in one cluster. To avoid such a situation, we need to break the component by removing the weak links. To identify the weakest links, we first computed a minimum spanning tree for the graph and computed the Jaccard similarity coefficient for every edge in this tree.
Ideally if two members are part of the same cluster, they should share as many connections as possible. If they share less than half of their total individual connections (Jaccard similarity coefficient less than 0.5) the edges are dropped from further analysis. Thus, many weak equivalences were dropped using the above algorithm leaving the clusters internally consistent. We removed all edges that have Jaccard similarity coefficient less than 0.5 and built the connected components with the rest. The connected components were used to cluster the BIN which is converted to the nodes of the CBIN. The distribution of cluster sizes was plotted in a log-log scale to observe the characteristic of the Boolean network (Fig S1C). The cluster sizes were distributed along a straight line in a loglog plot suggesting scale-free properties. The choice of the threshold on the Jaccard similarity coefficient play an important role in determining the size and the number of clusters as well as whether they are internally consistent.
We found that a threshold of 0.5 gave us a reasonable number of clusters and followed a scale-free distribution in the cluster sizes. A bigger threshold such as 0.7 to 0.9 will be very aggressive and reduce the cluster sizes (almost all edges will be dropped). A smaller number such as 0.4 will tend to make a bigger cluster with an unusual distribution of cluster sizes. A new graph was built that connected the individual clusters to each other using Boolean relationships. The link between two clusters (A, B) was established by using the top representative node from A that was connected to most of the members of A and sampling 6 nodes from cluster B and identifying the overwhelming majority of BIRs between the nodes from each cluster.
A CBIN was created using the selected Peters-2017 GSE83687 and Arijs-2018 GSE73661 datasets. Each cluster was associated with healthy or disease samples based on where these gene clusters were highly expressed.
The edges between the clusters represented the Boolean relationships that are color-coded as follows: orange for low => high, dark blue for low => low, green for high => high, red for high => low, light blue for the equivalent and black for the opposite.

Charting Boolean paths
Boolean paths have been explored before to predict the underlying time series events in biological processes such as B cell differentiation (10,13) and early differentiation events in cancer stem cell (6)(7)(8)14). This algorithm is  figure (Fig S1D). For the nodes X and Y with X low => Y low only quadrant #1 is sparse; the other quadrants #0, #2, and #3 are filled with samples ( Fig S1D). Assuming monotonicity in X and Y, the quadrants can be ordered in two possible ways: 0-2-3 and 3-2-0. The path corresponds to 0-2-3 begins with X low and Y low. This is interpreted as X turns on first and then Y turns on along a hypothetical biological path defined by the sample order. Similarly, Y turns off first and then X turns off in the path 3-2-0. A complex path in the Boolean network involves more than one Boolean implication relationship (Fig S1E). Three Boolean implication relationships can be used to group samples into five bins and the bins can be ordered in two possible ways (Fig S1E, forward, reverse). Another example of a path is illustrated in the supplementary figure (Fig S1F).
These paths might represent underlying time-series events as was represented in B cell differentiation using MiDReG algorithm (10). Once the path to be queried is identified, we then ask which end has more healthy or more diseased samples. Based on the orientation of the path (i.e., healthy vs disease end), and the concept that there are time series of events in any biological data, it is hypothesized that our algorithm might identify some of the underlying characteristics of the time series events of disease progression.

Discovery of Paths in Clustered Boolean Implication network
Discovery of paths start with a node that represents the biggest cluster in the CBIN. Since a path of high=>high, high=>low, and low=>low can be used to order samples as shown in supplementary figure S1E, we try to identify paths of this type that intersects the big clusters in the network. We developed a simple, intuitive algorithm that traverses the nodes of the CBIN starting with the biggest cluster and greedily chooses next big cluster connected to the nodes visited in sequence. The emphasis on cluster sizes comes from the fundamental assumption that size determines importance and relevance. Therefore, we start from a big cluster (A1) and identify other clusters that form a chain of low => low. Further, we identify other clusters that are either opposite to A1 or they have high=>low relationship with A1, and the biggest cluster (A2) among these clusters was chosen. In addition, a chain of low=> low relationship from A2 is identified. In each subsequent step, again the biggest cluster among the different choices was greedily chosen. Finally equivalence relationship from each cluster is used to gather more genes in each cluster and the whole path is clustered based on equivalence relationships. Depth-first traversal (DFS) was used to follow the path of low => low where bigger clusters are visited first. The search was performed until a cluster was reached for which there is no low => low relationships. For example, starting with cluster S, the search will return S low => A1 low, A1 low => A2 low, and A2 low => A3 low if A3 doesn't have any low => low relationships. Similarly, a new starting point is considered S2 such that S2 is the biggest cluster X that have either S high => X low or S Opposite X. From cluster S2 another DFS was performed to retrieve the longest possible path of low => low. The search may return S2 low => B1 low, B1 low => B2 low if B2 doesn't have any low => low relationships. In summary, the most prominent Boolean path was discovered by starting with the largest cluster and then exploring edges that connected to the next largest cluster in a greedy manner. This process was repeated to explore paths that connects the big clusters in the network.

Scoring Boolean path for sample order
A score was computed for a specified Boolean path that can be used to order the sample which was consistent with the logical order. To compute the final score, first the genes present in each cluster were normalized and averaged. Gene expression values were normalized according to a modified Z-score approach centered around StepMiner threshold (formula = (expr -SThr)/3*stddev). A weighted linear combination of the averages from the clusters of a Boolean path was used to create a score for each sample. The weights along the path either monotonically increased or decreased to make the sample order consistent with the logical order based on BIR.
The samples were ordered based on the final weighted and linearly combined score. The direction of the path was derived from the connection from a healthy cluster to a disease cluster.

Summary of genes in the clusters
Reactome pathway analysis of each cluster along the top continuum paths was performed to identify the enriched pathways (15). The pathway description was used to summarize at a high-level what kind of biological processes are enriched in a particular cluster.

Assessing the association of IBD signature genes with AMPK subunits
The association between mRNA expression levels of various AMPK subunits and Claudins was tested in a cohort previously reported (1). This cohort included gene expression data from multiple publicly available NCBI-GEO data-series (GSE100833, GSE83550, GSE83687). To investigate the relationship between the mRNA expression levels of selected genes (i.e. PRKAB1 and CLDN2), we applied the Hegemon, "hierarchical exploration of gene expression microarrays on-line" tool (6). The Hegemon software is an upgrade of the BooleanNet software (9), where individual gene-expression arrays, after having been plotted on a two-axis chart based on the expression levels of any two given genes, can be stratified using the StepMiner algorithm (11) and compared for statistically significant differences in expression. We stratified the patient population of the NCBI-GEO discovery dataset in different gene-expression subgroups, based on the mRNA expression levels of various AMPK subunits, and compared the expression of IBD-associated genes between groups.

Generation of heat maps using gene clusters identified by Boolean analysis
To generate the IBD, UC and CD heatmaps (Fig S1G) first a Boolean path was constructed by following the largest clusters in the Boolean Network (1,12). Genes along this path were selected to generate a heatmap that

Identification of Epithelial-Mesenchymal and Inflammation-Fibrosis continuum
Top genes involved with Epithelial-Mesenchymal processes and inflammation-fibrosis processes are chosen from the literature review. Given a list of genes BoNE computes a subgraph of the CBIN graph by identifying clusters that include one or more genes from this list. BoNE then searches for a path in this subgraph as mentioned before with the original CBIN graph. The path identified is used to draw a model of the gene expression timeline ( Fig   S1D). The continuum is identified by computing a score based on the path as described before.

Measurement of classification strength or prediction accuracy
Receiver operating characteristic (ROC) curves were computed by simulating a score based on the ordering of samples that illustrates the diagnostic ability of binary classifier system as its discrimination threshold is varied along with the sample order. The ROC curves were created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The area under the curve (often referred to as simply the AUC) is equal to the probability that a classifier will rank a randomly chosen IBD samples higher than a randomly chosen healthy samples. In addition to ROC AUC, other classification metrics such as accuracy ((TP + TN Precision score represents how many selected items are relevant and recall score represents how many relevant items are selected. Fisher exact test is used to examine the significance of the association (contingency) between two different classification systems (one of them can be ground truth as a reference).

AI guided discovery of Boolean paths
A Boolean path is converted to a path score as mentioned above using a linear combination of normalized gene expression values. The strength of classification of healthy and IBD samples using this score is computed by the ROC-AUC measurement. We performed a multivariate regression (Fig 2E) to identify the best Boolean path that predicts normal vs IBD samples in the cohort GSE6731 (4 N, 5 UC, 7 CD). We tested how the path score distinguishes healthy and IBD samples as they are annotated in many other independent colon-derived datasets.

Comparison of Boolean with Bayesian and Differential analyses
We used 133 key driver genes (KDG) predicted by the Bayesian analysis from Peters et al. 2017 (1). We performed standard t-tests adjusted at 1% FDR using the Benjamini-Hochberg Procedure to identify differentially expressed genes between normal and IBD samples in GSE83687 (171 down-and 162 up-regulated in IBD). A gene signature score is computed for each approach (Boolean, Differential, and Bayesian) by using weighted linear combination of normalized Z-scored expression values as mentioned above. Weights -1 and +1 is used for down and up-regulated genes respectively from the differential analysis to get a combined score. The samples were ordered based on the final weighted and linearly combined score. The sample order is evaluated using the sample annotation (Healthy, IBD) by ROC-AUC analysis.

Test and Validation IBD Datasets
Two datasets (GSE83687, GSE73661) were used to build a Boolean implication network, and GSE6731 is used to train Boolean models that distinguish healthy and IBD samples. A Boolean path is selected after machine learning to construct a Boolean model. In this model a path score is computed as mentioned in section "Scoring Boolean path for sample order" that is used to order the samples. The sample order is evaluated using the sample annotation (Healthy, IBD) by ROC-AUC analysis. The Boolean model is tested in several human and mouse datasets, each comprised of a heterogeneous collection of samples (as mentioned in Supplementary Data 1) to demonstrate reproducibility. The dataset contained both pediatric and adult IBD samples and represented both genders equally well. Since, the Boolean model created was based on colon-derived datasets, testing was mostly performed in colon-derived samples selected from each dataset whenever possible. There are several mouse models of IBD available such as DSS (16), TNBS (17)(18)(19), IL10-/- (20,21) and adoptive T-cell transfer models (22). We have collected publicly available gene expression datasets derived from mouse models of IBD (Supplementary Data 1) to test whether human Boolean models perform well in mice. The gene name conversion from human to the mouse is performed using human genome GRCh38.95 ensembl IDs and mapping data exported from ensemble BioMart web-interface.

Experimental Approaches:
Reagents and antibodies

RNA extraction and quantitative-(q) PCR
Total RNA was isolated using the Quick-RNA MicroPrep Kit (Zymo Research, USA) according to the manufacture's instruction. RNA was converted into cDNA using the qScript™ cDNA SuperMix (Quantabio).
The cycle threshold (Ct) of target genes was normalized to 18S rRNA gene. The fold change in the mRNA expression was determined using the 2^-ΔΔCt method. Primers used in qPCR reactions were designed using NCBI Primer Blast software and Roche Universal Probe Library Assay Design software (see Table below).

Immunoblotting
For immunoblotting, protein samples were separated by SDS-PAGE and transferred to PVDF membranes (Millipore, Burlington, MA). Membranes were blocked with PBST supplemented with 5% nonfat milk (or with 5% BSA when probing for phosphorylated proteins) before incubation with primary antibodies. Infrared imaging with two-color detection and band densitometry quantifications were performed using the Odyssey imaging system (Li-Cor, Lincoln, NE). All Odyssey images were processed using ImageJ software (NIH, Bethesda, MD.) and assembled into figure panels using Photoshop and Illustrator software suits (Adobe Inc., San Jose, CA.).

Human subjects
Colonic biopsies used either for IHC studies or as a source of stem cells for organoid culture were obtained from For the purpose of generating healthy adult enteroids, a fresh biopsy was prospectively collected using small forceps from healthy subjects undergoing routine colonoscopy for colon cancer screening at the VA San Diego Healthcare System. For all the deidentified human subjects the information including age, ethnicity, gender, previous history of disease and medication were collected from the chart following the security and privacy rules outlined in the HIPAA (Health Insurance Portability and Accountability Act of 1996) legislation. Written informed consent was obtained from all participants. The study design and the use of human study participants was conducted in accordance to the criteria set by the Declaration of Helsinki.

Murine models
Intestinal crypts were isolated either from the proximal and the mid-colon of WT C57BL/6 or AMPK KO mice; generated from gender-and age- (10 μM, Sigma-Aldrich). The medium was changed every 2-3 d and the enteroids were expanded and frozen in liquid nitrogen. One important caveat to mention is that after extended passage (> 10) of IBD patient-derived enteroids they begin to revert their phenotype to a 'healthy' state (i.e. less disruption of TJs, and higher TEER values). This is likely due to the stress-reducing culture conditions required to propagate the enteroids and therefore it is imperative to use low passage number enteroids when assessing IBD-associated barrier defects.

Preparation of enteroid-derived monolayers (EDMs)
For both murine and human enteroids, polarized EDMs were prepared using a similar protocol outlined below.
Single-cell suspensions from typsinized organoids in 5% conditioned media were added to matrigel diluted in cold PBS (1:30) as done before (28 Images were acquired using a Leica CTR4000 Confocal Microscope with a 63X objective. Z-stack images were obtained by imaging approximately 4-μm thick sections of cells in all channels. Cross-section and maximal projection images were obtained by an automatic layering of individual slices from each Z-stack. Red-Green-Blue (RGB) graphic profiles were created by analyzing the distribution and intensity of pixels of these colors along a chosen line using ImageJ software. All individual images were processed using Image J software and assembled for presentation using Photoshop and Illustrator software (Adobe).

Quantitative (q)PCR analysis of IBD patient samples
Colonic biopsy specimens were collected either from healthy human subjects enrolled for routine colonoscopy or with IBD subjects enrolled for a colonoscopy procedure at the UCSD IBD center. RNA isolation, cDNA preparation, and analysis of transcript levels for PRKAB1 and CLDN2 were done as described above. Results are displayed as mean ± S.E.M. and p-values calculated using a student two-tailed t-test.

Immunohistochemistry of patient colon samples
Formalin-fixed, paraffin-embedded (FFPE) tissue sections of 4 µm thickness were cut and placed on glass slides coated with poly-L-lysine, followed by deparaffinization and hydration. Heat-induced epitope retrieval was

Proteomic analysis of IBD patient samples
The proteomic dataset containing healthy and UC patients was obtained from previously published work (32).
Samples were analyzed for the expression of AMPK subunits and tight junction proteins (occludin). Results are displayed as mean ± S.E.M. and p-values calculated by 2-way ANOVA using Tukey's multiple comparisons test.   was calculated using by scoring stool consistency (0-4), rectal bleeding (0-4), and weight loss (0-4) as previously published (33). Results are displayed as mean ± S.E.M. and p-values calculated by 2-way ANOVA using Tukey's multiple comparisons test. Immunohistochemical analysis was done as described above. Antibodies used for immunostaining; anti-pS245 GIV (1:50, anti-rabbit antibody generated commercially by 21 st Century Biochemicals, and extensively validated previously (24) inflammatory exudates, and the presence of crypt abscesses and scored as done previously (34). Scoring was carried out by two independent pathologists.

Statistical analyses
Statistical significance between datasets with three or more experimental groups was determined using one-way (or two-way in the case of DSS weight analysis) analysis of variance (ANOVA) including a Tukey's test for multiple comparisons. Statistical difference between two experimental groups was determined using a two-tailed unpaired t-test or two-tailed Mann-Whitney test (patient sample transcript analysis). For analysis of the frequency of SPS-pathway activation in human patient biopsies, a two-tailed Fisher's exact test was used to calculate significance. For all tests, a p-value of 0.05 was used as the cutoff to determine significance. All experiments were repeated a least three times, and p-values are indicated in each figure. All statistical analysis was performed using GraphPad prism 8.

Supplementary Text
Not applicable.  Fisher exact test (two-sided) is performed on a 2x2 contingency table based on the prediction. (B) Detailed classification report is provided in terms of heatmap of the sample ordering, precision (TP/(TP+FP)), recall (TP/(TP+FN)), and f1-score (2 * (precision * recall)/(precision + recall)) for all the datasets.
Overlap of up-and down-regulated genes are illustrated using two Venn diagrams.  Fig S5. Protein-protein interaction network reveals that the stress-polarity signaling (SPS)-pathway is a creative element within that network: Protein-protein interaction (PPI) network built using STRING software (https://string-db.org/) shows the major modules and inter-module links between pathogen-sensing pathways (left most) to epithelial cell-cell adhesions (top right). A stress polarity signaling (SPS) pathway which involves the phosphorylation of the polarity scaffold, Girdin (GRDN), by the metabolic kinase AMPK (of which PRKAB1 is a subunit) has been described as both necessary and sufficient for the strengthening of epithelial junctions under bioenergetic stress (24). Because this event is triggered exclusively as a stress response and helps connect distinct modules of PPI, it fulfills the criteria of "creative elements" within this network (35).  Table showing details of SNP reported in patients with IBD. a The minor allele in the European cohort was chosen to be the reference allele. b Phenotype with the largest MANTRA Bayes factor. c The preferred phenotype (Ulcerative Colitis, Crohn's Disease or IBD (i.e. both)) from our likelihood modeling approach to classify loci according to their relative strength of association. IBD_S and IBD_U refer to the IBD saturated and IBD unsaturated models, respectively (36). d MANTRA log 10 Bayes Factor. e Hetrogeneity I 2 percentage. (B) Genome browser view of PRKAB1 locus showing that the IBD-associated SNP in PRKAB1 is localized to non-coding regions. (C) Box plots showing analysis of publically available IBD transcriptomic datasets (1) identifying PRKAB1 as the only AMPK subunit dysregulated in UC and CD patients. The box in the box plot shows the interquartile range (IQR, distance between lower and upper quartile). From above the upper quartile, a distance of 1.5 times the IQR is measured out and a whisker is drawn up to the largest observed point from the dataset that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile and a whisker is drawn up to the lower observed point from the dataset that falls within this distance. All other observed points are plotted as outliers. Center of the circle represent the average value and the length of arrows up/down represent 95% confidence interval. P values are computed using standard two-sided unequal variance (Welch) t-test. (D) Bar graphs and scatter plots showing PRKAB1 down regulation in IBD correlates with upregulation of CLND2 (leaky TJ protein) and down regulation of PARD3 (essential polarity protein). (E) Scatter plots showing PRKAB1 and CLDN2 dysregulation in IBD were confirmed using an independent transcriptomic dataset. Data are shown as mean ± S.E.M. and one-way ANOVA using Tukey's multiple comparisons test and p ≤ 0.05 cutoff was used to determine significance;('*/**/***/***' represents p-values where; *; p ≤ 0.05, **; p ≤ 0.01, ***; p ≤ 0.001, ****; p ≤ 0.0001). Fig S7. Gene expression analysis confirms that low levels of PRKAB1 correlate with a higher degree of leakiness of the epithelial barrier (CLDN2), proinflammatory cytokines (MCP1, IL8, IL6 and TNFα), and a signature for non-response to Infliximab (TNFAIP6, S100A8,, S100A9, IL-11, and GOS2(37)): Healthy and IBD patients were divided into two groups with high vs. low PRKAB1 mRNA levels using a statistically determined cut-off (StepMiner threshold; (11)). Box plots show the expression of various transcripts analyzed between the two groups in (A) pooled IBD patient samples, (B) Healthy Controls, (C) Ulcerative Colitis patients, and (D) Crohn's Disease patients. The box in the box plot shows the interquartile range (IQR, distance between lower and upper quartile). From above the upper quartile, a distance of 1.5 times the IQR is measured out and a whisker is drawn up to the largest observed point from the dataset that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile and a whisker is drawn up to the lower observed point from the dataset that falls within this distance. All other observed points are plotted as outliers. Center of the circle represent the average value and the length of arrows up/down represent 95% confidence interval. P values are computed using standard two-sided unequal variance (Welch) t-test. subunit is expressed in all three tissues with the highest expression in skeletal muscle (blue arrows). The box in the box plot shows the interquartile range (IQR, distance between lower and upper quartile). From above the upper quartile, a distance of 1.5 times the IQR is measured out and a whisker is drawn up to the largest observed point from the dataset that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured out below the lower quartile and a whisker is drawn up to the lower observed point from the dataset that falls within this distance. All other observed points are plotted as outliers. (B) Bar graph summarizing protein expression data determined by IHC on colon biopsies. PRKAB1 is highly expressed in the colon (green arrow) compared to liver and skeletal muscle (red arrows) compared to PRKAB2, which shows no preferential expression between tissues (blue arrow). (C) Representative tissue IHC images used for protein expression analysis. All data curated from Human Protein Atlas (www.proteinatlas.org).  MDCK cells were grown on glass cover slips, exposed or not to energetic stress (exactly as done previously (24) via glucose deprivation for 120 min) and subsequently fixed and stained for phospho (p) GIV (magenta) and tubulin (green) and analyzed by confocal microscopy. Representative confocal images are shown. Scale bar = 10 µm.  (G) Representative images of colon tissue stained with H&E, or immunostained for activation of SPS-pathway (pS245 GIV) or upregulation of Claudin-2. Tissues were also stained to assess goblet cells loss (PAS staining) and fibrotic collagen deposition (Trichrome stain). All scale bars are 200 µm. All data are shown as mean ± S.E.M. and one (C, E, F) or two-way (B, D) ANOVA using Tukey's multiple comparisons test and p ≤ 0.05 cutoff was used to determine significance.

Fig S15. Treatment with PRKAB1-agonist (PF) is an independent indicator of response (increase in TEER) in IBD patientderived organoids.
Multivariate analysis models the TEER measurement before the treatment as a linear combination of TEER measurement after treatment, age, gender (Female:0, Male:1), race (African American:0, Asian:1, Caucasian:2, Hispanic:3, Middle Eastern:4), treatment history (Naive:0, Infliximab:1, Adalimumab:2, Adalimumab + Infliximab:4, Vedolizumab:3, Vedolizumab + Adalimumab:5). Coefficient of each variables (at the center) with 95% confidence intervals (as error bars) and the p-values were illustrated in the bar plot for (A-C). The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no effect).  Table 2 for details on cohort composition.   S18. PRKAB1-agonists are predicted to augment healthy gut barrier function; combination with anti-inflammatory agents is predicted to show therapeutic synergy: PRKAB1 and a few other selected targets currently approved, or in development for treating IBD are superimposed on Boolean Network path model. Targets either do not appear (Red), and unable to be assessed (Blue), or reside on the 'disease' end (Green) of IBD progression spectrum. PRKAB1 is positioned on the 'healthy' end of the network, where pharmacologic augmentation is predicted to restore or protect mucosal homeostasis and barrier function. Combination of PRKAB1agonists (enhancers of the healthy network) with anti-inflammatory agents (suppressors of diseased genes) is predicted to show therapeutic synergy.  Table 3]. Fisher exact test (two-sided) is performed on a 2x2 contingency table based on the prediction. (B) ROC (receiver operating characteristic) curve and ROC-AUC (area under the curve) is presented in the plot for all three computational approaches: Differential, Bayesian and Boolean. (C) Boolean Equivalent relationships (in GSE83687, using BooleanNet statistical test (9)) between FDA-approved and abandoned targets. FDA-approved targets tend to implicate each other using the Boolean Equivalent relationship.
Tables (Uploaded individually as Word Document Tables) Table 1: Genes that share a strong Boolean implication relationship with PRKAB1 on the major continuum paths in IBD (related to Fig 3C-D).

Supplementary Data 2.
Gene clusters on the IBD map (Fig 2) and the corresponding Reactome pathway analyses of each cluster. (Fig S16) and the corresponding Reactome pathway analyses of each cluster. (Fig S17) and the corresponding Reactome pathway analyses of each cluster.