Computational Prediction and Validation of BAHD1 as a Novel Molecule for Ulcerative Colitis

Ulcerative colitis (UC) is a common inflammatory bowel disease (IBD) producing intestinal inflammation and tissue damage. The precise aetiology of UC remains unknown. In this study, we applied a rank-based expression profile comparative algorithm, gene set enrichment analysis (GSEA), to evaluate the expression profiles of UC patients and small interfering RNA (siRNA)-perturbed cells to predict proteins that might be essential in UC from publicly available expression profiles. We used quantitative PCR (qPCR) to characterize the expression levels of those genes predicted to be the most important for UC in dextran sodium sulphate (DSS)-induced colitic mice. We found that bromo-adjacent homology domain (BAHD1), a novel heterochromatinization factor in vertebrates, was the most downregulated gene. We further validated a potential role of BAHD1 as a regulatory factor for inflammation through the TNF signalling pathway in vitro. Our findings indicate that computational approaches leveraging public gene expression data can be used to infer potential genes or proteins for diseases, and BAHD1 might act as an indispensable factor in regulating the cellular inflammatory response in UC.

precisely select genes to study. GSEA is an analytical method that uses gene sets representing different biological processes to interpret gene expression data, producing a score that measures the similarity between two different processes by comparing their expression profiles 10 . This method has been used in drug discovery with gene sets responsible for diseases to interpret the expression data from different drugs 11,12 . In this study, we estimated the similarity between public expression data derived from UC patients and that from siRNA perturbed cells by applying the parameter 'insensitive' and the GSEA systematic algorithm 13 to estimate the distance between UC and different siRNA perturbations. We also verified the predicted gene expression changes in a mouse model of acute DSS-induced colitis, a canonical IBD model. Our results revealed that the protein BAHD1 is downregulated the most in the colon tissue of the mouse model. As it has not been previously described to have efficacy for UC or any related disorder of inflammation in the gastrointestinal tract, we evaluated the efficacy of BAHD1 in UC and explored its effect on the human epithelial colorectal adenocarcinoma cell line Caco-2 exposed to inflammatory mediators and its related molecular mechanism.

Results
Computational Prediction and Assessment of a Novel Target in UC: BAHD1. We calculated the distance values between UC and siRNA perturbation expression profiles based on GSEA. The siRNA perturbation expression profiles represented cell responses to the silencing of as many as 106 different genes (see Methods). By examining the relationships between UC and the 106 siRNAs based on predicted distance scores, we identified major clusters of the entire data set [Fig. 1]. A low distance value indicates a similar gene regulation tendency, and the distance values between host responses to UC and those to siRNA perturbations are shown in Supplementary Table S1. We picked out the top five genes whose single siRNA perturbations had the lowest distance values relative to UC, namely, EZH2, UPF1, FOXM1, NUDT6, BAHD1 (distance value = 0.868, 0.878, 0.883, 0.885, 0.891, respectively), and we investigated the mRNA levels of these most likely candidate genes in the DSS-induced colitis mouse model by qPCR. Among them, BAHD1 was found to be the most downregulated in the mouse model compared with the control group [ Fig. 2A], suggesting that this protein might be essential in this inflammatory bowel disease. An enrichment plot showing the enrichment score (ES) for the gene list rank for UC and BAHD1 is shown in Fig. 2B.

Decreased BAHD1 Expression in in vivo and in vitro Models and UC Patients.
To characterize BAHD1 expression in colon tissue, intestinal samples from healthy humans were evaluated by using immunohistochemistry (IHC). As shown in Fig. 3A, BAHD1 protein was expressed normally in the healthy human large intestine, including crypt and epithelial cells, mainly in the nucleus. As for the protein's expression in colitis, we found that BAHD1 was significantly decreased in IECs and crypt cells in the large intestine of UC patients compared with control patients who had no history of intestinal inflammation [ Fig. 3B], indicating that dysregulated expression of BAHD1 in the intestine may be associated with regions of active disease in UC. Western blotting using in vivo and vitro models supported this inference: Caco-2 cells showed a significantly reduced level of BAHD1 protein in a cell model, in which Caco-2 cells were exposed to inductive factors for 24 h (see Methods). A similar observation was made in the mouse model of acute colitis (see Methods) in comparison with the control group [ Fig. 3C]. Therefore, we next explored BAHD1's functions and unveiled its possible molecular mechanisms in the cell model, which might give some hints regarding the development of UC.

Associated Inflammatory Mediators were Enhanced in BAHD1-deficient Caco-2 Cell
Model. To explore the relationship between our predicted protein BAHD1 and the responses of IECs in an inflammatory microenvironment, we used the Caco-2 cell line exposed to several inflammatory mediators to establish a cell model to mimic gut inflammation for IECs (see Methods). Caco-2 cells were treated with siBAHD1 for 48 h before 24 h of exposure. The cells were collected for mRNA extraction, and the cell culture supernatant was reserved to measure cytokine secretion. Notably, the cell model pre-treated with siBAHD1 displayed increased mRNA and protein expression of associated cytokines, including pro-inflammatory cytokines such as TNF-α , IL-6, IL-1β , IFN-β and IFN-γ [ Fig. 4A,B] and chemokines such as IL-8, CCL3, CCL4, CCL5, CX3CL1 and CXCL10 [ Fig. 4B,C]. In addition, certain cell adhesion molecules, including immunoglobulin superfamily intercellular adhesion molecule 1 (ICAM-1) and vascular cell adhesion molecule 1 (VCAM-1), displayed similar increases in the siBAHD1 group [ Fig. 4D], and they are important factors mediating leukocyte migration and local inflammation in IBD 14 . However, although most cytokines showed a marked increase in the Caco-2 cell model and even in the siBAHD1 group without any stimulus (like IFN-β and CX3CL1), with decreased expression of BAHD1, several chemokines such as CXCL3 and CXCL5 did not display a similar trend [ Fig. 4E]. As for other types of inflammatory mediators, cyclooxygenase-2 (COX-2), an enzyme that catalyses the inflammatory response factor prostaglandin, had higher expression in the cell model pre-treated with siBAHD1, as did two isoforms of nitric oxide (NO) synthases, inducible NOS (iNOS) and endothelial NOS (eNOS) [Fig. 4F].
Scientific RepoRts | 5:12227 | DOi: 10.1038/srep12227 Activation of Ikappa B (IκB) Kinase (IKK)/NF-κ B and JNK/AP-1 Pathways in the siBAHD1-treated Caco-2 Cell model. In response to stimulation by the various factors described above, activity through two important inflammation pathways, the IKK/NF-κ B and JNK/AP-1 pathways, was measured by using relevant key phospho-specific antibodies to investigate the underlying molecular mechanism of BAHD1 deficiency in the cell model. Activation was comparable in the stimulation only group and the siBAHD1 group. In vitro, siBAHD1 in Caco-2 cells resulted in stronger phosphorylation of IKK α /β , Iκ Bα and NF-κ B subunit p65 in the IKK/NF-κ B pathway. Moreover, the AP-1 protein c-JUN and upstream regulator JNK showed higher phosphorylation levels in the siRNA-treated group [ Fig. 5A]. Regarding other MAPK pathways active during the acute phase in cell injury, we did not observe activation of either the ERK1/2 pathway or the p38 pathway in the siBAHD1 group [ Fig. 5B]. By contrast, STAT3 phosphorylation was clearly increased, indicating that there might some connection between BAHD1 and the Janus kinase/signal transducer and activator of transcription (JAK/STAT) pathway [ Fig. 5B]. These results were in accordance with the enhanced levels of cytokines in the siRNA-treated group, indicating that BAHD1 might be associated with cytokine expression in intestinal cells through a certain inflammatory pathway.
An NF-κ B inhibitor Blocked the Effects caused by BAHD1 Repression in Caco-2 Cell model. We had found that IKK and Iκ Bα were activated in the cell model system; therefore, we wondered whether its downstream target, NF-κ B, participated in the regulation of cytokine gene expression. To confirm that the NF-κ B pathway was involved in the regulation by BAHD1 of inflammatory gene expression in the cell model, we used an NF-κ B inhibitor, parthenolide (PTN, 30 μ m), which is a sesquiterpene lactone that inhibits activation of the NF-κ B pathway 15 . It significantly inhibited the stimulus-induced activation of the NF-κ B pathway and repressed the expression of associated inflammatory genes [ Fig. 6A,C]. After treatment with siRNA for 48 h, Caco-2 cells were incubated with PTN for 1 hour before the addition of the mixture of various stimulators described above. The phosphorylation levels of IKKα /β , Iκ Bα and NF-κ B p65 were reduced significantly in both the NC and siBAHD1 groups [ Fig. 6B], which was consistent with a decrease in cytokines such as TNF-α , IL-6, IL-8, CCL3 [ Fig. 6C]. Therefore, the results suggested that cytokines were highly activated mainly through the NF-κ B pathway.

BAHD1 Differentially Modulated the TNF Signalling Pathway by Altering TNFR1
Expression. Downregulation of BAHD1 correlated positively with cytokine secretion; therefore, we turned our attention to the starting point of this secretory process. The NF-κ B and AP-1 pathways, which are critical for expression of the proinflammatory cytokine cascade, might be mediated by tumour necrosis factor receptor 1 (TNFR1) 16 . Consistent with the high activation of key proteins in intracellular cell signalling pathways, we detected that the expression of TNFR1 increased dramatically in the siBAHD1 group, at both the protein and mRNA levels, compared with the negative control group [ Fig. 7A,B]. High levels of cytokines secreted by colonic epithelium that had lost BAHD1 expression might in turn result in persistent activation of the NF-κ B and JNK/AP-1 pathways. Taken together, these results suggest an essential role of BAHD1 in the negative regulation of the starting point of the pathway through the TNF signalling pathway.

Discussion
Here, we used a systematic computational approach based on publicly available gene expression signatures to predict multiple previously undescribed molecules for UC 10 . Distance values derived from . We can infer that the top-(or bottom-) ranked genes in UC are strongly positively (or negatively) regulated in BAHD1, resulting in a high enrichment score for UC's signature in BAHD1's PRL.
comparing public UC microarray data against a compendium of gene expression signatures comprising 106 siRNAs were evaluated in the datasets based on GSEA. All siRNAs were manually filtered according to their quality from a whole range of data in the GEO platform. A small distance indicates a similar gene regulation tendency, and among the smallest-scoring siRNAs predicted from our approach were EZH2, UPF1, FOXM1, NUDT6 and BAHD1.
The exact functions of these five factors in IBD have not been previously reported. EZH2 (enhancer of zeste homolog 2) is a histone methyltransferase associated with transcriptional repression, and its overexpression promotes tumour development 17 . UPF1 (up-frameshift mutant 1) is the regulator of nonsense transcripts 1 in humans and participates in both nuclear mRNA export and mRNA surveillance 18 . The third one, FOXM1, is a transcription factor involved in cell cycle progression that regulates the expression of a large number of G2/M-specific genes 19 , while NUDT6 (nucleoside diphosphate-linked moiety X motif 6) is thought to be a fibroblast growth factor antisense gene associated with cell cycle progression and tumour proliferation 20 . Further study into the potential genes driving this clustering could reveal new information regarding UC's pathogenesis and new molecular therapeutic directions.
Among them, BAHD1 is involved in gene silencing 21 and was the most downregulated in the UC mouse model, which was inferred to indicate the most potential as a regulatory protein for the disease. The experimental validation we performed in vitro confirmed that the loss of BAHD1 activated various  and NO synthases, including iNOS and eNOS, showed higher expression in the cell model pre-treated with siBAHD1. All data above are shown as the mean ± SEM from three independent measurements. Statistical significance was determined by Student's t-test (*P < 0.05; **P < 0.01). The western blots shown are representative of at least three independent experiments. cytokines during a cellular immune response through associated signalling pathways. Intestinal inflammation and tissue damage is a direct result of increased circulating inflammatory cytokines, which are secreted at sites of inflammation and impact during the onset, progression, and resolution of UC 22 . Those cytokines, and also COX-2, iNOS and eNOS, are mediated by several signalling pathways 23,24 .
Transcription factors, including NF-κ B and AP-1, play critical roles in the expression of genes involved in inflammation and carcinoma development in the gastrointestinal tract 25,26 . In the present study, we showed that key proteins in the NF-κ B pathway, including IKK α /β , Iκ Bα and NF-κ B subunit p65 27,28 , were activated to a higher level in stimulated Caco-2 cells with BAHD1 knocked down compared with a purely stimulated group. AP-1 is a member of a family of transcription factors mainly belonging to the JUN and Fos families whose activation is involved in inflammatory gene expression 29 . A similar phenomenon was observed in the JNK/AP-1 pathway, in which the phosphorylation levels of JNK and c-JUN increased. Taken together, the data gave the strongest hint that a link might exist between activation of the transcription factors NF-κ B and AP-1 and the reduction in BAHD1 expression in IECs.
Pathogen-associated molecular patterns are sensed by specific receptors, which in turn activate signalling cascades to induce the synthesis of inflammatory mediators such as TNF, IL-1 and IFN 30 . TNFR1, which is ubiquitously expressed, has pleiotropic functions related to cell immunity, survival, apoptosis and necrosis and can be activated via both membrane-bound and soluble TNF 31,32 . The TNF receptor is primarily responsible for initiating inflammatory responses by mediating TNF-α -induced NF-κ B activation 33,34 . In this study, TNFR1 transcription increased significantly in Caco-2 cells after downregulation of BAHD1, causing TNF signalling pathway activation in IECs during inflammatory mediator exposure. As a result, more cytotoxic inflammatory factors were produced in response to the activation of the pathway, which in turn resulted in inflammatory status aggravation with more secreted cytokines, especially TNF combined with TNFR1. The continuous excessive cytokine secretion caused injury and dysfunction of the IECs. A hypothesis of BAHD1 negatively regulating the TNF signalling pathway by altering TNFR1 expression is shown in Fig. 7C, in which the inflammatory microenvironment induces downregulation of BAHD1 in IECs, which in turn can increase the production of various cytokines through the IKK/NF-κ B and JNK/AP-1 pathway.
As a novel heterochromatinization factor in vertebrates, BAHD1 participates in gene silencing by promoting the formation of heterochromatin through interaction with HP1, MBD1, HDAC5 and several transcription factors to control cell differentiation and maintenance of homeostasis 21 . We speculate that in some way, BAHD1 connects with other repressive core complex factors to mediate TNFR1 gene silencing.
As for the MAPK pathways, the JNK pathway and the p38 MAPK pathway regulate apoptotic cell death, whereas ERK1/2 acts as a prosurvival factor that contributes to the regulation of cell proliferation and differentiation 35 . Therefore, JNK activation in IECs with downregulated BAHD1 levels during the acute phase of injury may result directly in apoptosis, not survival.
The JAK/STAT pathway is a central mediator of the responses of various extracellular cytokines and has been implicated in the pathogenesis of many human immunity disorders, including IBD. JAK inhibitors have the potential to treat the inflammation associated with colitis 36,37 . The detailed mechanism whereby repression of BAHD1 in enteric cells leads to higher levels of STAT3 phosphorylation requires further exploration.
Although we found some interesting results, it should be noted that follow-up investigative work will be necessary. First, in vivo evidence regarding the precise role of BAHD1 in UC is lacking. In addition, although exposure of Caco-2 cells to inflammatory stimuli is a common approach to investigating inflammatory signalling, it is not the disease per se. Finally, the detailed mechanism of how BAHD1 represses TNF receptor expression is unclear and will require deep exploration.
In conclusion, the results raise the intriguing possibility that a computationally predicted protein, BAHD1, might act as an important regulator of the classical inflammation pathways in UC, and there might be a functional association between intestinal cell inflammatory responses and BAHD1. Additionally, the computational approaches we used based on public gene expression databases show potential for the discovery of genes and proteins that might be vital factors in the pathogenesis of certain diseases.

Methods
Human Tissue. Human paraffin-embedded colonic mucosa pinches were obtained from surgical patients with active UC or healthy adjacent or distant colonic tissue from subjects with certain cancers. Pathological analysis verified the diagnoses of UC. This study was approved by the Ethics Committee of the First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China. Informed consent was obtained from each subject before the study. The study protocols and the consent form were administered in accordance with the approved guidelines of the Ethics Committee.
Animal Treatment. Six-to eight-week-old littermate female C57BL/6 mice were purchased from Zhejiang Experimental Animal Centre, Hangzhou, China. Mice were housed in an animal room with air-conditioned specific pathogen-free (SPF) conditions at 23 ± 2 °C with a 12 h light/dark cycle, and they were acclimated for 7 days before experimentation. The Animal Care and Use Committee of Zhejiang University approved all the mouse studies, which were performed in accordance with the Chinese guidelines for the care and use of laboratory animals.

Computation of Distances between Different Expression Data. UC patient tissue expres-
sion profiles and siRNA-perturbed cell expression profiles were collected from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) 38 . To make full use of collected expression data, the data platforms were restricted to the Affymetrix Human Genome U133A Array (GPL96) and the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570), the most widely used Homo sapiens platform. Any probes not numbered in all the datasets were excluded in our next analysis, resulting in 22,215 validated probes. Expression profiles representing different siRNA permutations were collected by searching in the GEO with the key words siRNA, shRNA and by manual checking. Due to the dearth of siRNA perturbation expression profiles, the cell types used were not restricted to particular cell types [Supplementary Table S3].
To properly compare expression profiles from different datasets, the validated probes were ranked by their expression change compared to the control as described below.
First, we paired each experiment's (UC patient samples or siRNA processed cells) expression profile to a control (healthy controls or untreated cells).
Second, for each pair of samples, we ranked the probes considering both their fold-changes and their absolute values relative to a probe rank list (PRL) 11 . The detailed steps are list below.
(1) Expression values less than the primary threshold value (the lower quartile of expression values of these two samples) were set to that value. (2) The probes were then ranked in descending order of corresponding experiment-to-control ratio values. (3) For probes where the ratio value equals one, the secondary threshold value (one-tenth of the primary threshold value) was used to reset the value of these probes. (4) These probes were sub-sorted in descending order of the new ratios to produce the final probe rank list for each pair of samples.
This hierarchical sort strategy avoids the inappropriately high or low ranks that can be caused by large fold changes resulting from dividing by small values. Third, PRLs representing the same permutation (same disease or same gene silencing) were combined with the R package GeneExpressionSignature 39 into final PRLs to represent the cell's responses to them according to a hierarchical majority-voting scheme 13,40 .
Then, the distance between UC and the different siRNA perturbations was estimated by GSEA 10 in the following steps: (1) The distances were calculated by comparing two PRLs through investigating whether the top-(or bottom-) ranked probes in one PRL was also the top-(or bottom-) ranked in the other PRL. Thus, for each PRL, a gene set containing its top-(or bottom-) ranked probes was generated as its signature. The top and bottom 250 probes were selected as the signature of each PRL. The sizes of the signatures were changeable, and their effect on the final prediction was limited; therefore, the size was set to an empirical value from a reference paper 41 .
(2) When comparing two PRLs A and B, the enrichment score of A's (or B's) signature in reference PRL B (or A) was calculated by equal weighted GSEA 10,11 , and the enrichment score could be presented as ES AB (or ES BA ) ranging from − 1 to 1. A high enrichment score indicates that the signature genes also tend to appear at the top or bottom of the reference PRL, indicating similarity between them.  Gene Expression Measure. RNAiso Plus (Takara, Otsu, Japan) was used to isolate mRNA from cells or tissue. The PrimeScript RT Master Mix (Takara, Otsu, Japan) was used to generate cDNA. qPCR was performed using a SYBR Green Premix DimerEraser (Takara, Otsu, Japan) on the 7900HT Fast Real-Time PCR system (Applied Biosystems). All reactions had a melting curve with a single peak. The cycle threshold (CT) values for the triplicate samples were averaged and the data were analyzed using the Δ Δ CT method, where fold change = 2 −ΔΔCT . All data analyzed were normalized to beta-Actin or GAPDH expression. For the specific primer sequences (Sangon Biotech, Shanghai, China), see Supplementary Table S4.

Establishment of DSS-induced Colitis in Mice.
Chemically induced murine models of intestinal inflammation are the most commonly used and best-described models for investigating the pathophysiological mechanisms and immunological processes underlying chronic mucosal inflammation 42 . The protocol to develop acute colitis in C57BL/6 mice was as described previously 43 . The drinking supply of the mouse cages was filled with DSS (MW: 36,000-50,000; MP Biochemicals, Solon, OH, USA) at 3% weight/volume, while control mice received autoclaved water for 7 days. Mice receiving DSS orally developed acute UC-like clinical and pathological manifestations. The mice developed colonic mucosal inflammation limited to the mucosa and contained large numbers of immunoglobulin-secreting plasma cells, accompanied by body weight loss, bloody diarrhoea during the acute phase, and shortening of the colon. In accord with these manifestations, the inflamed colon tissue secreted much more TNF-α , IL-6 and IFN-γ .  Table S5] were the two main standards that identified the successful establishment of the UC-like mouse model.
Statistical Analysis. All data are shown as the mean ± SEM or SD value from at least three independent experiments. Significant differences were evaluated by the unpaired Student's t-test with two-tailed distributions. P-values below 0.05 were considered significant. The results were considered significant at *P < 0.05; **P < 0.01. Prism version 5.0 (Graph Pad Software) was used to perform the statistical analyses.