Evaluation of drug-targetable genes by defining modes of abnormality in gene expression

In the post-genomic era, many researchers have taken a systematic approach to identifying abnormal genes associated with various diseases. However, the gold standard has not been established, and most of these abnormalities are difficult to be rehabilitated in real clinical settings. In addition to identifying abnormal genes, for a practical purpose, it is necessary to investigate abnormality diversity. In this context, this study is aimed to demonstrate simply restorable genes as useful drug targets. We devised the concept of “drug targetability” to evaluate several different modes of abnormal genes by predicting events after drug treatment. As a representative example, we applied our method to breast cancer. Computationally, PTPRF, PRKAR2B, MAP4K3, and RICTOR were calculated as highly drug-targetable genes for breast cancer. After knockdown of these top-ranked genes (i.e., high drug targetability) using siRNA, our predictions were validated by cell death and migration assays. Moreover, inhibition of RICTOR or PTPRF was expected to prolong lifespan of breast cancer patients according to patient information annotated in microarray data. We anticipate that our method can be widely applied to elaborate selection of novel drug targets, and, ultimately, to improve the efficacy of disease treatment.

signaling network to represent core-signaling network 15 . In addition, there are several studies or databases using functional cancer genomics, each of which performed massive RNAi screening to identify genes that are required for cell proliferation or viability [16][17][18][19] . These projects systematically examined cancer cell line-specific genetic dependencies onto cell viability or proliferation, but still lacks the way to restore normality. Moreover, cancer shows diverse functional hallmarks other than proliferation, some of which are most deterministic factors compared to normal samples. In this regard, comparisons with normal counterpart are necessary, and various patterns of abnormality in gene expression and regulation should be dissected.
We established the concept of "drug targetability" and applied it to identify highly drug targetable genes in breast cancer. Among many modes of abnormality in gene expression, simply restorable genes should be targeted to recover normal phenotypes. Because most drugs reduce activity or expression of certain genes 20,21 , we defined drug targetability as the degree of similarity with normal samples after inhibition of abnormal genes. To validate our computational predictions, we performed cell death and migration assays following knockdown of top-ranked genes (high drug targetability) using siRNA. Successful results suggest that the method we propose here can be widely applied to elaborate selection of novel drug targets for various diseases.

Results
Overview of the approach. Associated with disease symptoms, many genes have abnormal expressional profiles and transcriptional responses compared to the control. To identify novel drug targets among these genes, we should select those restorable after drug treatment. Since the most frequently used drugs (e.g., monoclonal antibody and chemical inhibitor) reduce the expression or activity of targeted genes, we defined drug targetability to reflect this attribute among several modes of abnormality (Fig. 1A). Similar to our previous report 8 , we considered genes in the same pathway units with transcription factors (TFs) as genes that can modulate transcriptional responses. We evaluated all signaling molecules in pathway units of the UnitPath database and referred to them as pathway genes. We  (B) For a gene involved in a pathway unit (pathway gene), target genes of TFs involved in the same pathway unit were considered to be target genes of the pathway genes. UnitPath database was used for pathway information, and MSigDB was used to retrieve target gene information of TFs. (C) Schematic diagram of the evaluation of P-score in the low region (LP) and that in the high region (HP). According to the level of pathway gene expression (x-axis), we divided the low region (≤ 0.5) and high region (> 0.5), and then compared expression levels of the target genes (y-axis) for each region.
considered target genes of the TFs to be transcriptionally controlled by each pathway gene (target genes of the pathway gene, Fig. 1B). We included 1,191 pathway genes and 10,305 target genes of pathway genes in the analysis. For all pathway gene-target gene pairs, we plotted corresponding expression levels, and then distinguished low regions and high regions according to the pathway gene expression levels: pathway gene with expression level larger than 0.5, high region; otherwise (≤0.5), low region. We compared expression levels of target genes in two groups (e.g., normal and cancer) for each region separately, and calculated the P-score using the P-value as described in the Materials and Methods (Equation (2) and Supplementary  Fig. S1). As shown in the example of Fig. 1C, if one target gene has a significantly higher expression level in cancer than the control (high region, blue), the P-score in the high region (HP) is large. If one target gene has a similar expression level in cancer and normal (low region, yellow), the P-score in the low region (LP) is small. We defined drug targetability to have a large value when the HP is large and the LP is small, as following: Figure 2A presents the defined equation and its corresponding heat map of drug targetability. According to the definition, the center region on the heat map (both LP and HP have very small values) indicates normal genes in terms of transcriptional response. In the B and C regions on the heat map (small LP and large LP), target gene expression was significantly different compared to the control in the high region, but was similar in the low region (Fig. 2B,C). As a result, we can expect that drugs targeting or inhibiting the pathway gene will restore its transcriptional response similarly with a normal phenotype. In the D and E regions on the heat map (large LP and small HP), however, target gene expression was significantly different only in the low region (Fig. 2D,E). For these cases, suppression of the pathway gene is unlikely to restore normality. As shown in these examples, we classified several modes of abnormality by evaluating drug targetability for all pathway gene-target gene pairs. Drug targetability was defined to have a large value when suppression of the gene was expected to restore transcriptional normality.
Evaluation of drug targetability in breast cancer. Among many applicable phenotypes, we first examined drug targetability for breast cancer. For all pathway gene-target gene pairs, we calculated LP and HP, and displayed them using a 2D heat map histogram (Fig. 3A). Even though many pairs existed in the center region with a normal transcriptional response, many other pairs showed high LP or HP, implying an abnormal transcriptional response. Several representative examples are shown in Fig. 3. For the cases of C or D in the heat map, target gene expressions were constitutively high ( Fig. 3C) or low (Fig. 3D) for cancer. Even though these pairs showed abnormalities, ACVR2A or PSEN1 could not be good drug targets because their target genes still showed abnormal expression levels when their expression was low. However, for the cases of B or E in the heat map, target gene expression in cancer was abnormal only in the high region ( Fig. 3B,E). If we inhibit expression of NCK1 or GSTP1 in cancer, we can expect their target gene expression to be similar to the control. Figure 3F shows the frequency distribution of drug targetability for all pathway gene-target gene pairs. To utilize high-ranking pathway genes as novel drug targets, we averaged drug targetability of each pair into a single value. Therefore, one pathway gene has one value for drug targetability. To highlight abnormal pathway gene-target gene pairs, we used the top 5% pairs for each pathway gene in the merging process. Frequency distributions of these merged drug targetability (finally used values) showed a pattern similar to those of each pair (Fig. 3G). The entire list of drug targetability values for each gene is provided in Supplementary Data S1. We also overlaid these values in the cancer pathway from UnitPath to obtain a systematic view of drug targetability. In the whole pathway diagram, we can easily select drug targetable genes ( Supplementary Fig. S2).
To confirm the robustness of this method, we conducted 100 random samplings, and compared drug targetability ranks from whole datasets and those from random samples. The ranks showed statistically significant Pearson correlation, especially in the high rank, indicating the robustness of drug targetability (Fig. 3H).

Experimental validation of drug targetability.
To verify the accuracy of the computational prediction, we sorted all pathway genes according to drug targetability and selected four top-ranking genes, namely, PTPRF, PRKAR2B, MAP4K3, and RICTOR. After knockdown of these genes, we evaluated cell viability in the presence or absence of several anticancer drugs. In both MCF-7 and MDA-MB-231 cells, knockdown of MAP4K3 or RICTOR reduced cell viability and sensitized anticancer drugs (doxorubicin, epirubicin, and 5-fluorouracil)-induced cell death (Fig. 4A). On the other way, we induced cell proliferation by epidermal growth factor (EGF) treatment in serum-free conditions. As a result, knockdown of MAP4K3 or RICTOR neutralized the proliferative effect of EGF in both cell lines (Fig. 4B). These results suggest that MAP4K3 and RICTOR are responsible for proliferation and growth in breast cancer cells.
We also examined cell migration, which is another hallmark of cancer. At 48 h after transfection of each siRNA, the MDA-MD-231 cells were applied to the wound-healing assay as described in the Materials and Methods. Compared to the control, cell migration was remarkably reduced after knockdown of PRKAR2B, MAP4K3, or RICTOR. Especially, MAP4K3 knockdown almost totally blocked cell migration (Fig. 5A). These results were confirmed by quantification, using occupied area and the number of cells in the wounded region (Fig. 5B). To examine transcriptional influence of suggested drug targets, we performed cDNA microarray experiment as direct validation. After knockdown of RICTOR with siRNA, mRNA expression profile was compared to the control. Because our method is aimed to identify pathway genes whose target genes become similar with normal after inhibition, we compared fold changes of normal: cancer (dataset) to those of siRICTOR: control (microarray data). Using 20% as cutoff value in microarray data (fold change, 1.2 < or 0.8> ), they have statistically significant Pearson correlation (R = 0.6836, P-value = 0.0204). These data suggest that inhibition of RICTOR makes expression levels of its target genes similar with normal samples, validating our method.
With successful experimental validation, we performed Kaplan-Meier survival analysis using patient information annotated in each microarray data. First, we divided whole breast cancer dataset into low or high expression levels of specific drug target genes, and then evaluated distant metastasis-free survival (DMFS), relapse-free survival (RFS), and overall survival (OS) for each grouped dataset. As a result, low RICTOR patients showed significantly longer OS (Fig. 6A), and low PTPRF patients showed significantly longer DMFS (Fig. 6B) and RFS (Fig. 6C), respectively. Even though all other cases were not statistically significant, these data suggest that inhibition of those genes is expected to improve clinical outcomes and prolong lifespan of breast cancer patients.

Discussion
Many scientists have attempted to develop effective methods for systematic identification of functional driver genes for various diseases. In our previous research, we also successfully showed that the degree of transcriptional response can highlight deterministic genes for different phenotypes 8 . However, reversal of most of these abnormalities with clinical medication is difficult. For a practical application, identification of abnormal genes has proven to be insufficient. In this context, this study is aimed to identify simply restorable genes as useful drug targets. Extensive selection of effective drug targets is important for targeted therapy in the post-genomic era. To achieve this, we classified and evaluated several different modes of abnormal genes. Practically, fine modulation of gene expression or activity is considerably restricted in an actual clinical setting; most available drugs can only reduce the expression or activity of their target genes 20,21 . Among the many genes with an abnormal transcriptional response, therefore, we must select one that becomes normal when its expression is lowered. We defined this feature as drug targetability, and applied our method to breast cancer samples. We computationally predicted PTPRF, PRKAR2B, MAP4K3, and RICTOR to be responsible for the cancer phenotype, and each of them was experimentally validated by means of cell death or migration assay.
Several previous studies investigated RICTOR and MAP4K3, both of which are our proposed drug targets for breast cancer. RICTOR, a well-known component of the mTOR complex 2, was shown to promote migration and prevent apoptosis in osteosarcoma cells and prostate cancer 22,23 . Goncharova et al. also reported that RICTOR mediates migration in mouse embryonic fibroblasts 24 . MAP4K3 was shown to promote migration and invasion of human non-small cell lung cancer 25 , and to be involved in cell growth through activation of mTOR complex 1 pathway 26 . Despite use of different tissue types or contexts, our experimental results were consistent with these reports in terms of cellular functions.
Owing to high-throughput technology, many researchers have attempted to systematically identify abnormal genes for various diseases. However, only a small number of these genes can be utilized as drug targets. Merely identifying abnormal genes is insufficient. Evaluation of diverse modes of abnormality is required. Our method attempted to predict events after inhibition of each gene by calculating drug targetability. Experimental results and survival analyses confirmed the performance of our method. To verify whether the method we propose here can be widely applied to many other diseases, we examined another case. After comparison between tongue cancer and corresponding normal samples, GNA15, PPP2R3A, LAMB3, HSPA2, and MAP4K3 were identified as five top-ranking genes (Supplementary Data S2). Among them, GNA15 27 , LAMB3 28 , HSPA2 29 , and MAP4K3 25 were previously reported as prognostic markers in several cancer types, indirectly supporting our method. Proposed method is not limited to cancer, but rather is applicable to any case where two different phenotypes are of interest. We anticipate that our approach will eventually improve treatment efficacy of many diseases.

Materials and Methods
Microarray datasets and analysis. We downloaded raw CEL files generated using the Affymetrix Human Genome U133 Plus 2.0 Array (GPL570) platform from the public microarray database Gene Expression Omnibus, and normalized them using the Universal exPression Code method (SCAN.UPC package of Bioconductor) 30 . Samples were grouped by normal and cancer according to the annotations in the original GSM files. For microarray experiment, total RNAs were isolated from MCF-7 cells using an RNeasy Mini Kit (Qiagen, Hilden, Germany), and delivered to commercial microarray service (eBiogen, Seoul, Republic of Korea). The assay was performed using Affymetrix human 2.0 ST array.
Pre-defined pathways and target gene datasets. We used the UnitPath database (unpublished data, www.unitpath.com) whose pathways are well-organized forms and have interactive modularity for computational analysis. Because UnitPath allows users to select one gene of interest and obtain others that influence the selected genes in each pathway, we used all of these pathway units for analysis. In retrieval of each unit, we demarcated it at the TFs as end points. Information after TFs (target genes of each TF) was obtained from MSigDB c3 TFT v4.0. Finally, we mapped all genes, with connections with at least one TF, to their corresponding target genes (Fig. 1B). In total, we included 1,191 pathway genes and 10,305 target genes of pathway genes in the analysis.
Evaluation of statistical significance. To evaluate a score for statistical significance, we defined P-score according to the P-value. The P-score equation has a form as following: P score a 1 2 P value log 10 so that the larger P-value, the smaller P-score. We excluded multiple test corrections because any cutoff thresholds were not used as significance level in calculation of drug targetability. To project P-score to 0.5 when P-value = 0.01, we fixed = a 2 . We also added signs to P-score, so that P-score has a minus quantity if the average expression level in cancer is lower than normal; otherwise, P-score is positive. The P-score equation and its relationship with P-value are shown in Supplementary Fig. S1.
Cell viability assay. WST-1 assays were performed to determine cell viability. MCF-7 and MDA-MB-231 cells were seeded in 24-well plates at a density of 4 × 10 4 cells/well in quadruplicate, and WST-1 reagent (Nalgene, Rochester, NY) was added to each well up to 5% of the media volume. After incubation for 2 h at 37 °C in a 5% CO 2 incubator, the absorbance at 450 nm was measured using a microplate reader (Bio-Rad, Richmond, CA). Cell death was confirmed by staining with 100 nM tetramethylrhodamine ethyl ester (TMRE, Sigma-Aldrich). Ten thousand cells were analyzed using a Caliber flow cytometer.
Wound-healing assay. Adherent MDA-MB-231 cells were scraped from the bottom of the culture surface (six-well plate) using a pipette tip to create a cell-free (wounded) area. Each well was washed with PBS to remove cell debris and then incubated with serum-free media. Several features of cell migration were evaluated using a CellProfiler 31 .