Expression of Nicotinamide Phosphoribosyltransferase-Influenced Genes Predicts Recurrence-Free Survival in Lung and Breast Cancers

Nicotinamide phosphoribosyltransferase (NAMPT) is a rate-limiting enzyme in the salvage pathway of nicotinamide adenine dinucleotide biosynthesis. NAMPT protein is a secreted plasma biomarker in inflammation and in cancer. The NAMPT enzymatic inhibitor FK866 induces apoptosis and is a cancer therapeutic candidate; however, little is known regarding the influence of NAMPT on cancer biological mechanisms or on the prognosis of human cancers. We interrogated known microarray data sets to define NAMPT knockdown-influenced gene expression and demonstrated that reduced NAMPT expression strongly dysregulates cancer biology signaling pathways. Comparisons of gene expression datasets of four cancer types generated an N39 molecular signature exhibiting consistently dysregulated expression in multiple cancer tissues. The N39 signature provides a significant and independent prognostic tool for recurrence-free survival in human lung and breast cancers. Although the molecular mechanisms have not been clearly elucidated, this study validates NAMPT as a novel “oncogene” with a central role in carcinogenesis. Furthermore, the N39 signature provides a potentially useful tool for prediction of recurrence-free survival in lung and breast cancer and validates NAMPT as a novel and effective therapeutic target in cancer.

We conducted a meta-analysis of genome-wide expression data to identify NAMPT-influenced genes implicated in cancer pathobiology. First, we identified differentially expressed genes using microarray data from two independent human cell types (primary and cancer cells), comparing wild-type (WT) cells with NAMPT knockdown (KD) cells. These differentially expressed genes were denoted as NAMPT-influenced genes, with gene ontology analysis indicating enrichment of cancer-related pathways. Second, a prognostic gene signature was derived from the NAMPT-influenced genes by comparing their expression between normal tissues and colon, lung, pancreatic, and thyroid cancers. Thirty-nine NAMPT-influenced genes were identified as being commonly differentially expressed in tumor tissues and comprised a multi-molecular cancer outcome predictor.
Our studies indicate this molecular signature effectively predicts recurrence-free survival in lung and breast cancer in a manner independent of standard clinical and pathological prognostic factors.

Results
NAMPT-influenced genes. We compared gene expression patterns between wild-type and NAMPT-silenced cells in human endothelial cells and in a breast cancer cell line to identify genes potentially regulated by NAMPT. Two independent microarray datasets containing gene expression information for both wild-type and NAMPT-silenced cells were collected from the Gene Expression Omnibus (GEO) database 12: one dataset was derived from an MCF-7 breast cancer cell line (GSE13449) 13 and the second dataset was from human pulmonary microvascular endothelial cells (GSE34512) 14. The genes differentially expressed between WT and NAMPT-silenced cells in both datasets with concordant direction were retained as NAMPT-influenced genes. At the specified significance level of false discovery rate (FDR) < 5% and fold change > 1.1 (see Methods for details), 462 genes were found to be commonly differentially expressed between WT and NAMPT-silenced cells, among which 361 genes were up-regulated while 101 genes were down-regulated in NAMPT-silenced cells (Supplementary Table S1). We next searched for enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) 15 pathways among the dysregulated genes, which revealed enrichment in cancer-related KEGG terms such as "Pathways in cancer", "Colorectal cancer", "Melanogenesis", "Renal cell carcinoma", and "Apoptosis" (Figure 1, Fisher's exact test). These findings suggested that the NAMPT-influenced genes are involved in human cancer pathology.
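The pathway enrichment step can be illustrated with a small sketch. It assumes a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution; the text does not state whether the authors used a one-sided or two-sided test, and the function name and arguments below are illustrative, not taken from the study.

```python
from math import comb


def enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    n_universe: genes tested in total (assumed background)
    n_pathway:  background genes annotated to the pathway
    n_selected: dysregulated genes (for example, the 462 reported here)
    n_overlap:  dysregulated genes that fall in the pathway
    Returns P(overlap >= n_overlap) under the hypergeometric null.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A call such as `enrichment_p(20000, 300, 462, 20)` uses made-up counts purely to show the argument order; the background size and pathway sizes used in the study are not given in the text.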
To determine the depth of involvement of NAMPT-influenced genes in human cancers, we explored expression differences of these genes between normal and tumor tissues from lung (GSE18842) 16, colon (GSE23878) 17, pancreatic (GSE15471) 18, and thyroid (GSE33630) cancers. Paired normal and tumor tissues from 44 lung, 19 colon, 36 pancreatic, and 44 thyroid cancer patients were included. A paired t-test was used to detect differentially expressed genes between the normal and tumor tissues. In total, 39 genes were identified as commonly differentially expressed and concordant in expression with the NAMPT-silenced model (P < 0.05 after Benjamini-Hochberg adjustment) in at least three out of four cancer types: lung cancer (Fig. 2), colon cancer (Supplementary Fig. S1), pancreatic cancer (Supplementary Fig. S2), and thyroid cancer (Supplementary Fig. S3). We designated this NAMPT-influenced 39-gene set as the N39 gene signature (Table 1).
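The selection rule in this paragraph (Benjamini-Hochberg adjusted P < 0.05, concordant direction, at least three of four cancer types) can be sketched as follows. The data layout, the function names, and the `expected_sign` mapping are assumptions made for illustration; in particular, how "concordant with the NAMPT-silenced model" translates into an expected sign of the tumor-versus-normal difference is left to the caller, because the text does not spell it out.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def recurrent_genes(per_cancer, expected_sign, alpha=0.05, min_cancers=3):
    """Genes significant and concordant in at least `min_cancers` cancer types.

    per_cancer: {cancer: {gene: (tumor_minus_normal_mean_diff, raw_p)}},
                e.g. the output of a paired t-test run per gene (hypothetical).
    expected_sign: {gene: +1 or -1}, the direction counted as concordant.
    """
    votes = {}
    for results in per_cancer.values():
        genes = list(results)
        adjusted = benjamini_hochberg([results[g][1] for g in genes])
        for gene, q in zip(genes, adjusted):
            sign = 1 if results[gene][0] > 0 else -1
            if q < alpha and expected_sign.get(gene) == sign:
                votes[gene] = votes.get(gene, 0) + 1
    return sorted(g for g, v in votes.items() if v >= min_cancers)
```

The adjustment is applied within each cancer type separately, which is one reasonable reading of the text rather than a confirmed detail of the original analysis.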
N39 predicts recurrence-free survival in lung and breast cancers. We hypothesized that the N39 signature would be predictive of tumor outcome in lung and breast cancer patients. We constructed a scoring system to assign each patient a risk score, representing a linear combination of the N39 gene expression values weighted by the coefficients obtained from training cohorts (GSE8894 19 for lung cancer and GSE2034 20 for breast cancer) (see Methods for details). N39-positive patients were defined as those having risk scores greater than the group median. As expected, there was a significantly reduced recurrence-free survival for N39-positive patients in the training cohorts (Supplementary Figure S4 and Table 2).
We tested the ability of the N39-based risk score to classify patients into prognostic groups in independent validation cohorts. For each cancer type, two validation cohorts were collected: Lung1 (GSE31210) 21 and Lung2 (GSE37745) 22 for lung cancer, and Breast1 (GSE25066) 23 and Breast2 (GSE21653) 24 for breast cancer. Kaplan-Meier survival curves demonstrated a significantly reduced recurrence-free survival for N39-positive patients in the validation cohorts (log-rank test: P = 5.4 × 10⁻⁵ for Lung1; P = 0.011 for Lung2; P = 2.9 × 10⁻⁵ for Breast1; and P = 7.2 × 10⁻⁴ for Breast2) (Figure 3). Univariate Cox proportional hazards regression indicated that N39-positive patients exhibited significantly increased risk for recurrence (fold increase or FI) in these 4 cohorts: 2.88-FI for Lung1, 2.08-FI for Lung2, 2.27-FI for Breast1, and 2.12-FI for Breast2 (Table 2). These findings collectively indicate that N39 is predictive of recurrence-free survival in lung and breast cancer.
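The group comparison reported above can be sketched in a few lines: patients are split at the cohort median of the risk score and the two groups are compared with a log-rank test. This is a plain-Python illustration, not the authors' R code; the function names and input layout are assumptions, and the P value uses the chi-square approximation with one degree of freedom.

```python
from math import erfc, sqrt
from statistics import median


def median_split(scores):
    """True marks a patient whose risk score exceeds the cohort median."""
    cut = median(scores)
    return [s > cut for s in scores]


def logrank_test(times, events, positive):
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 = recurrence, 0 = censored;
    positive: True/False group label. Returns (chi-square, P value).
    """
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = n1 = d = d1 = 0
        for ti, e, g in zip(times, events, positive):
            if ti < t:
                continue  # patient is no longer at risk at time t
            n += 1
            n1 += g
            if e and ti == t:
                d += 1
                d1 += g
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))
```

With `positive = median_split(scores)`, the call `logrank_test(times, events, positive)` compares the two risk groups; real analyses would normally rely on an established survival package instead of this hand-rolled version.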
In a recent computational study, 47 published breast cancer prognostic signatures were compared with signatures comprised of randomly selected genes. Approximately 60% of the published signatures were not significantly improved over random signatures of identical size, with the majority of random gene signatures significantly associated with breast cancer outcome 25. We performed a resampling test to determine whether the prognostic power of N39 was significantly better than that of random gene signatures. We constructed 1,000 random gene signatures of the same size as N39 (39 genes) and conducted Cox proportional hazards regression of survival for each resampled gene signature. The association between each random gene signature and recurrence-free survival was measured by the Wald statistic, the ratio of the Cox regression coefficient to its standard error. Our alternative hypothesis was that the Wald statistic of N39 should be higher than that of randomized gene signatures if N39 was more predictive than randomized signatures. Figure 4 indicates that the Wald statistic of N39 was significantly higher than that of randomized gene signatures (right-tailed: P = 0.026 for Lung1; P = 0.020 for Lung2; P = 0.009 for Breast1; and P = 0.011 for Breast2), suggesting that the null hypothesis that the asso-  (Table 3).
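The resampling test described in this paragraph amounts to a right-tailed empirical P value. The sketch below assumes a caller-supplied `stat_fn` (hypothetical) that fits the Cox model for a given gene list and returns its Wald statistic; the exact counting convention used by the authors is not stated, so the simple fraction of random signatures that match or exceed N39 is shown here.

```python
import random


def resampling_p(observed_stat, gene_pool, signature_size, stat_fn,
                 n_iter=1000, seed=0):
    """Right-tailed empirical P value for a gene signature.

    observed_stat:  Wald statistic of the real signature (e.g. N39)
    gene_pool:      list of all candidate genes to sample from
    signature_size: number of genes per random signature (39 for N39)
    stat_fn:        callable mapping a gene list to a Wald statistic;
                    a stand-in for the Cox regression step (assumption)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        signature = rng.sample(gene_pool, signature_size)
        if stat_fn(signature) >= observed_stat:
            hits += 1
    return hits / n_iter
```

A small P value means few random signatures of the same size reach the statistic obtained with the real signature.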
In the Lung2 and Breast2 cohorts, N39 status was the only significant covariate in the multivariate model (Table 3). However, in the Lung1 cohort, patient age, stage, and EGFR/KRAS/ALK alteration status were also significant variables. Therefore, we further stratified the patients in the Lung1 cohort according to each significant factor and repeated the Cox proportional hazards regression. For patients aged < 60 and ≥ 60, N39-positive patients had significantly increased risk for recurrence, 2.62-FI (P = 0.038) and 2.57-FI (P = 0.005), respectively. For patients with stage I cancer (the Lung1 cohort only includes patients with stage I and II lung cancer), N39-positive patients exhibited significantly increased risk for recurrence (2.48-FI, P = 0.012); however, no significant difference was observed between N39-positive and -negative groups for patients with stage II lung cancer. For patients without and with EGFR/KRAS/ALK alteration, N39-positive patients had a 2.35-FI (P = 0.041) and 2.36-FI (P = 0.015) increased risk for recurrence, respectively. We also checked the performance of the N39 signature in patients without and with smoking history and found that N39-positive patients exhibited increased risk for recurrence in never-smokers and ever-smokers, 2.72-FI (P = 0.012) and 2.19-FI (P = 0.034), respectively. Kaplan-Meier survival curves demonstrated significantly reduced survival for N39-positive patients in each subset grouped by age, stage, EGFR/KRAS/ALK alteration status, and smoking history, with the exception of patients with stage II lung cancer (Fig. 5A), presumably reflecting the reduced sample size.
In the Breast1 cohort, lymph node status, tumor size, and ER status were significant clinicopathological factors in addition to N39 status (Table 3). We stratified patients in the Breast1 cohort according to these factors. For patients with and without lymph node involvement, N39-positive patients exhibited significantly increased risk for recurrence, 8.03-FI (P = 0.006) and 2.09-FI (P = 6.1 × 10⁻⁴), respectively. For patients with tumor size < T3 and ≥ T3, N39-positive patients displayed significantly increased risk for recurrence, 2.56-FI (P = 0.002) and 1.69-FI (P = 0.044), respectively. For patients with ER-negative status, N39-positive patients had a marginally increased risk for recurrence (1.59-FI, P = 0.057), while for the ER-positive group, N39-positive patients exhibited significantly increased risk for recurrence, 2.7-FI (P = 0.004). Breast cancer is strongly related to age, with approximately 80% of breast cancers occurring in women aged > 50. We demonstrated that N39-positive women aged < 50 exhibit a 1.9-FI (P = 0.020), whereas women aged > 50 exhibit a 2.64-FI increased risk for recurrence (P = 8.4 × 10⁻⁴). Kaplan-Meier survival curves confirmed a significantly reduced survival for N39-positive patients in each subset grouped by age, lymph node status, tumor size, and ER status (Figure 5B).

Discussion
NAMPT is a novel cancer marker 6 and therapeutic target 26 with unclear mechanisms of action. Regardless of its complex intrinsic biological functions, with differential roles as a secreted proinflammatory cytokine (extracellular NAMPT) or a rate-limiting NAD⁺ synthesis enzyme (intracellular NAMPT), we examined the prognostic power of the gene sets regulated by NAMPT. First, we confirmed the critical role of NAMPT in carcinogenesis by gene ontology analysis of all NAMPT-mediated genes: eight of the eleven significantly deregulated pathways are direct cancer pathways (Fig. 1). Second, we generated the N39 signature by filtering through gene expression datasets of four cancer types. Third, we validated the N39 signature as a powerful tool that provides important prognostic information for lung and breast cancer, and determined that the N39 gene signature is a significant and independent predictor of cancer recurrence-free survival.
We chose lung and breast cancers for the validation study of cancer survival prognosis mainly because of the availability of datasets (three independent studies to serve as one discovery cohort and two validation cohorts). Moreover, this choice reflects the severity of the two types of cancer. Lung cancer is the most frequently diagnosed cancer and the leading cause of cancer death in males, comprising 17% of the total new cancer cases and 23% of the total cancer deaths 27. In females, breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death, accounting for 23% of the total cancer cases and 14% of the cancer deaths 27.
Prognostic molecular signatures that work cooperatively with traditional clinical and pathological factors may increase prognostic accuracy when identifying patients at higher risk for recurrence and death [28][29][30]. Our proposed molecular signature, composed of 39 NAMPT-mediated genes, is a promising prognostic marker because N39 was developed solely from the discovery cohort and its prognostic power was validated in two independent validation cohorts each for lung and breast cancer. More importantly, N39 was independent of other clinicopathological covariates. In the Lung1 cohort, when grouped by age, EGFR/KRAS/ALK alteration status, and smoking history, N39 further stratified lung cancer patients with significant differences in survival. A significantly increased risk of recurrence was also observed in N39-positive patients with stage I disease. However, we failed to observe a significant difference between N39-positive and -negative groups among the patients with stage II disease, which may be due to the relatively smaller sample size in this category. To validate the prognostic power of N39 in stage II tumors, we included an additional lung cancer dataset (GSE41271) 31. We merged the stage II subjects from three independent cohorts (Lung1, Lung2, and GSE41271) using the "metaArray" package in Bioconductor (see Supplementary Fig. S5 for details). We found that N39-positive patients with stage II tumors exhibited a significantly increased risk for recurrence (1.68-FI, P = 0.049 by univariate Cox proportional hazards regression) compared with N39-negative patients. Also, Kaplan-Meier survival curves confirmed a significantly reduced survival (P = 0.047 by log-rank test) for N39-positive patients with stage II tumors (Supplementary Fig. S5). In the Breast1 cohort, we stratified the patients according to age, lymph node status, tumor size, and ER status, respectively.
A significantly increased risk of recurrence was also observed in N39-positive patients in each category, except for the marginal signal in ER-negative patients. Taken together, these results confirm that N39 is not dependent on specific values of the respective covariates, which enhances the identification of cancer patients at greater risk for recurrence. We used the median of the N39 risk score as a cutoff to stratify patients into two groups (N39-positive and -negative) to conduct categorized statistical analyses (such as Kaplan-Meier analysis and log-rank test).
Clinically, zero can be utilized as an absolute cutoff to divide patients into high- and low-risk groups, as the median of the N39 score is approximately equal to zero in each validation cohort (Supplementary Fig. S6).
In addition to its prognostic utility, the N39 gene list also provides a set of NAMPT-associated genes that might play critical roles in cancer pathogenesis. One good example is SIRT1. The NAMPT-SIRT1-MYC axis critically regulates cell survival 32. SIRT1 is also overexpressed in many cancers, and NAMPT is frequently overexpressed concurrently with SIRT1, which is important for prostate cancer cell survival and stress response 33. A recent study in pancreatic cancer cell lines, however, suggested that the NADase CD38, but not SIRT1, is crucial for the response of pancreatic cancer cells to NAMPT inhibition 34, indicating a complex interaction between NAMPT and SIRT1. These previous findings, together with the N39 signature, have generated novel biomarkers or therapeutic targets in cancer. This study is an example of re-examination of available genomic/genetic data in the "big data" era, with a novel translational approach. NAMPT is confirmed to be a novel "oncogene" with a central role in carcinogenesis, although a clear molecular mechanism has not been elucidated. In addition to cancer prognosis, now well validated in the current study, N39 has promise in the management of multiple cancers.

Methods
Expression microarray data. We obtained the gene expression data of WT and NAMPT KD MCF-7 breast cancer cells (GSE13449) 13 and of WT and NAMPT KD pulmonary microvascular endothelial cells (GSE34512) 14 from the NCBI GEO database 12 . The gene expression data of paired normal and tumor tissues for lung (GSE18842) 16 , colon (GSE23878) 17 , pancreatic (GSE15471) 18 , and thyroid (GSE33630) cancers were also collected from the GEO database. Training and validation cohorts were constructed for lung and breast cancers. From the GEO database, we collected the expression datasets with available information on recurrence-free survival for lung (GSE8894 19 for training and GSE31210 21 and GSE37745 22 for validation) and breast (GSE2034 20 for training and GSE25066 23 and GSE21653 24 for validation) cancers.
Microarray data processing. The GC robust multichip average algorithm 35 was used to summarize the expression level of each probe set for the microarray data of WT and NAMPT KD human cells and of paired normal and tumor tissues. Only the probe sets present (determined by the function "mas5calls" in the Bioconductor "affy" package 36) in at least two thirds of the samples were retained. We further limited our analysis to the probe sets with unique annotations and removed genes on chromosomes X and Y to avoid potential confounding factors. Significance analysis of microarrays 37, implemented in the samr library of the R Statistical Package, was used to compare log2-transformed gene expression levels between WT and NAMPT KD human cells. FDR was controlled using the q-value method 38,39. Transcripts with a fold change greater than 1.1 and FDR less than 0.05 were deemed differentially expressed.
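The thresholds above, together with the same-direction requirement across the two knockdown datasets stated in Results, can be expressed as a short filter. This sketch assumes that per-gene log2 fold changes (KD versus WT) and q-values have already been computed, for example by SAM; the dictionary layout and function names are illustrative only.

```python
from math import log2


def differentially_expressed(stats, min_fold=1.1, max_fdr=0.05):
    """Apply the fold-change and FDR cutoffs.

    stats: {gene: (log2_fold_change_kd_vs_wt, q_value)}
    Returns {gene: +1 (up in KD) or -1 (down in KD)}.
    """
    cutoff = log2(min_fold)  # a 1.1-fold change on the log2 scale
    return {
        gene: (1 if lfc > 0 else -1)
        for gene, (lfc, q) in stats.items()
        if abs(lfc) > cutoff and q < max_fdr
    }


def concordant(de_a, de_b):
    """Genes called in both datasets with the same direction of change."""
    return {gene: sign for gene, sign in de_a.items() if de_b.get(gene) == sign}
```

Applied to the MCF-7 and endothelial cell results, `concordant(differentially_expressed(a), differentially_expressed(b))` would yield the candidate NAMPT-influenced genes with their direction of change.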
Risk scoring system. For each training cohort, univariate Cox proportional hazards regression was used to evaluate the association between recurrence-free survival and gene expression. A risk score was then calculated for each patient using a linear combination of gene expression weighted by the Wald statistic (ratio of regression coefficient to its standard error) as shown below:

$$S = \sum_{i=1}^{n} Z_i \, \frac{e_i - \mu_i}{\tau_i}$$

In the equation above, S is the risk score of the patient; n is the number of genes; Z_i denotes the Wald statistic of gene i; e_i denotes the expression level of gene i; and μ_i and τ_i are the mean and standard deviation of the gene expression values for gene i across all samples, respectively. Patients were then divided into positive and negative groups with the median of the risk score as the threshold. A higher risk score implies a poor outcome. The scoring system and the associated scaling coefficients were fixed based on the training cohorts and then evaluated in the validation cohorts. All statistical analyses were conducted using the R platform (version 2.15.1). The α level for all the statistical tests was 0.05.
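As a worked illustration of the scoring rule, the sketch below assumes the score is the sum over signature genes of the Wald statistic multiplied by the standardized expression value (expression minus mean, divided by standard deviation), which is how the symbol definitions above read. The function names and dictionary inputs are hypothetical, and the Wald statistics are taken as given from the training cohort.

```python
from statistics import median


def risk_score(expression, wald, mean, sd):
    """Wald-weighted sum of standardized expression for one patient.

    expression: {gene: expression level e_i for this patient}
    wald:       {gene: Wald statistic Z_i from the training cohort}
    mean, sd:   {gene: mean and standard deviation across samples}
    """
    return sum(
        wald[g] * (expression[g] - mean[g]) / sd[g]
        for g in wald
    )


def classify(scores):
    """Label patients positive when their score exceeds the cohort median."""
    cut = median(scores.values())
    return {
        pid: ("positive" if s > cut else "negative")
        for pid, s in scores.items()
    }
```

Because only the genes listed in `wald` are summed, the same fixed coefficients can be reused on a validation cohort as long as its expression table contains those genes.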