A response prediction model for taxane, cisplatin, and 5-fluorouracil chemotherapy in hypopharyngeal carcinoma

Head and neck squamous cell carcinoma (HNSCC) is the sixth most common cancer worldwide. The five-year survival rate of HNSCC has not improved even with major technological advancements in surgery and chemotherapy. Currently, docetaxel, cisplatin, and 5-fluoruracil (TPF) treatment has been the most popular chemotherapy method for HNSCC; but only a small percentage of HNSCC patients exhibit a good response to TPF treatment. Unfortunately, at present, no reasonably effective prediction model exists to assist clinicians with patient treatment. For this reason, patients have no other alternative but to risk neoadjuvant chemotherapy in order to determine their response to TPF. In this study, we analyzed the gene expression profile in TPF-sensitive and non-sensitive patient samples. We identified a gene expression signature between these two groups. We further chose 10 genes and trained a support vector machine (SVM) model. This model has 88.3% sensitivity and 88.9% specificity to predict the response to TPF treatment in our patients. In addition, four more TPF responsive and four more TPF non-sensitive patient samples were used for further validation. This SVM model has been proven to achieve approximately 75.0% sensitivity and 100% specificity to predict TPF response in new patients. This suggests that our 10-genes SVM prediction model has the potential to assist clinicians to personalize treatment for HNSCC patients.

clinical significance to predict therapy response 8 . For example, low expression of the epidermal growth factor receptor (EGFR) is a potential biomarker in predicting chemotherapy response in HNSCC [8][9][10][11] . However, lower expression of EGFR failed to correlate with better survival rates in response to combined cisplatin and EGF inhibitor cetuximab 12 . Although lower CCND1 was an independent predictor of chemotherapy response and survival in induction groups 8,13 multivariate analysis indicated that CCND1 provides no prognostic or predictive benefit in oral or oropharyngeal SCC 14 .
Currently, a single predictor marker is not sufficiently effective to predict chemotherapy response. Moreover, combined therapy is more popular and effective than single drug treatments. For instance, the combination of cisplatin and 5-fluoruracil (PF) induction chemotherapy has proven to be a successful treatment for HNSCC. However, with the increase of relevant clinical studies, TPF treatment has been more widely utilized with patients, since TPF has better survival and organ preservation rates compared to PF therapy in locally advanced laryngeal, oro-and hypopharyngeal cancer 15,16 . Even though some biomarkers have been reported to predict the response to induction chemotherapy, microarray-based gene expression profiles for predicting TPF chemotherapy have not yet been well investigated. In addition, there is currently no prediction model for TPF treatment.
The prediction of chemotherapy response, with a set of genes, has been widely used in breast cancer. Twenty-one gene expression levels were utilized as a guideline for adjuvant chemotherapy in hormone-receptorpositive, HER2-negative, and axillary node-negative breast cancer [15][16][17] . This breast cancer model constitutes a good example of personalized medicine, and has also saved at least USD 300 million of unnecessary chemotherapy costs in the U.S. Here, we used microarray-based gene expression profiles to identify the gene signature that related to TPF response. We first identified a group of genes, which could predict chemotherapy response in HNSCC, in a SVM model. We further validated this gene signature with more patient samples. Our study provides a set of potential biomarkers to predict patient response to TPF treatment, as well as the possible benefits of: 1) avoiding toxic effects of ineffective chemotherapy; 2) avoiding delays for other therapeutic options; and 3) minimizing the cost of treatment 18 .

Results
Patients' clinical characteristics. Here, a total of 29 patients were enrolled in this study. Among them, 16 patients were considered sensitive to TPF treatment, and 13 patients as non-sensitive to TPF treatment. We classified the sensitive and non-sensitive groups by patients' response to chemotherapy. If tumor volume decreased approximately 70% after chemotherapy, we considered it as a chemotherapy-sensitive one. If tumor volume decreased less than approximately 25%, we considered it as a chemotherapy non-sensitive one. The decreased volume between 25-75% was excluded in our study. Then, we randomly chose the sensitive and non-sensitive samples for our study. The first 21 samples (twelve sensitive and nine non-sensitive) were used to set up the prediction model for TPF treatment (Table 1), and the second group (four sensitive and four non-sensitive) was used to validate our prediction model (Table 1). Here, we excluded some factors that have prediction potential for TPF treatment, i.e., gender, age, primary site, stage, differentiation degree, gastrointestinal reaction, myelosuppression, family history, smoking history, and alcohol intake ( Table 2).
Gene expression profiles. Firstly, a microarray was performed for the first group patients' samples. The data were analyzed by a standard approach 19 . The second group was used as a validation for our previous experiment, and was analyzed in the same way.

A prediction model for TPF treatment in HNSCC.
Our hypothesis is that the mRNA expression profiles of the treatment-sensitive patients are distinct from those of the treatment non-sensitive patients. To prove this, we firstly used the limma package to select a set of differential expression genes (p value ≤ 0.01); and then generated the heatmap figure based on these genes ( Fig. 1). All of the samples were clustered into two groups, i.e., the treatment-sensitive and treatment non-sensitive group. Based on the expression levels, the genes were also clustered into two groups. A set of genes was highly expressed in the treatment-sensitive samples, and another set of genes was highly expressed in the treatment non-sensitive samples. These results motivated us to build a model that can predict the response of the patient to the treatment based on his or her mRNA expression levels.
As described in the Methods section, we trained a SVM model and obtained 722 genes. To avoid the false-positive genes, based on our clinical knowledge, we selected 10 genes that are known to relate to TPF or be potentially involved in the treatment pathway. These genes are GATS, PRIC285, ARID3B, ASNS, CXCR1, FBN2, INMT, MYOM3, SLC27A5, and STC2. The gene list is found in Table 3. We estimated the performance of the model using the one-leave-out method based on the old data. The sensitivity and specificity of this model are 88.3% and 88.9%, respectively (Fig. 2).
Further validation with eight more patients' samples. We trained a 10-genes prediction model with 88.3% sensitivity and 88.9% specificity. Then, we aimed to determine if this model could be validated by more samples. We further checked the gene expression with eight more patients' samples by microarray. Among them, four samples were sensitive and four samples were non-sensitive samples. We used our model to test those samples and found that our model has good performance. For the new data, we determined the sensitivity and specificity of our model as 75.0% and 100%, respectively. This suggested that our model is sufficient to predict TPF treatment performance in HNSCC.

Immunohistochemistry (IHC).
In order to confirm the protein levels of the candidate genes in sensitive and non-sensitive patients, we performed IHC for CXCR1 and ARID3B. We found that CXCR1 was upregulated in non-sensitive patients' samples and ARID3B was downregulated in non-sensitive patients' samples (Fig. 3). These findings correlated well with our microarray results.

Discussion
Although TPF treatment is currently the most popular therapy for HNSCC patients, no effective prediction model exists to avoid unnecessary treatment and aid in relieving patients' pain. Here, we identified the gene signature in TPF-sensitive and non-sensitive patients. In order to predict TPF treatment response, we selected 10 genes that exhibit good correlation with treatment. Based on these 10 genes, a model was set up to predict TPF treatment in HNSCC patients. This model had 88.3% sensitivity and 88.9% specificity for prediction. We further validated this model with eight additional samples. Our model had 75.0% sensitivity and 100% specificity in those samples, which indicated that our prediction model is sufficient to predict the response to TPF treatment.
A major problem with the treatment options for HNSCC is that clinicians do not know the treatment response of each patient prior to chemotherapy, which will delay the best choice of treatment for patients. The purpose of personalized medicine is to separate patients into different groups. In this way, based on patients' predicted response or disease risk, medical decisions, practices, interventions, and/or products are optimally tailored to the individual patient. Here, we developed a model to predict chemotherapy response based on a 10-genes model, which may be beneficially utilized for personalized medicine in HNSCC patients.
The 10-genes model is based on the gene expression of GATS, PRIC285, ARID3B, ASNS, CXCR1, FBN2, INMT, MYOM3, SLC27A5, and STC2. Among these 10 genes, some have been reported to be involved in cell cycle, apoptosis, and drug metabolism. For instance, ARID3B belongs to a subfamily of ARID (AT-rich interaction domain) transcription factors. It could increase tumor growth in vivo, inducing expression of genes associated with metastasis and cancer stem cells [20][21][22] . ASNS (asparagine synthetase, ASNS) gene encodes an enzyme that catalyzes glutamine-and  T4aN2M0  IVA  high  3  TPF  PR   2  M  69  retropharyngeal wall  T4aN2M0  IVA  high  2  TPF  PR   3  M  49  aryepiglottic fold  T3N2M0  IVA  moderate  2  TPF  CR   4  M  62  right pyriform sinus  T4aN2M0  IVA  poor  2  TPF  PR   5  M  60  retropharyngeal wall  T4bN2M0  IVA  moderate  2  TPF  ATP-dependent conversion of aspartic acid to asparagine, and its expression is associated with chemotherapy resistance and prognosis in several human cancers [23][24][25] . CXCR1 is a member of the G-protein-coupled receptor family and functions as a receptor for interleukin 8. CXCR1 promotes tumor growth, invasion, inflammation and metastasis, and the knockdown of CXCR1 will enhance the efficacy of chemotherapy [26][27][28][29] . FBN2 is a tumor suppressor in cancer and is frequently silenced by promoter methylation 30,31 . STC2 is an oncogenic gene that promotes cancer metastasis and epithelial-mesenchymal transition in cancers 32,33 , and the induction of oxaliplatin resistance in colorectal cancer 34 . These studies indicate that the genes in our prediction model possess known functions in cancer progression and chemotherapy. Currently, our prediction model has approximately 75.0% sensitivity and 100% specificity for chemotherapy response prediction. Our future work will use more samples to continue validating our prediction model. Our work has the potential to assist clinicians to predict the treatment response of HNSCC patients prior to chemotherapy and enable clinicians to personalize medicine for each patient. Patient treatment. Patients were treated with TPF chemotherapy. The treatment procedure was as described previously [35][36][37] . Briefly, we performed three 21-day cycles of induction chemotherapy with docetaxel (75 mg/m 2 continuous i.v. infusion for more than 3 h on day 1), cisplatin (30 mg/m 2 ), and 5-FU (500 mg/m 2 ) 37 . We chose 29 samples for our studies, i.e., 16 patients that responded well to TPF chemotherapy and 13 patients that ranged from no response to poor response to TPF treatment. These samples were then divided into two groups based on the new Response Evaluation Criteria in Solid Tumors 38 . The first group had 12 sensitive patients and nine non-sensitive patients. The second group had four sensitive patients and four non-sensitive patients. All patients' information is summarized in Table 1 and Supplemental Table 1. The patients' gender, age, primary site, TNM, stage, cycle of chemotherapy, chemotherapy regimen, gastrointestinal reaction, myelosuppression efficacy, family history, precancerosis, gastroesophageal reflux, smoking history, and alcohol intake were included in our study.  Data analysis, gene selection, and model construction. The microarray data from the CEL files were normalized using RMA. We used the genes that were significantly differentially expressed (DEs) in the TPF-sensitive group or in the TPF non-sensitive group as features to construct the prediction model. Because our sample size was small, to robustly estimate our model and pick up the DEs as much as possible, we built an iterative one-leave-out process. We removed one patient from the first batch of data, and inferred the DEs based on the remaining data. The expression levels of these DEs were used to construct the prediction model (SVM). We then predicted the group of leave-out patients. We repeated the two steps until all 21 patients in the first batch had been removed once. So, we  could obtain the average of the sensitivity and specificity based on the first batch of data alone. Here, we used the limma package to find the DEs and the required p-value < 0.01. Then, we collected all DEs from the 21 rounds, and totally, we obtained 722 genes. All genes were listed as Supplemental Table 2. To avoid false-positive genes, based on our clinical knowledge, we selected 10 genes that are known to relate to TPF or be potentially involved in the treatment pathway. The expression levels of the selected genes in the first batch of data were used to train the support vector machine model and predict the group of patients in the second batch of data independently.
Estimation of predictive power for the new data. We used two values to estimate the performance of the predictive model based on the new data: Sensitivity: True Positive/(True Positive + False Negative); Specificity: True Negative/(True Negative + False Positive). Here, "positive" means that the individual patient is sensitive to the treatment; "negative" means that the individual patient is not sensitive to the treatment. With the first-round data, we applied the one-leave-out cross validation method to estimate the performance of the SVM model. For the second-round data, we trained SVM using the first-round data with all 21 samples, and calculated the sensitivity and specificity.

Statistical analysis.
The data are presented as mean values ± standard deviation (SD) and statistically compared between groups using one-way analysis of variance followed by Student's t-test. The significance of the variables was tested using a multivariate Cox's regression model and a logistic regression model. A p-value of <0.05 was considered statistically significant.