Introduction

Head and neck cancer is the seventh most common cancer worldwide, with over 600,000 new cases in 2012 [1]. About 90% of all head and neck cancers are squamous cell carcinomas. Head and neck squamous cell carcinoma is associated with an elevated likelihood of developing a second primary malignancy, defined as a second malignancy that presents either simultaneously with or after the diagnosis of an index tumor. The overall incidence of second primary malignancies in head and neck squamous cell carcinoma patients has been reported to range from 5.6 to 35.9% [2, 3], with an annual incidence ranging from 3.2 to 4% [4]. The common sites of second primary malignancy are the head and neck, lung, and esophagus [5]. Several risk factors for second primary malignancy in head and neck squamous cell carcinoma patients have been reported, including the location of the index tumor, patient age, cigarette smoking, alcohol consumption, and betel nut chewing [4].

Despite progress in cancer molecular biology, the exact molecular mechanism underlying second primary malignancy is still poorly understood. Field cancerization is one of the widely accepted concepts that explain second primary malignancy tumorigenesis [6]. According to this concept, an area of the upper aerodigestive tract is considered a field that is continuously exposed to a variety of carcinogens, resulting in multiple precancerous genetic changes even though the mucosal epithelia still retain their normal histologic appearance. These precancerous fields may eventually become malignant, leading to multiple second primary malignancy occurrences [7]. Recently, Curtius et al. have reviewed the evolutionary process that results in field creation [8]. In this context, a cancerized field is both enabled by and causes alterations in the tissue microenvironment. Measurements of cancerized field evolution hold considerable promise as a new class of biomarker for cancer risk.

Second primary malignancies are known to have a major negative impact on head and neck cancer patients. The prognoses of head and neck cancer patients who develop second primary malignancy are worse than those of patients with index malignancies alone [9], and second primary malignancy is also a leading long-term cause of mortality in head and neck cancer patients [10]. Early diagnosis of second primary malignancies often allows less invasive or curative treatments. However, current clinical screening and surveillance methods, including panendoscopy and positron emission tomography/computed tomography (PET/CT) scan, still have clinical limitations [11, 12]. Panendoscopy is a relatively invasive procedure and can result in serious complications such as esophageal perforation [11]. While PET/CT scan demonstrates high sensitivity for detecting synchronous second primary malignancies, it can yield false-positive findings and may miss small and/or superficial synchronous second primary malignancies [13]. Furthermore, attempts to use chemoprevention for second primary malignancy risk reduction have so far been unsuccessful [14]. For these reasons, novel screening approaches such as molecular biomarkers for second primary malignancy risk prediction are very much needed.

Advances in omics technology have resulted in a better understanding of head and neck cancer carcinogenesis, which has in turn led to the development of novel molecular biomarkers for this type of cancer that can be successfully translated into routine clinical practice. Human papillomavirus (HPV) has emerged as an important etiologic factor of head and neck cancers, particularly oropharyngeal cancers. HPV status is currently used as a prognostic biomarker for head and neck cancer; HPV-positive head and neck squamous cell carcinomas have a favorable prognosis and may benefit from less aggressive treatment regimens [15]. The recently released 8th edition of the American Joint Committee on Cancer (AJCC) Staging Manual, Head and Neck Section, has acknowledged the importance of using p16 immunostaining as a surrogate marker of HPV status, and issued a distinct staging classification for HPV-associated oropharyngeal cancer, separating it from cancer of the oropharynx related to other causes [16]. Despite these advances, very few studies to date have focused on second primary malignancy, particularly on biomarker discovery [17, 18], and so far none has been successfully translated into the clinic.

In this study, we hypothesized that the cancerized fields in head and neck squamous cell carcinoma patients with or without second primary malignancy are diverse, leading to distinct protein or gene expression patterns that can be exploited as biomarkers. We therefore aimed to identify molecular biomarkers for second primary malignancy risk prediction using the routinely collected formalin-fixed paraffin-embedded tissue biopsies of the index head and neck squamous cell carcinomas. Using liquid chromatography-tandem mass spectrometry, we identified a set of candidate biomarkers differentially expressed in the primary tumors of head and neck squamous cell carcinoma patients with or without second primary malignancy after a minimum follow-up period of 3 years. Next, we utilized a NanoString nCounter gene expression assay to study the ability of selected biomarkers to predict second primary malignancy development and the time to second primary malignancy occurrence in another independent cohort of head and neck squamous cell carcinoma patients. Our study permitted the development of the first tissue-based gene expression biomarker panel for second primary malignancy that can easily be implemented in routine pathology practice.

Materials and methods

Patient selection and sample description

All patients were clinically diagnosed with head and neck cancer by otolaryngologists or oncologists and had a histopathological diagnosis of squamous cell carcinoma. Patients with only a single primary tumor after a minimum follow-up period of approximately 3 years were defined as the “Pindex group”. Head and neck squamous cell carcinoma patients who developed second primary malignancy during the same follow-up period were defined as the “PSPM group”. The clinical diagnosis of second primary malignancy was based on an adaptation of Warren and Gates’ criteria [19], as follows: (i) both index and secondary tumors had histologic confirmation of squamous cell carcinoma; (ii) the two malignancies were anatomically separated by more than 2 cm of normal mucosa; (iii) the possibility of the second primary malignancy being a metastasis from the index tumor was excluded. Second primary malignancy was further classified as synchronous when the second tumor was diagnosed simultaneously with or within 6 months after the index tumor, or as metachronous when the second tumor was diagnosed more than 6 months after the index tumor [20].
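The inclusion and timing rules above can be written down as a small illustrative check. This is only a sketch: the function names and the choice to express the interval in months as a number are assumptions made here, not part of the study protocol.

```python
def meets_spm_criteria(both_scc_confirmed: bool,
                       separation_cm: float,
                       metastasis_excluded: bool) -> bool:
    """Return True when all three listed criteria are satisfied.

    (i) both tumors histologically confirmed as squamous cell carcinoma,
    (ii) more than 2 cm of normal mucosa between them,
    (iii) metastasis from the index tumor excluded.
    """
    return both_scc_confirmed and separation_cm > 2.0 and metastasis_excluded


def classify_spm_timing(months_after_index: float) -> str:
    """Synchronous: simultaneous or within 6 months; otherwise metachronous."""
    if months_after_index < 0:
        raise ValueError("the second tumor cannot precede the index tumor")
    return "synchronous" if months_after_index <= 6 else "metachronous"
```

For example, a second tumor diagnosed 4 months after the index tumor would be labeled synchronous, and one diagnosed at 14 months metachronous.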

Cohort 1 consisted of 16 head and neck squamous cell carcinoma patients from Songklanagarind Hospital (Prince of Songkla University, Songkhla, Thailand). A pair of formalin-fixed paraffin-embedded tissue blocks containing the index tumor and normal mucosal epithelia was collected for each patient, resulting in a total of 32 samples for proteomic analysis. In addition, all second primary malignancy samples had previously been confirmed to be second primary malignancies, not metastatic tumors, by comparing the loss of heterozygosity patterns identified by single nucleotide polymorphism array between each index head and neck squamous cell carcinoma and its matched second tumor [21]. Cohort 2 was an independent multi-center cohort comprising 49 head and neck squamous cell carcinoma patients from Ramathibodi Hospital (Mahidol University, Bangkok, Thailand), Siriraj Hospital (Mahidol University, Bangkok, Thailand), and Songklanagarind Hospital. Only formalin-fixed paraffin-embedded tissue blocks containing the index tumor were collected for each patient. This study was approved by the Institutional Review Boards of all participating hospitals.

Sample preparation

Formalin-fixed paraffin-embedded tissue sections were prepared. Previously archived hematoxylin and eosin stained tissue slides were evaluated for tumor content and density by pathologists. Tumor areas containing at least 70% tumor cells, and areas of normal mucosal epithelial cells, were marked for manual macrodissection using a needle tip or scalpel. For liquid chromatography-tandem mass spectrometry, dissected tissue was prepared for protein digestion using the filter-assisted sample preparation method [22] and physically disrupted by sonication. For the NanoString gene expression assay, total RNA was isolated using the High Pure FFPET RNA Isolation Kit (Roche) according to the manufacturer’s instructions.

Liquid chromatography-tandem mass spectrometry

Liquid chromatography-tandem mass spectrometry analysis was performed as a service by Bioproximity, LLC (Chantilly, VA, USA). In brief, proteins were digested with trypsin, and digested peptides were desalted using C18 stop-and-go extraction (STAGE) tips [23]. Peptides were then fractionated by strong anion exchange STAGE tip chromatography [24]. Liquid chromatography was performed on an Easy nanoLC II HPLC system (Thermo Fisher Scientific). The liquid chromatography was interfaced to a dual pressure linear ion trap mass spectrometer (LTQ Velos, Thermo Fisher Scientific) via nano-electrospray ionization. Data from mass spectrometry were processed and the Mascot generic format (MGF) files were searched using X!!Tandem, k-score scoring algorithms and OMSSA (open mass spectrometry search algorithm). The common Repository of Adventitious Proteins and Ensembl release 69 were used as protein databases for the searches. Proteins were required to have two or more unique peptides across the analyzed samples with peptide E-value scores of 0.01 or less and protein E-value scores of 0.0001 or less [25]. The spectral count of each identified protein was initially scaled by the total and/or maximum peptide counts of the same patient. A constant of 0.01 was added to all scaled spectral counts to avoid division by zero, and each scaled spectral count of tumor tissue was then divided by the scaled spectral count of the corresponding normal tissue from the same patient to normalize for interindividual variation.
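The tumor-to-normal ratio described at the end of this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the service provider's pipeline: it scales each sample by its total spectral count only (the text says "total and/or maximum"), it treats proteins missing from one tissue as zero counts, and the function names are hypothetical. The exact scale of the scaled counts is not given in the text, so the relative weight of the 0.01 constant here is illustrative.

```python
def scale_counts(counts: dict[str, float]) -> dict[str, float]:
    """Scale one sample's spectral counts by the sample total (assumption)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no spectral counts")
    return {protein: c / total for protein, c in counts.items()}


def tumor_normal_ratios(tumor: dict[str, float],
                        normal: dict[str, float],
                        pseudocount: float = 0.01) -> dict[str, float]:
    """Return the tumor/normal ratio of scaled counts for one patient.

    The pseudocount (0.01 in the text) is added to every scaled count so
    that a protein absent from the normal tissue does not divide by zero.
    """
    t = scale_counts(tumor)
    n = scale_counts(normal)
    proteins = sorted(set(t) | set(n))
    return {
        p: (t.get(p, 0.0) + pseudocount) / (n.get(p, 0.0) + pseudocount)
        for p in proteins
    }
```

A ratio above 1 indicates relatively higher abundance in the tumor than in the matched normal mucosa of the same patient.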

Immunohistochemistry

A Leica BOND‐MAX™ automated staining system (Leica Biosystems) was utilized according to the manufacturer’s protocol for anti-CALML3 (1:1000, PA5-30232, Thermo Fisher Scientific), and anti-CKMT2 (1:200, PA5-28591, Thermo Fisher Scientific). A Ventana BenchMark XT automated staining system (Ventana Medical Systems) was used according to the manufacturer’s instructions for anti-DSG3 (1:50, MAB1720, R&D Systems), anti-PLOD1 (1:50, NBP2-31885, Novus Biologicals), and anti-p16 (CINtec® p16 Histology, 705-4713, Ventana Medical Systems). Blinded scoring was independently performed by two pathologists using the same scoring criteria (Supplementary Table S1).

Digital gene expression quantification using NanoString nCounter®

A NanoString nCounter® Gene Expression Assay (NanoString Technologies) with custom-designed CodeSets containing 40 genes was performed following the standard protocol. Data processing was conducted using nSolver Analysis Software v3.0 (NanoString Technologies). Background correction was made by subtracting the “mean + 2 standard deviation” value of the negative controls from the raw counts. Background-corrected counts were then normalized with the geometric mean of the positive controls in each sample. The resulting data were normalized again using the geometric mean of three housekeeping genes. Normalized data were standardized by mean centering and dividing by the standard deviation to obtain the Z scores for subsequent statistical analyses.
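The processing steps above can be sketched in a few helper functions. This is an illustrative sketch, not nSolver's implementation. Several details are assumptions made here: corrected counts are floored at zero, each sample is rescaled so that its control geometric mean matches the arithmetic mean of those geometric means across samples, and the sample standard deviation is used for the Z scores.

```python
from statistics import geometric_mean, mean, stdev


def background_correct(counts: dict[str, float],
                       negative_controls: list[float]) -> dict[str, float]:
    """Subtract mean + 2 SD of the negative controls from each raw count."""
    threshold = mean(negative_controls) + 2 * stdev(negative_controls)
    # Flooring at zero is an assumption; the text does not state the floor.
    return {gene: max(c - threshold, 0.0) for gene, c in counts.items()}


def scale_by_controls(samples: list[dict[str, float]],
                      control_values: list[list[float]]) -> list[dict[str, float]]:
    """Rescale each sample by the geometric mean of its control probes.

    Used once with the positive controls and once with the housekeeping
    genes. The cross-sample target (mean of geometric means) is an assumption.
    """
    geo = [geometric_mean(values) for values in control_values]
    target = mean(geo)
    return [
        {gene: c * target / g for gene, c in sample.items()}
        for sample, g in zip(samples, geo)
    ]


def z_scores(values: list[float]) -> list[float]:
    """Mean-center and divide by the standard deviation across samples."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]
```

In this sketch, `scale_by_controls` would be called first with each sample's positive-control counts and then with its three housekeeping-gene counts, after which `z_scores` is applied gene by gene across samples.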

Bioinformatics and statistical analysis

Hierarchical cluster analysis using average linkage method was performed using Cluster 3.0 (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm), and visualized by Java TreeView version 1.1.6r4 (http://jtreeview.sourceforge.net). Functional annotation of candidate proteins was performed using DAVID 6.8 (https://david.ncifcrf.gov/). Statistical analyses were performed using PASW Statistics version 18.0 for Windows (SPSS Inc.), GraphPad Prism version 6.0 for Windows (GraphPad Software), R-statistical software version 3.3.1 (R Foundation, http://www.r-project.org), and ROCR package [26].

Results

Subject characteristics

Two independent cohorts of head and neck squamous cell carcinoma patients were included in this study. Among the 16 patients in cohort 1, 5 (31%) and 11 (69%) patients belonged to the Pindex and PSPM groups, respectively. The median duration of follow-up to ensure no occurrence of second primary malignancy was 31 (30–86) months in the Pindex group. All second primary malignancies were esophageal squamous cell carcinomas. No statistically significant difference was detected between the clinical characteristics of patients in the Pindex and PSPM groups.

In cohort 2, of the 49 head and neck squamous cell carcinoma patients examined, 21 (43%) and 28 (57%) patients belonged to the Pindex and PSPM groups, respectively. The median duration of follow-up was 54 (35–95) months in the Pindex group. All second primary malignancies were squamous cell carcinomas located in the esophagus, head and neck, or lung. Patient age was the only clinical characteristic that differed significantly between the Pindex and PSPM groups (P = 0.027). The detailed clinicopathological and demographic characteristics of patients in both cohorts are summarized in Table 1.

Table 1 Clinical characteristics of patients

The HPV status of head and neck tumors was also evaluated using p16 immunohistochemistry. The difference in HPV status based on p16-positivity between the Pindex and PSPM groups from the two cohorts was not statistically significant (Fisher’s exact test P = 1.000). Of note, only three cases out of the 49 head and neck squamous cell carcinomas in cohort 2 were p16-positive. The p16 immunostaining results are shown in Supplementary Table S2.

Liquid chromatography-tandem mass spectrometry analysis and validation by immunohistochemistry

The detailed experimental outline of this study is summarized in Fig. 1. We first performed liquid chromatography-tandem mass spectrometry in 16 pairs of tumor-normal formalin-fixed paraffin-embedded tissues from the Pindex and PSPM groups (cohort 1) as a means to discover candidate biomarkers for second primary malignancy. A total of 2816 proteins were initially identified. Proteins detected in <10% of all samples were then filtered out, leaving a total of 2101 proteins for unsupervised hierarchical cluster analysis. The resulting dendrogram showed that these protein profiles could correctly classify the tumor and normal tissue samples (Fig. 2a). We further examined whether the protein profiles within the tumor and normal tissue subgroups, analyzed separately, could correctly classify each sample as Pindex or PSPM by performing hierarchical cluster analysis of differentially expressed proteins (t test P < 0.05). Interestingly, the Pindex and PSPM samples were accurately clustered in both the tumor and normal tissue subgroups (Supplementary Figure S1).

Fig. 1

Overview and experimental design of the study. The biomarker identification steps were sequentially conducted using liquid chromatography-tandem mass spectrometry in cohort 1 (left), and NanoString gene expression assay in cohort 2 (right). Multiple statistical analyses were performed to identify prognostic biomarkers for second primary malignancy occurrence and time to second primary malignancy development. SPM second primary malignancy, Pindex index head and neck squamous cell carcinoma without second primary malignancy, PSPM index head and neck squamous cell carcinoma with second primary malignancy, FFPE formalin-fixed paraffin-embedded, LC-MS/MS liquid chromatography-tandem mass spectrometry, IHC immunohistochemistry, ROC receiver operating characteristic

Fig. 2

Protein expression profiling by liquid chromatography-tandem mass spectrometry and validation by immunohistochemistry. a Dendrogram of unsupervised hierarchical cluster analysis of the 2101 protein profiles across the tumor samples and their matched normal mucosal tissues in cohort 1. b Representative immunohistochemistry images (×10 magnification) of the four selected proteins (CKMT2, DSG3, CALML3, and PLOD1). N: normal mucosal epithelial cells, T: tumor cells. c Box plot showing the comparison between normalized expression ratios (T/N) of the four selected proteins analyzed by immunohistochemistry and liquid chromatography-tandem mass spectrometry. IHC immunohistochemistry, LC-MS/MS liquid chromatography-tandem mass spectrometry

In addition, immunohistochemistry analysis of representative proteins was performed to validate the accuracy of the liquid chromatography-tandem mass spectrometry results. Four proteins that were at least two-fold upregulated (PLOD1) or downregulated (CKMT2, DSG3, CALML3) in tumors as compared to normal tissues, and expressed in more than half of the samples, were selected. Immunohistochemistry showed the same direction of up- or downregulation between normal and tumor tissues as the proteomics data for three of the four proteins (DSG3, CALML3, PLOD1) (Fig. 2b–c).

Identification of candidate protein biomarkers for second primary malignancy

To screen for proteins differentially expressed between head and neck squamous cell carcinoma patients with and without second primary malignancy, the protein expression ratio of tumor to matched normal tissue of each patient was analyzed. The inclusion criteria for candidate proteins were: (i) proteins expressed in more than 50% of the samples, (ii) proteins showing at least a three-fold difference in expression level between the Pindex and PSPM groups, and (iii) proteins exhibiting statistically significant differential expression between the two groups (t test P < 0.05). Based on these criteria, 32 candidate proteins were identified (Supplementary Table S3). Hierarchical cluster analysis using the expression ratios of these proteins demonstrated that all cohort 1 samples could be correctly divided into the Pindex and PSPM groups (Fig. 3).
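The three inclusion criteria can be expressed as a single filter, sketched below. The function and parameter names are hypothetical, the fold difference is taken in either direction using group means of the tumor-to-normal ratios (an assumption, since the text does not say how the fold difference was summarized), and the t test P-value is assumed to have been computed elsewhere.

```python
def is_candidate(detected_fraction: float,
                 mean_ratio_pindex: float,
                 mean_ratio_pspm: float,
                 p_value: float,
                 min_detected: float = 0.5,
                 min_fold: float = 3.0,
                 alpha: float = 0.05) -> bool:
    """Apply the three candidate-protein criteria to one protein.

    (i) detected in more than 50% of samples,
    (ii) at least a three-fold difference between groups (either direction),
    (iii) t test P < 0.05 (P-value supplied by the caller).
    """
    if mean_ratio_pindex <= 0 or mean_ratio_pspm <= 0:
        raise ValueError("expression ratios must be positive")
    high = max(mean_ratio_pindex, mean_ratio_pspm)
    low = min(mean_ratio_pindex, mean_ratio_pspm)
    fold = high / low
    return detected_fraction > min_detected and fold >= min_fold and p_value < alpha
```

For instance, a protein detected in 75% of samples with group means of 0.5 and 2.0 (a four-fold difference) and P = 0.01 would pass, whereas the same protein detected in exactly half of the samples would not.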

Fig. 3

Hierarchical cluster analysis of the candidate proteins. A heatmap showing the 32 candidate proteins differentially expressed between Pindex and PSPM subgroups with statistical significance (t test P < 0.05) across cohort 1 (left), and the summary of gene-annotation enrichment analysis of up- and downregulated proteins of each subgroup using DAVID bioinformatics resources 6.8 (right). The color scale is shown at the upper left corner. Pindex: index head and neck squamous cell carcinoma without second primary malignancy, PSPM: index head and neck squamous cell carcinoma with second primary malignancy. *Statistically significant (P < 0.05), **statistically significant (P < 0.01)

We further carried out gene-annotation enrichment analysis using DAVID 6.8 to identify enriched biological functions of the candidate proteins. Among the upregulated proteins in PSPM samples, the statistically significant gene ontology term (P < 0.001) with highest enrichment score (3.88) was “intermediate filament” (KRT37, KRT12, KRT86, KRT24). The other significant gene ontology terms (P < 0.05) were “cytoplasm” (UBE2N, PA2G4, S100A7, ARPC2, SERPINB5, TRIM29, CRABP2, EIF5A, RPS20, AHNAK, KRT24), and “poly(A) RNA binding” (UBE2N, PA2G4, EIF5A, RPS20, AHNAK). In contrast, the significant terms with highest enrichment score (1.31) among the downregulated proteins were “extracellular exosome” and “extracellular matrix” (PTGES3, PLOD1, FBLN2, UGDH, CSRP1, ECM1, EMILIN1) (P < 0.01) (Fig. 3 and Supplementary Table S4A-B).

Development of a predictive model for second primary malignancy occurrence by NanoString nCounter gene expression analysis

Considering the difficulties in translating tissue-based quantitative protein biomarkers into clinical practice, we set out to alternatively utilize a NanoString nCounter gene expression assay to optimize the biomarker selection in an independent cohort (cohort 2). The 40-gene custom panel consisted of the 30 candidate proteins (of the 32 candidates, two did not have specific CodeSets available), three housekeeping genes, and seven frequently mutated genes in head and neck squamous cell carcinomas with synchronous nodal metastasis or metachronous recurrence from literature review [27, 28] (Supplementary Table S5). HPV-positive and HPV-negative head and neck cancers are known to exhibit different clinical and molecular characteristics [16]. Since the number of p16-positive cases in our cohort was limited (3 out of 49 cases), we focused on the 46 p16-negative head and neck squamous cell carcinoma patients for further analyses.

To identify biomarkers that could accurately predict the development of second primary malignancy, univariate logistic regression analysis of the standardized NanoString gene expression levels and selected clinical variables of the 46 p16-negative head and neck squamous cell carcinoma patients was carried out. The expression levels of three genes (ITPR3, FAT1, KMT2D) and the patient’s age at diagnosis were statistically significantly associated with the development of second primary malignancy (P < 0.05) (Fig. 4a). The strongest risk factor for the development of second primary malignancy was a high ITPR3 level. An increase of one standard deviation in ITPR3 level was associated with 3.27-fold higher odds of second primary malignancy development (odds ratio = 3.27; 95% confidence interval (CI) [1.36–11.04]; P = 0.025). Other significant risk factors included high FAT1 (odds ratio = 2.25; 95% CI [1.13–5.49]; P = 0.038) and KMT2D levels (odds ratio = 2.20; 95% CI [1.10–5.18]; P = 0.043). To mitigate the effect of the small sample size, we performed permutation testing with 5000 reshufflings of the observed data to correct for the non-asymptotic properties of the P-values. The resulting P-values are reported as empirical P-values. This analysis showed that the expression levels of two more genes (EMILIN1 and ECM1) were also statistically significantly associated with second primary malignancy development (empirical P < 0.05). The results of univariate logistic regression of all genes are shown in Supplementary Table S6. Comparing the Pindex and PSPM groups, the expression levels of ITPR3, FAT1, KMT2D, and ECM1 were higher in PSPM samples, whereas the level of EMILIN1 was lower (Fig. 4b).
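The idea behind an empirical P-value obtained by reshuffling can be illustrated with a short sketch. Note the substitution: the study permuted data for logistic regression, whereas this example uses the absolute difference in group means as the test statistic to stay dependency-free. The function names, the fixed seed, and the "+1" correction in the estimate are choices made here, not details from the study.

```python
import random


def mean_difference(values: list[float], labels: list[int]) -> float:
    """Mean of group 1 (e.g. PSPM) minus mean of group 0 (e.g. Pindex)."""
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)


def empirical_p_value(values: list[float],
                      labels: list[int],
                      n_permutations: int = 5000,
                      seed: int = 0) -> float:
    """Two-sided permutation P-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean_difference(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between expression and group
        if abs(mean_difference(values, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

The empirical P-value is the share of reshuffled datasets whose statistic is at least as extreme as the observed one, so it does not rely on large-sample approximations.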

Fig. 4

Identification of biomarkers for second primary malignancy risk prediction by NanoString gene expression analysis. a The statistically significant variables identified by univariate logistic regression across Pindex and PSPM subgroups of the p16-negative tumors in cohort 2 (n = 46). The empirical P-values were generated by 5000 permutation tests. b Box plot depicting expression levels of the five candidate genes identified by univariate logistic regression across Pindex and PSPM subgroups of the p16-negative tumors in cohort 2. All pairwise comparisons show statistically significant differential expressions. c The predictive performance of the selected multivariate logistic regression model (ITPR3-KMT2D-EMILIN1-patient’s age) in classifying second primary malignancy risk. The robustness of the model was assessed by performing a leave-one-out cross-validation for each sample in the same cohort. The unit of patient’s age is year. Pindex: index head and neck squamous cell carcinoma without second primary malignancy, PSPM: index head and neck squamous cell carcinoma with second primary malignancy. *Statistically significant (P < 0.05), **statistically significant (P < 0.01), CI confidence interval

We next aimed to identify the best combination of biomarkers for predicting second primary malignancy development by analyzing the six statistically significant variables (P < 0.05 or empirical P < 0.05) from univariate logistic regression (ITPR3, FAT1, KMT2D, EMILIN1, ECM1, and patient’s age) using multivariate logistic regression analysis. The most parsimonious final model based on Akaike’s information criterion, the Bayesian information criterion, and the deviance test was the combination of ITPR3, KMT2D, EMILIN1, and the patient’s age. The Akaike’s information criterion and Bayesian information criterion values of this model were 50.82 and 59.96, respectively. The detailed results of all multivariate logistic regression models and the final selection are shown in Supplementary Tables S7 and S8.
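The two information criteria reported above are linked through the model deviance, which gives a quick consistency check. The sketch below assumes the standard definitions (AIC = deviance + 2k, BIC = deviance + k ln n), k = 5 estimated parameters (an intercept plus the four predictors), and n = 46 patients. The parameter count is an inference made here, not stated in the text.

```python
import math


def bic_from_aic(aic: float, n_parameters: int, n_samples: int) -> float:
    """Convert AIC to BIC for the same fitted model.

    AIC = deviance + 2k and BIC = deviance + k * ln(n), where the deviance
    is -2 times the maximized log-likelihood.
    """
    deviance = aic - 2 * n_parameters
    return deviance + n_parameters * math.log(n_samples)


# Assumed: intercept + ITPR3 + KMT2D + EMILIN1 + age = 5 parameters, n = 46.
print(round(bic_from_aic(50.82, 5, 46), 2))  # prints 59.96
```

Under these assumptions the reported Akaike and Bayesian values agree with each other, and the implied deviance of the final model is about 40.82.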

To evaluate the performance of the selected multivariate model in predicting second primary malignancy occurrence, a receiver operating characteristic analysis was performed, yielding an area under the curve of 0.86 (95% CI [0.75–0.97]). Moreover, this model had a sensitivity of 88.46%, a specificity of 75.00%, and an accuracy of 82.61% for predicting the occurrence of second primary malignancy. We further performed a leave-one-out cross-validation over the same 46 samples (cohort 2) to assess the robustness of our model. The resulting area under the curve, sensitivity, specificity, and accuracy values were 0.80 (95% CI [0.66–0.94]), 76.92%, 70.00%, and 73.91%, respectively (Fig. 4c). These new values were not significantly different from the original results, confirming the validity of our predictive model.
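The reported percentages can be related to a confusion matrix with a short calculation. The counts used below are back-calculated here and are not reported in the text: they assume 26 PSPM and 20 Pindex patients among the 46 p16-negative cases (26 is the PSPM count given for the p16-negative tumors, and 20 follows by subtraction).

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy as percentages."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "accuracy": 100 * (tp + tn) / total,
    }


# Inferred counts (assumption): full-data fit, then leave-one-out.
original = classification_metrics(tp=23, fn=3, tn=15, fp=5)
loocv = classification_metrics(tp=20, fn=6, tn=14, fp=6)
print({k: round(v, 2) for k, v in original.items()})
print({k: round(v, 2) for k, v in loocv.items()})
```

With these inferred counts, the first call reproduces 88.46%, 75.00%, and 82.61%, and the second reproduces 76.92%, 70.00%, and 73.91%, so the reported figures are consistent with a 26/20 split.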

Identification of biomarkers for predicting time to second primary malignancy development

Next, we focused on head and neck squamous cell carcinoma patients who eventually developed second primary malignancy (PSPM group) in order to identify biomarkers associated with the time to subsequent second primary malignancy occurrence. Univariate Cox regression analysis using gene expression levels of the 26 p16-negative head and neck squamous cell carcinomas from cohort 2 was performed. The levels of ITPR3, DBI, AHNAK, IGHV3-49, CALML3, ARPC2, DSG3, and KRT37 were significantly associated with a shorter time to second primary malignancy development (P < 0.05) (Fig. 5a). The strongest association was with ITPR3 level. An increase of one standard deviation in ITPR3 level was associated with a 2.68 times higher risk of second primary malignancy development at any given time after the index tumor diagnosis (hazard ratio = 2.68; 95% CI [1.53–4.72]; P = 0.001). The complete results of univariate Cox regression analysis, including the empirical P-values generated by 5000 permutation tests, are shown in Supplementary Table S9. Using multivariate Cox proportional hazards regression analysis, the best model was the combination of ITPR3 and DSG3 (Supplementary Table S10). Moreover, after dichotomizing the standardized ITPR3 and DSG3 levels at the sample mean of the 26 PSPM patients, survival analysis showed that the patients with p16-negative head and neck squamous cell carcinoma whose index tumors exhibited high ITPR3 and high DSG3 expression levels had the shortest time interval between the diagnosis of an index head and neck squamous cell carcinoma and subsequent second primary malignancy development (log-rank test P = 0.017) (median time difference between the high and low risk groups based on four combinations of ITPR3 and DSG3 levels = 394 days) (Fig. 5b and Supplementary Table S11).
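The stratification into four ITPR3/DSG3 groups and the survival curves can be sketched with a minimal Kaplan–Meier estimator. This is an illustration only: the function names and group labels are hypothetical, values exactly equal to the mean are placed in the low group (the text does not say how ties were handled), and the log-rank test itself is not implemented here.

```python
def risk_group(itpr3_z: float, dsg3_z: float,
               itpr3_mean: float = 0.0, dsg3_mean: float = 0.0) -> str:
    """Label a patient by high/low ITPR3 and DSG3 relative to the sample mean."""
    itpr3 = "high" if itpr3_z > itpr3_mean else "low"
    dsg3 = "high" if dsg3_z > dsg3_mean else "low"
    return f"ITPR3-{itpr3}/DSG3-{dsg3}"


def kaplan_meier(times: list[float], events: list[bool]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    `events[i]` is True when the second primary malignancy was observed at
    `times[i]` and False when follow-up was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        n_events = sum(1 for e in same_time if e)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)
        i += len(same_time)
    return curve
```

Because every patient in the PSPM group eventually developed a second primary malignancy, all events would be observed in that setting and the curve reduces to the empirical proportion of patients still free of a second tumor over time.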

Fig. 5

Identification of biomarkers for predicting time to second primary malignancy development. a The statistically significant genes identified by univariate Cox regression across PSPM patients with p16-negative tumor in cohort 2 (n = 26). The empirical P-values were generated by 5000 permutation tests. b Kaplan–Meier curves showing the proportions of PSPM patients with p16-negative tumor stratified by standardized ITPR3 and DSG3 expression. High and low levels of ITPR3 and DSG3 expression were defined as standardized gene levels above or below mean, respectively. Patients with high ITPR3 and high DSG3 expression levels had the shortest time to second primary malignancy development (log-rank test P = 0.017). CI confidence interval, *statistically significant (P < 0.05), **statistically significant (P < 0.01)

Comparison of the expression pattern of candidate biomarkers between protein and mRNA levels

It is known that protein and mRNA expression levels often do not directly correlate [29]. To investigate whether our candidate genes originally discovered by proteomics study shared the same differential expression patterns as their protein equivalents, we compared the standardized gene expressions of the top 10 statistically significant genes identified by logistic regression and Cox regression analyses to their standardized protein levels obtained from liquid chromatography-tandem mass spectrometry. Seven of the 10 biomarkers (AHNAK, ARPC2, CALML3, DBI, DSG3, EMILIN1, KRT12) demonstrated a similar up- or downregulation trend between mRNA and protein levels in the tumor samples across the two patient cohorts (Fig. 6).

Fig. 6

Comparison of the expression patterns between protein and mRNA levels of the candidate biomarkers across the Pindex and PSPM subgroups. The standardized NanoString gene expressions of the top 10 statistically significant genes from cohort 2 were compared to their standardized protein levels in the tumor samples of cohort 1 as quantified by liquid chromatography-tandem mass spectrometry. Color codes represent the standardized levels of protein and mRNA expression from low (blue) to high (red)

Discussion

In this study, we identified a set of tissue-based biomarkers for predicting second primary malignancy occurrence and time to second primary malignancy development in head and neck squamous cell carcinoma patients using a combination of high-throughput shotgun proteomics and targeted gene expression analysis. These second primary malignancy risk prediction biomarkers can help guide clinical management of head and neck squamous cell carcinoma patients, particularly the frequency of second primary malignancy surveillance after the diagnosis of an index tumor and the choice of treatments. The expected long-term benefit is an improvement in overall survival of head and neck cancer patients, especially those who are eligible for curative or less invasive therapy. Moreover, one of the clinical strengths of this study is the use of index head and neck squamous cell carcinoma formalin-fixed paraffin-embedded tissue as the material for biomarker discovery. This specimen is routinely acquired for histopathological diagnosis, which makes it easier to incorporate our biomarker panel into current clinical practice. We believe that our study is the first to identify prognostic biomarkers that can accurately predict second primary malignancy occurrence, thus opening the door to the clinical application of tissue-based biomarkers for second primary malignancy in head and neck squamous cell carcinoma patients.

Mass spectrometry has been extensively used to discover novel protein biomarkers [30]. Proteins are the functional molecules in the cell, and thus are the key players that represent actual cellular physiology. For tissue-based protein biomarkers, several laboratory methods such as immunohistochemistry and targeted mass spectrometry can be used to develop the validated markers into quantitative clinical assays, but not without limitations. To date, immunohistochemistry is routinely performed in clinics to determine both the qualitative and semi-quantitative aspects of a protein biomarker; however, it relies heavily on antibody quality and still lacks standardized interpretation, resulting in poor reproducibility [31]. While targeted mass spectrometry allows marker multiplexing and does not rely on antibodies, it is technically complex and carries a high development cost [32]. The development of tissue-based high-throughput gene expression assays has greatly improved the accuracy and reproducibility of quantitative measurement in tissue biomarker studies. Currently, several such assays have been integrated into clinical practice. For example, the Prosigna breast cancer prognostic gene signature assay, based on the NanoString nCounter analysis system, was approved by the U.S. Food and Drug Administration in 2013 as a prognostic assay for distant recurrence risk in breast cancer patients [33]. In this study, we sequentially utilized liquid chromatography-tandem mass spectrometry and the NanoString nCounter system in two independent cohorts of head and neck squamous cell carcinoma patients in order to identify the best set of second primary malignancy prediction biomarkers. In light of its clinical translatability, technical reproducibility, and compatibility with small biopsied formalin-fixed paraffin-embedded tissues [34], the NanoString digital gene expression platform was chosen for biomarker panel development.

In cohort 1, the protein profiles were markedly different between head and neck squamous cell carcinomas and their matched normal squamous mucosa, which is in line with previous studies [17, 35]. Interestingly, we also observed distinct protein expression profiles between the Pindex and PSPM subgroups when analyzing the tumor and normal tissue samples separately. These results suggest that the tissue microenvironments of both the tumor and the surrounding normal mucosal cells of the Pindex and PSPM patients may exhibit varying levels of genetic diversity, resulting in distinct protein and/or gene expression signatures. These findings support our hypothesis that the aberrant field effect in the biopsied index tumors can be used to predict whether the patients are at risk of subsequent second primary malignancy development. It has been proposed that genetic diversity can be utilized as a marker for field evolvability [8]. In a prospective pilot study, Roesch-Ely et al. analyzed the protein profiles of mucosal biopsies from the oropharynx, hypopharynx, and three different regions of the esophagus in head and neck squamous cell carcinoma patients and controls, and correctly predicted tumor relapse [17]. Taken together, these findings point to an altered field as a promising cancer risk prediction marker.

Of note, among the candidate proteins differentially expressed between Pindex and PSPM samples, the proteins upregulated in the PSPM group were significantly enriched for intermediate filament-associated proteins, whereas those involved with the extracellular matrix were downregulated. Overexpression of keratins, the intermediate filament-forming proteins of epithelial cells, is associated with enhanced tumor cell migration and invasion through interactions with the extracellular environment [36]. Extracellular matrix proteins are associated with both tumor suppression and tumor promotion. Interestingly, the majority of downregulated extracellular matrix proteins in PSPM samples, including ECM1, EMILIN1, and FBLN2, have demonstrated tumor suppressive roles in various cancers [37, 38, 39]. Our findings suggest that aberrant cytoskeletal activity and altered cell motility resulting from reduced expression of extracellular matrix-associated tumor suppressors may be among the major molecular mechanisms underlying second primary malignancy development.

Recently, a large head and neck squamous cell carcinoma genomics study from The Cancer Genome Atlas (TCGA), consisting of mostly Caucasian patients, revealed that 64% and 6% of oropharyngeal and non-oropharyngeal tumors, respectively, are HPV-positive [28]. In contrast, studies from Thailand reported that HPV status was positive in only 26.09% of oropharyngeal cancers and in none of the 80 non-oropharyngeal cancers tested [40, 41]. This difference is also reflected in our largely p16-negative cohorts. In addition, multiple studies have reported that patients with HPV-negative oropharyngeal squamous cell carcinoma have a higher risk of second primary malignancy development than HPV-positive oropharyngeal squamous cell carcinoma patients [42, 43]. Because most of our patients were p16-negative, and because HPV-negative head and neck squamous cell carcinoma patients are more likely to develop second primary malignancy, we decided to focus on the p16-negative cases in cohort 2 for our gene expression analyses.

For cohort 2, in addition to the candidate genes from our proteomics study, seven genes from the recent genomics literature [27, 28] were included in the gene expression panel. TP53 and NOTCH1 are among the most commonly mutated genes in head and neck squamous cell carcinoma [28]. A recent whole exome sequencing study of head and neck squamous cell carcinomas reported that C17orf104 and ITPR3 are specifically mutated in synchronous nodal metastases but not in the primary cancers, while DDR2 is exclusively mutated in metachronous recurrent tumors [27]. Additionally, FAT1 and KMT2D are mutated in both primary head and neck squamous cell carcinomas and their nodal metastases [27]. Hence, it is of interest to investigate whether the expression levels of these genes are associated with second primary malignancy risk. In this study, the combination of three genes (ITPR3, KMT2D, EMILIN1) and patient age allowed for the most accurate discrimination between the Pindex and PSPM groups, with a sensitivity of 88.46% and a specificity of 75.00%. Of note, while these numbers will require validation in future studies with larger sample sizes, our panel showed a level of performance comparable to that of currently used clinical gene expression tests in breast cancer, including Oncotype Dx (sensitivity 71–85%, specificity 55–66% for high and intermediate risk groups) and MammaPrint (sensitivity 83–92%, specificity 41–59%) [44].
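
For readers less familiar with these metrics, the following minimal Python sketch shows how sensitivity and specificity are derived from a two-by-two classification table, taking PSPM as the positive class. The function name and the patient counts are hypothetical: the counts were chosen only because they reproduce the reported percentages, and they are not the actual counts from cohort 2.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as fractions.

    tp: PSPM patients correctly classified as PSPM (true positives)
    fn: PSPM patients classified as Pindex (false negatives)
    tn: Pindex patients correctly classified as Pindex (true negatives)
    fp: Pindex patients classified as PSPM (false positives)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one patient")
    sensitivity = tp / (tp + fn)  # fraction of true PSPM cases detected
    specificity = tn / (tn + fp)  # fraction of true Pindex cases detected
    return sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the study).
sens, spec = sensitivity_specificity(tp=23, fn=3, tn=9, fp=3)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

With small groups, a single reclassified patient shifts either metric by several percentage points, which is one reason validation in larger cohorts is needed.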

In addition to its implications in head and neck squamous cell carcinoma [27], increased expression of ITPR3, a major intracellular Ca2+ release channel, is associated with enhanced tumor proliferation and invasion in breast and colorectal cancers [45, 46]. KMT2D, which encodes a histone H3 lysine 4 methyltransferase, is one of the most commonly mutated genes in cancers [47]. High KMT2D expression has recently been reported to promote tumor progression by inducing epithelial-mesenchymal transition (EMT), and is a predictor of poor prognosis in esophageal squamous cell carcinoma [48]. In contrast, EMILIN1, which encodes an extracellular matrix glycoprotein associated with the development of elastic tissues, plays a suppressive role in tumor growth, tumor lymphatic vessel formation, and metastatic spread to lymph nodes [38]. A recent report has suggested that α9β1 integrin, of which EMILIN1 is a ligand, is the major integrin involved in regulating head and neck squamous cell carcinoma cell migration on extracellular matrix [49]. In our study, increased ITPR3 and KMT2D levels, as well as reduced EMILIN1 levels, were significantly associated with second primary malignancy in head and neck squamous cell carcinoma. Their exact roles in second primary malignancy tumorigenesis remain to be explored.

Second primary malignancy can be classified as either synchronous or metachronous. Since survival outcomes differ between synchronous and metachronous second primary malignancies [5, 9], it is clinically useful to be able to predict the time to second tumor development. The combination of two genes, ITPR3 and DSG3, was significantly associated with the time to second primary malignancy occurrence in our study. DSG3, a component of cell–cell junctions, is overexpressed in head and neck squamous cell carcinomas, and its inhibition significantly suppresses tumor growth [50]. DSG3 has also been proposed as a predictive biomarker for cervical lymph node micrometastasis in oral cancer [51]. In line with their roles in promoting tumor proliferation and progression, high expression of both ITPR3 and DSG3 was associated with the shortest time to second primary malignancy development in patients with p16-negative head and neck squamous cell carcinoma.

It is well known that transcript levels alone are often poor predictors of protein levels [29]. In this study, however, the majority of the top candidate biomarkers showed a similar pattern of differential protein and mRNA expression in the tumor samples across the Pindex and PSPM subgroups of both cohorts. This concordance partly explains why this particular set of markers carried over from the protein level to the transcript level for second primary malignancy risk prediction, and may also give future biomarker assay development the flexibility to use either gene expression or targeted proteomic platforms.

According to the current follow-up recommendation in the National Comprehensive Cancer Network (NCCN) Guidelines for Head and Neck Cancers (version 2.2018) [52], history and physical exams, including a complete head and neck exam and mirror and fiberoptic examination, should be performed at least every 3 months during the first year and at least every 6 months in the second year. In our results, all patients with high ITPR3 and DSG3 levels developed second primary malignancy within ~13 months (400 days) after the diagnosis of an index tumor. Therefore, it may be beneficial to monitor this group of patients more closely for second primary malignancy during the first years of follow-up. Moreover, even though some synchronous second primary malignancies develop around the same time as the index tumor, they often cannot be clinically detected until they are large enough. The use of our biomarkers should help increase early diagnosis of both subclinical synchronous second primary malignancies and subsequent metachronous second primary malignancies through a more rigorous surveillance protocol.

Limitations of this study include the relatively small sample size and the limited follow-up time to second primary malignancy occurrence. Others include the use of a cohort of convenience for which adequate formalin-fixed paraffin-embedded tissue was available, and the use of one tissue sample per tumor, which may not capture intratumor heterogeneity.

In summary, our biomarkers demonstrate great potential as a companion prognostic test for second primary malignancy risk prediction in routine clinical practice. The expected long-term benefit of early detection of second primary malignancy is an improvement in overall survival of head and neck cancer patients, especially those who are eligible for curative or less invasive therapy. The results are compelling and warrant future validation studies in larger head and neck squamous cell carcinoma cohorts. The protein signatures also hint at tumor-extracellular matrix interactions as a major player in second primary malignancy tumorigenesis. Further functional studies of these biomarkers may better clarify their clinical utility and their roles in second primary malignancy development.