Introduction

With over 500,000 new cases per year worldwide, head and neck squamous cell carcinoma (HNSCC) develops from multiple anatomic subsites, including oral cavity, hypopharynx, oropharynx, larynx and nasopharynx1,2. Main risk factors include tobacco exposure, together with alcohol use, as well as infection with potentially oncogenic viruses, such as HPV-16, HPV-18, HPV-31, and HPV-331,3,4. Moreover, the increasing (ab)use of so called ‘e-liquids’, as a more ‘healthy’ way of smoking, might need to be considered as an additional risk factor. However, conclusive studies are missing so far. When discovered in early-stage, HNSCC are treated with surgery and/or radiotherapy; for advanced stages treatment is multimodal1. Surgery is used when possible, usually combined with adjuvant chemoradiation, or definitive chemoradiation1. For recurrent and metastatic diseases, chemotherapy is applied with the option of alternative immunotherapy when chemotherapy is not tolerated1. All treatment is indicated mainly by primary site, TNM classification and other pathologic features limited by performance status1,5. Cancer therapies are changing gradually, shifting from systemic cytotoxic drugs to individual approaches. Essential for these personalized therapies are diagnostic tests predicting the clinical outcome of certain therapies using the vast knowledge of cancer origins and maintenance obtained by molecular biology6.

As the accumulation of mutations in tumours from patients with lymph node metastasis is not increased compared to patients without lymph node involvement7, other alterations have been suggested8. Expression profiling has been used to distinguish between cancer subtypes and predict disease outcome in different types of cancer9,10,11,12,13. Previous studies allowed the categorization of HNSCC tumours into distinct subtypes14, discovered a gene expression signature associated with recurrent disease15, or suggested genes with diagnostic or prognostic potential16,17,18,19,20,21,22,23. Microarray analysis performed on normal oral epithelium compared to HNSCC elucidated the roles of microenvironment remodelling and immune responses in survival of HNSCC24. The only prognostic marker for HNSCC is the HPV status of the tumour, and predictive biomarkers to influence treatment selection are completely lacking1,25. Although HPV positive tumours differ significantly in their expression profiles, treatment options do not depend on the HPV status of the tumour1. Several new therapies targeting growth factor receptors or downstream signalling pathways are currently tested in clinical trials26. Without matching companion diagnostics, physicians will soon be challenged in choosing the most suitable therapy for each patient.

In this study, we analysed differentially expressed genes of HNSCC primary tumours compared to lymph node metastases in order to explore new biomarker candidates and involved signalling pathways. In contrast to other studies concentrating only on the analysis of primary tumours, we additionally focused on identifying the changes that lead to the dissemination of the primary tumour to lymph node metastasis. Gene expression microarray technology was applied to a group of 15 HNSCC tumours, their corresponding normal tissue mucosa and lymph node metastases resected by surgery. To facilitate data interpretation, statistical gene-set-enrichment methods and Qiagen’s software Ingenuity (Pathway Analysis, Comparison and Upstream Analysis, Biomarker Discovery) were used to intersect expressed genes with sets of genes associated with particular biological functions or pathways (as reviewed in27). We focused on obtaining a global view of differential gene expression in HNSCC compared to normal tissue mucosa, the primary tumour and corresponding lymph node metastases. The identification of characteristic gene expression signatures may potentially be exploited for diagnostic, prognostic or therapeutic purposes and promotes the reliability of the corresponding method.

Results

Validation of the gene expression data generated by oligonucleotide chip-arrays

In order to analyse gene expression, RNA from primary tumours (PT), lymph node metastases (M), and corresponding normal tissue mucosa (N) was subjected to microarray analysis using Affymetrix HG-U133A chips. The verification of the obtained data quality by independent experimental approaches is a prerequisite for further analytical approaches in all ‘omics’ technologies. Accordingly, we confirmed our chip-array data by quantitative real-time PCR (qRT-PCR) and reverse transcription PCR (RT-PCR) for selected representative genes and samples. RT-PCR was performed to analyse expression patterns of randomly chosen CLCA4, FN1, POSTN, KRT24 and PRR4 in primary HNSCC vs. their corresponding normal tissue mucosa (PTvs.N, Fig. S1A) and ARPGAP25, FAIM3, RASGRP2, LYPD3, SERPINB4 and TP73L in lymph node metastasis vs. their corresponding primary HNSCC (Mvs.PT, Fig. S1B). FN1, POSTN and OLR1 genes were confirmed as up-regulated, KRT24, KRT4 and PRR4 as down-regulated in PTvs.N. ARPGAP25, FAIM3 and RASGRP2 genes were up-regulated, whereas LYPD3, SERPINB4 and TP73L were down-regulated in Mvs.PT. Additionally, quantitative PCR analyses were performed for ARGHAP25 (Fig. S2), SERPINB4 (Fig. S3) and PRR4 (Fig. S4). All results of qRT-PCR analyses were in accordance to the microarray data. In summary, the conducted controls underline the quality and reliability of our obtained GeneChip array data, fulfilling all prerequisites for further bioinformatics exploitation, model building and in depth functional and clinical analysis.

Identification of significantly upregulated genes in primary tumours and corresponding lymph node metastases

After normalization, we calculated the expression differences over all patients to compare primary tumours to normal mucosa (PTvs.N) and lymph node metastasis to primary tumours (Mvs.PT). By sorting the resulting lists using stringent filter criteria (first by p-value, second by log2FC), we obtained four TOP 30 lists for highly upregulated PTvs.N (Table S4), downregulated PTvs.N (Table S5), upregulated Mvs.PT (Table S6) and downregulated Mvs.PT (Table S7).

The 45 data sets were subjected to supervised (adjusted to the origin of the primary site) average-linkage hierarchical clustering. Normal tissue mucosa and primary tumours grouped separately on a clustering dendrogram (Fig. 1A) as well as normal tissue mucosa and metastasis (data not shown). Comparison of primary tumours vs. metastasis showed only marginal differences, displaying a rather gradual transition (Fig. 2A). As expected, these results show that metastasis and primary tumour were closely related and could not be clearly distinguished from each other.

Figure 1
figure 1

Gene set cluster analysis of 15 HNSCC samples and their 15 corresponding normal mucosa allows profiling of disease progression by identifying a condensed list of 38 most significant genes. Supervised two-way hierarchical clustering and gene tree representation adjusted to the primary site of all genes (A) and differentially regulated genes (B) allow separating primary tumours and normal mucosa. X-axis’ represent patient samples; y-axis’ represent the list of probe sets grouped by similarity using Pearson correlation. Intensities recorded for genes are colour coded: Green – low expression, Red – high expression, Black – medium expression. Orange bar: primary tumour samples; pink bar: normal mucosa samples. (A) Heatmap of all probesets shows the comparison of primary tumours and normal tissue mucosa (PTvs.N). (B) Stringent filter criteria (log2FC ≥ 1.5 and p-value < 0.0001, corresponding to a FDR of 0.003) were used to reduce the probesets to the most significant for PTvs.N, leading to a condensed list of 38 genes able to distinguish diseased from non-malignant tissue. Signature genes are listed separately in Table S8.

Figure 2
figure 2

Gene set cluster analysis of 15 lymph node metastasis and their 15 corresponding primary tumours allows profiling of disease progression by identifying a condensed list of 62 most significant genes. Supervised two-way hierarchical clustering and gene tree representation adjusted to the primary site of all genes (A) and differentially regulated genes (B) allow distinguishing lymph node metastasis and primary tumours. X-axis’ represent patient samples; y-axis’ represent the list of probe sets grouped by similarity using Pearson correlation. Intensities recorded for genes are colour coded: Green – low expression, Red – high expression, Black – medium expression. Orange bar: primary tumour samples; blue bar: lymph node metastasis samples. (A) Heatmap of all probesets shows the comparison of lymph node metastasis to primary tumours (Mvs.PT). (B) Stringent filter criteria (log2FC ≥ 1.5 and p-value < 0.0001, corresponding to a FDR of 0.004) were used to reduce the probesets for Mvs.PT, leading to a condensed list of 62 genes most significantly changed in primary tumours to their corresponding lymph node metastasis. Signature genes are listed separately in Table S9.

In order to pinpoint the most important upregulated genes for either PTvs.N or Mvs.PT, we restricted the dataset to a p-value of lower than 0.0001 (corresponds to a FDR of 0.003 in PTvs.N and 0.004 in Mvs.PT) and a log2 fold change (FC) higher than 1.5 (equates to a FC of 2.83). This data restriction resulted in a signature gene set of 38 upregulated genes for PTvs.N (Fig. 1B and Table S8), and 62 upregulated genes for Mvs.PT respectively (Fig. 2B and Table S9).

Several proteins involved in the (re-)organization and assembly of the extracellular matrix were significantly overexpressed in all primary tumours compared to normal tissue mucosa (Table S8): Matrix Metallopeptidases, Collagens, Asporin, Fibronectin, Laminin, Lysyl Oxidases, Microfibrillar-Associated Protein, Periostin, Serpin Inhibitors, Osteonectin and Osteopontin. Higher expression of genes like KIF23, UBE2C, MCM7, NUP107, RSRC1, MAD2, BTAK, RFC5, ECT2 and CENPA was also found in metastatic tumours reports from other studies14. The latter 5 genes were also present in the distant recurrence predictor of van’t Veer et al. for breast cancer28. Regarding the comparison of Mvs.PT (Table S9), we identified upregulation of several genes related to the immune response, cellular metabolism, several membrane proteins, as well as genes interacting with or regulating GTPases.

Ingenuity Analysis of significantly up-/downregulated genes

Previously reported correlations were investigated thoroughly using the Ingenuity Pathway Analysis tool to reinforce the validity of our analysis. We applied the same restrictions as before, but included downregulated genes (p-value of lower than 0.0001 and higher than a log2FC of |1.5|) to our Ingenuity analysis. Comparing gene expression profiles of PTvs.N, we identified several disease/function networks for cancer and connective tissue disorders with a score higher than 20 (Fig. 3 and Table S10). For the dataset Mvs.PT, top networks included one network for immune response, as well as several networks for cell movement, cell-to-cell signalling and interaction with a score higher than 20 (Fig. 4 and Table S11). All top networks are illustrated in detail in Supplementary Figures S12S27.

Figure 3
figure 3

Top networks of differentially expressed genes in PTvs.N as identified by Ingenuity Pathway Analysis show strong association to cancer and connective tissue disorders. For network analyses a threshold of log2FC > |1.5| with a p-value < 0.0001 (corresponding to a FDR of 0.003) was used, giving an overview of involved genes within the networks and their differential expression. Node colour indicates up-regulated (red), down-regulated (green) or not significantly changed according to threshold (grey) genes. Lines and arrows between nodes represent direct (solid lines) and indirect (dashed lines) interactions between molecules. All edges are supported by at least one reference from the literature or from canonical information stored in the Ingenuity Knowledge Base. Node shapes represent functional classes of gene products: square → cytokine, triangle → kinase, rectangle → nuclear receptor, concentric circle → group or complex, vertical diamond → enzyme, horizontal diamond → peptidase, trapezium → transporter, vertical ellipse → transmembrane receptor horizontal ellipse → transcription regulator, circle → other. (A) Top 01 significant PTvs.N, network (Cellular Development, Infectious Disease, Cancer) contains 27 focus genes. (B) Top 02 significant PTvs.N network (Connective Tissue Disorders, Dermatological Diseases and Conditions, Cellular Assembly and Organization) contains 24 focus genes.

Figure 4
figure 4

Top networks of differentially expressed genes in Mvs.PT as identified by Ingenuity Pathway Analysis show strong association to cancer and connective tissue disorders. For network analyses a threshold of log2FC > |1.5| with a p-value < 0.0001 (corresponding to a FDR of 0.004) was used, giving an overview of involved genes within the networks and their differential expression. Node colour indicates up-regulated (red), down-regulated (green) or not significantly changed according to threshold (grey) genes. Lines and arrows between nodes represent direct (solid lines) and indirect (dashed lines) interactions between molecules. All edges are supported by at least one reference from the literature or from canonical information stored in the Ingenuity Knowledge Base. Node shapes represent functional classes of gene products: square → cytokine, triangle → kinase, rectangle → nuclear receptor, concentric circle → group or complex, vertical diamond → enzyme, horizontal diamond → peptidase, trapezium → transporter, vertical ellipse → transmembrane receptor horizontal ellipse → transcription regulator, circle → other. (A) Top 03 significant Mvs.PT network (Cell-To-Cell Signalling and Interaction, Haematological System Development and Function, Tissue Morphology) contains 14 focus genes. (B) Top 07 significant Mvs.PT network (Cancer, Endocrine System Disorders, Developmental Disorder) contains 7 focus genes.

Ingenuity Analysis of upstream regulators

To further understand the context of activated networks, we used the upstream analysis tool of the Ingenuity software. Among the analysed activated upstream regulators for PTvs.N (z-score higher than 2) we found glycoprotein CD44, a stem cell marker for HNSCC29 and several growth factors like FGF2, TGFB1 and TGFB3 (Table 1A). Notably, there were only two significantly activated upstream regulators in Mvs.PT, namely the PI3K family which has been implicated with a more aggressive and invasive tumour type8,30 and the cytokine IFNγ which has previously been suggested as a salivary biomarker for early diagnosis of oral cancer31 (Table 2).

Table 1 Activated/Inhibited Upstream Regulators (Ingenuity) PTvs.N.
Table 2 Activated/Inhibited Upstream Regulators (Ingenuity) Mvs.PT.

Surprisingly, inhibited upstream regulators (z-score lower than -2) for PTvs.N included the oncogene MYC (Table 1). There were no significantly inhibited upstream regulators for Mvs.PT (Table 2). The Ingenuity Comparison Analysis of selected canonical pathways showed a high relevance for cancer pathways and tissue morphology in both primary tumours as well as metastases, whereas cell movement and inflammatory response were predicted more prominent just in metastases (Figs S8 and S9).

Further downstream analysis revealed several interesting networks, one of them being activated by Fibronectin 1 (FN1) in PTvs.N (Fig. S5). Since FN1 is involved in several cellular processes, such as remodelling of the extracellular matrix, it is of general note to investigate the expression and activation status of its downstream targets. This will allow the identification of potential activation processes and prediction of corresponding cellular processes. Downstream targets of FN1 include VEGFA, the Jnk Family, the Tgf beta family, HRAS and the ERK1/2 family, which collectively activate diverse cellular functions via the transcription factor ETS1 (expression of cytokine and chemokine genes), beta-Catenin (key downstream component of Wnt signalling), the transcription factor Jun-B (regulation of gene activity following growth factor response), the corresponding nuclear phosphoprotein c-Fos and Ap-1, as well as the transcription factor SP1. All of these genes were activated and/or overexpressed in our dataset PTvs.N (Fig. S5).

The PI3K family has been shown to be essentially important for metastatic transformation30 and is the most significant activated upstream activator in Mvs.PT (Table 2). The PI3K group activates NFkB via Akt and promotes the release of MMP9, which is the most important matrix metalloprotease in degrading the basement membrane, which is also predicted by Ingenuity (Fig. S6). The NFkB complex usually activates DNA damage response via RELA and RELB, blocking PPARG and the Jnk family but this could not be confirmed by our Ingenuity prediction for Mvs.PT (Fig. S7). The nuclear receptor PPAR gamma is activated, which should block pro-inflammatory NFkB-mediated responses32. In contrast, tumour-promoting cytokines, such as TNF-α and IL10 are upregulated (Fig. S7) underlining the complexity and crosslinking of malignant signalling pathways as previously described33.

Deregulated canonical pathways are essential for tumour maintenance and important factors of malignancy. For both primary tumours (Fig. S8) and metastases (Fig. S9), the pathways of Nicotine Degradation and Ethanol Degradation were activated (Figs S10 and S11). Since all patients were smokers and alcohol consumers (Table S1), those findings are in accordance to our expectations.

Comparison of amplification profiles of next generation genome sequencing data to our analysis

Since next generation sequencing (NGS) has had a major impact on cancer research, we compared recent NGS amplification34 and PARADIGM prediction data34 based on RNA sequencing data from the Broad Institute TCGA Genome database35 to our microarray analysis. Here, a log2FC higher than 0.263 was used as an overexpression tendency (corresponding to a 20% higher expression). For many genes, we found a concordant tendency of their regulation when we compared the genomic amplification and PARADIGM percentages (AMP) published by The Cancer Genome Atlas (TCGA) project36 and our expressional data (Table 3). Especially FGFR1 (13% in PTvs.N and 7% in Mvs.PT to 10% in AMP), FGFR3 (7% in PTvs.N and Mvs.PT to 2% in AMP), DDR2 in lymph node metastasis (13% in Mvs.PT to 3% in AMP) and MET (7% in PTvs.N and Mvs.PT to 2% AMP, FC 2.4; Fig. S28) seem to be similarly overexpressed. Remarkably, we did not find a significant overexpression of EGFR, neither in the primary tumour, nor in their corresponding lymph node metastasis. In contrast, RNA-Seq-based amplification data suggested an AMP of 15% for EGFR. The receptor Her-2 (ERBB2), which was amplified in 4% of specimens in TCGA cohort, was mostly downregulated in our expression data as well as the RNA-Seq data (FC 0.479; Fig. S28). MYC, which was shown to be inactivated in our Ingenuity upstream analysis, was overexpressed in 7% of patients, compared to 14% amplification in TCGA samples. E2F1 and NFE2L2 overexpression values were both higher than expected by TCGA data, and even more elevated in metastasis than in primary tumours. We also found a higher expression for EPHA2, DDR2 and PIK3R1 compared to AMP published by TCGA. No significant differences between the fold changes of the PTvs.N to the Mvs.N dataset could be detected for FGFR1, IGF1R, FGFR2, MET, HRAS, PTEN and NF1. ERBB2, FGFR3 and MYC overexpression percentages were significantly higher in primary tumours than in metastases.

Table 3 Comparison of Microarray Expression Data with RNA-Seq Data.

Comparison of individual Receptor Tyrosine Kinase (RTK) expression patterns

Due to their crucial role in HNSCC as upstream regulators of e.g. the phosphoinositol-3-kinase (PI3K) pathway mediating invasiveness and malignancy30, we concentrated on further exploring the expression of growth factor receptors (receptor tyrosine kinases, RTKs). Remarkably, in our study PI3K was one of only two significantly activated upstream regulators in Mvs.PT (Table 2). Based on the list of amplified genes from the HNSCC TCGA project, we created an overview of relative RTK expression for PTvs.N and Mvs.N (Fig. 5). With the exception of two patients, all patients showed overexpression tendency of at least one RTK indicated by an upregulation of at least 20%. Most patients showed an upregulation of random combinations of at least two RTKs. For example, patient 6 showed an upregulation combination of ERBB4, VEGFR2 and PDGFRB, and a slight upregulation of MET in PTvs.N. In patient 14, VEGFR2, PDGFRB and DDR2 were higher expressed in comparison to normal mucosa. An expanded overview of RTK and oncogene expression can be found in the Supplementary Material (Fig. S29).

Figure 5
figure 5

Comparison of Receptor Tyrosine Kinase (RTK) Expression of all patients. Differential expression values for selected genes were colour coded (bright red for an overexpression of ≥20%, bright green for a downregulation of ≥20%). In a ratio to their corresponding normal tissue mucosa, primary tumour values (PT) and metastasis values (M) are listed for each patient 1–15.

Discussion

Precision medicine requires individualized therapeutic approaches that fulfill the needs of each patient. Therefore, both prognostic as well as predictive markers are a crucial prerequisite. The thorough understanding of underlying mechanisms of tumorigenicity is not only the basis for the development of new drugs, but also for biomarker development.

In this study we analysed and compared the expression profiles of 15 primary tumours to their corresponding normal tissue mucosa and paired lymph node metastasis. In contrast, previous microarray-based studies of HNSCC have focused only on primary tumours versus normal mucosa or primary tumour patterns of expression16,17,18,19,20,21,37. Since the number of patients is relatively low, p-values have to be interpreted with caution38. We are aware of the limitations of this study and recommend replicate tests for their verification.

We have identified and verified several genes and pathways that are significantly overexpressed in HNSCC tumours with lymph node metastases. Based on this study, we here provide a model how genes involved in tumorigenesis and metastatic progression are differentially expressed in HNSCC, involving changes in the extracellular matrix, hypoxia and EMT (overview in Fig. 6). Our finding of highly upregulated matrix metalloproteases, integrins, collagens, fibronectin, laminin and other ECM proteins supports previous studies showing that degradation of the basement membrane and remodelling of the extracellular matrix are essential during the invasion process of a tumour39,40,41.

Figure 6
figure 6

Potential prognostic biomarkers for HNSCC linked to hypoxia, EMT and ECM remodelling are directly connected. Hypoxic environments induce genes promoting an aggressive, invasive phenotype, mediated by epithelial to mesenchymal transition (EMT) and remodelling of the extracellular matrix (ECM). Matrix metalloproteinases (MMPs) and other proteases degrade the basement membrane and new collagen fibres are aligned by modifying enzymes to facilitate local invasion and recruitment of cancer associated fibroblasts.

Molecules Interacting with CasL-2 Proteins (MICALs) are monooxygenases that cause actin to depolymerize and are implicated in the regulation of cytoskeletal dynamics, intracellular trafficking, the docking and fusion of exocytotic vesicles and also have anti-apoptotic properties42. MICAL2, which was significantly upregulated in our PTvs.N dataset, has been identified as one of the most significantly upregulated genes in smoking-induced lung cancer43 and has been associated with prostate cancer progression44 as well as EMT, cancer growth and invasion45, but has not been implicated with head and neck cancer, yet.

The destruction and remodelling of the extracellular matrix not only drives the mechanical and proteolytic invasion of tumour cells, but also favours metastatic cell reprogramming and behaviour46,47,48. Specialized Cancer Associated Fibroblasts (CAFs) can degrade various ECM proteins like collagens, basement membrane laminin and fibronectin46, leading to a more aggressive behaviour of the cancer, lymph node metastasis and poor prognosis, as shown in oral cancer49,50. When the Chemokine C-X-C Motif Receptor Type 4 (CXCR4), whose expression can be enhanced by hypoxia, binds Stromal Cell-derived Factor 1 (CXCL12), secreted by CAFs, release of MMP-9 to the tumour microenvironment is stimulated51,52. In our data, we identified a switch from MMP1, 3 and 13 in primary tumours to MMP1, 9 and 12 in metastases, as well as a significant upregulation of CXCL12/CXCR4. Resulting MMP levels and the ratio of activated MMPs to total MMP concentration compared to adjacent normal tissue is positively correlated with lymph node involvement51,53. MMP9 is the most versatile of the MMPs and can degrade various elements of the tumour microenvironment, like elastin, fibrillin, laminin, gelatin and collagens of type IV, V, XI and XVI52,54.

Hypoxia is very common in solid tumours, especially in HNSCC55, which has repeatedly been associated with poor prognosis56. Additionally, hypoxic environments induce genes supporting stem cell maintenance, angiogenesis and epithelial to mesenchymal transition (EMT), leading to an aggressive, invasive phenotype57,58,59. CXCR4 inhibitors have most recently been suggested as treatment for advanced and metastasizing HNSCC60, and have previously been reported to have synergistic effects with anti-PDL1 therapies61.

Homeobox B7 (HOXB7) is a developmental gene we found highly upregulated in our PTvs.N dataset. HOXB7 confers epithelial-mesenchymal transition in breast cancer62, promotes malignant progression by activation of TGF signalling63 and lung metastasis64 and renders breast cancer cells resistant to tamoxifen through activation of the EGFR pathway65. HOXB7 overexpression is associated with poor prognosis66 and has been suggested as a prognostic factor for oral squamous cell carcinoma67. Migration of endothelial progenitor cells is mediated by CXCL12/CXCR4 via PI3K/Akt/eNOS signalling68. The CXCL12/CXCR4 system also facilitates lymph node metastatic potential in oral squamous cell carcinoma by enabling EMT69.

The phosphoinositol-3-kinase (PI3K) pathway is the most frequently altered oncogenic driver in head and neck cancer and its upstream regulators and downstream targets might be the most interesting targets for predictive biomarkers30,70. While we could not show a single significantly upregulated growth factor receptor for all patients, we could confirm the global upregulation of the pathway in our Ingenuity Analysis. Interestingly, most patients showed a tendency for upregulation of at least two RTKs in varying combinations (Fig. 5). Several correlations involving sets of RTKs have previously been shown in different tumour entities. A meta-analysis for breast cancer has indicated the association of an overexpression of various RTKs with poor outcomes and suggested a clinical evaluation of combination agents against RTKs or relevant oncogenic nodes71. In breast cancer, MEK inhibition resulted in acute ERK activity loss and subsequent c-MYC degradation that induced expression and activation of PDGFRB, DDR2 and VEGFR272, an RTK combination found in patients 6 (PT) and 10 (M) of our dataset (Fig. 5). It has been proposed that upregulation of FGFRs (as seen in patients 4, 9 and 10, as well as to a minor degree in patients 7, 12, 14 and 15, Fig. 5) could be involved in the development of resistance to anti-VEGFR therapies, since both pathways are regulating tumour neoangiogenesis (as reviewed in73). In breast cancer patients with FGFR1 or FGF3 amplification, multikinase FGFR/VEGFR inhibitors have shown promising activity74. It has been suggested to treat RTK-driven tumours by hitting signalling nodes interconnecting core pathways, targeting c-Abl as a signalling node connecting MET and TP5375,76 (cp. patients 2, 5, 6 and 9, Figs 5 and S29).

In comparison with recent high throughput sequencing data, we showed a comprehensive list of upregulated genes for each patient. Regarding numerous targeted therapies for HNSCC currently in clinical studies26, it will be essential to develop predictive biomarkers allowing the selection of an optimal therapy for each patient. Currently, molecular companion diagnostics are still lacking for all therapeutic approaches in HNSCC. For Cetuximab, the only targeted therapy currently in clinical use for HNSCC, the development of acneiform rash is still the best indicator for therapy response77. EGFR-expression status, determined by immunohistochemistry, could not be verified as a predictive marker for EGFR-targeted therapy response78,79, even though mechanisms of action and resistance mechanisms have been thoroughly investigated (as reviewed in80,81). EGFR is amplified in normal mucosa of tobacco using patients, and might thereby be not the oncogenic driver, but rather a response to carcinogenic agents81. This might explain our results for the comparison of EGFR expression in primary tumour vs. normal mucosa (Table 3 and Fig. 5).

The combination of NGS with other approaches is most promising for the development of clinically relevant, molecular subgroups that will guide treatment (as reviewed in82). Whole genome expression profiling, including mutational analysis by RNA-Seq might surpass the limitations of microarray technology, but currently is still too expensive to be included in large multi-centered clinical studies. This experimental approach might also lack reproducibility due to different platforms and handling. Additionally, the large amount of data produced by high throughput sequencing techniques has to be processed and interpreted carefully. Microarrays may no longer be the standard for basic scientific research, but are currently miniaturized and automated as a cost-effective on-chip nanotechnology for clinical diagnostic approaches83,84. We suggest analysing a selection of biomarker candidates by customized automated microarrays, which should be narrowed down by previous next generation sequencing approaches. These analyses should be performed parallel to clinical trials for HNSCC and other types of cancer. Correlating a small list of chosen potential biomarkers to clinical data of outcomes and treatment efficiency might facilitate following health technology assessment for drug approval as well as biomarker development. Such approaches should also be employed for investigating the pathobiological relevance of so called ‘e-liquids’, currently widely considered as a more ‘healthy’ way of smoking by the population.

In conclusion, we here shed new light on the expression differences in lymph node metastasis compared to primary HNSCC tumours, and provide the data for further analyses.

Material and Methods

Reagents

All reagents were purchased from Sigma unless stated otherwise.

Patient characteristics and material

Tissue samples were obtained from 15 patients undergoing surgical resection at the departments of otolaryngology of the Universities of Frankfurt and Mainz. Investigation has been conducted in accordance with the ethical standards according to the Declaration of Helsinki and according to national and international guidelines. The study protocol has been approved by the ethics committees “Ethik-Kommission des Fachbereichs Medizin, Universitätsklinikum der Goethe-Universität” (#83756604) and “Ethik-Kommission der Landesärztekammer Rheinland-Pfalz” (#83748515) after obtaining the patients’ informed consent to participate in the study and was processed anonymously. All cases were diagnosed histopathologically as HNSCC and staged according to the TNM classification of malignant tumours recommended by the ‘Union International contre le Cancer’ UICC. In this study, tumour specimens, corresponding non-malignant tissue from the oral cavity (buccal mucosa), and lymph node metastasis from each patient were analysed. The median age of patients was 56 years and 86% were males. Six of the resected tumours were classified as grade 2 (40%), nine as grade 3 (60%). Twelve patients had a lymph node status of N1/2 (80%) and three of N3 (20%). The majority of cancers were located in the hypopharynx (60%), 13.3% in the larynx and 26.6% in the oropharynx (Table S1). Specimens included oropharyngeal and laryngopharyngeal carcinoma of different tumour size (T1–T4) and lymph node status (N1-3), without distant metastasis (M0). Additional inclusion criteria were positive alcohol and tobacco use, HPV-negative status of the primary tumour, a histopathological grading of 2–3 and the patients did not have previous cancer treatment before surgery. Upon resection, samples were immediately placed on ice and snap-frozen in liquid nitrogen within 30 min. Histological analyses were performed to ensure that each specimen of primary tumours and metastasis contained >70% tumour cells and <10% necrotic debris. Normal mucosa was histopathologically checked for non-malignancy. All criteria were cross-checked and approved by the independent analyses of at least two qualified head and neck pathologists. Samples not meeting these criteria were rejected.

RNA extraction

Frozen tissue samples (30–50 mg) were collected in 1 ml Trizol (Invitrogen, Karlsruhe, Germany) and dispersed using an Ultra-Turrax T25 tissue homogenizer (IKA Werke, Staufen, Germany). Total RNA was extracted according to the recommendations given by the manufacturer’s Trizol protocol and further purified on RNeasy Mini spin columns (Qiagen, Hilden, Germany). Integrity and purity of total RNA were assessed on a Bioanalyzer 2100 (Agilent Technologies, Boeblingen, Germany) using RNA 6000 Nano LabChip Kit (Agilent) according to the manufacturer’s instructions as previously described85, accepting absorbance ratios of 1.8–2.2 for A260/A280 and a A260/A230 ratio of lower than 1.7. Samples not meeting these criteria were rejected.

Reverse transcription (RT)-PCR

RT-PCR was performed using specific primer sets for OSF2, FN1, KRT24, CLCA4, PRR4, SCCA2b, TP73L, LYPD3, FCMR, ARHGAP25 and RASGRP2 genes (Table S9). GAPDH was used to control the amounts of cDNA generated from each sample as described85,86,87. First-strand cDNA synthesis was carried out using a cDNA synthesis kit for RT-PCR (Superscript II, Invitrogen life technologies). 1 µg of total RNA was converted to cDNA and 1 µl of the 20 µl RT product was amplified for 30 cycles (initial denaturation at 95 °C for 3 min, 30 sec at 95 °C, 30 sec at a variable temperature for annealing and 1 min at 72 °C) followed by an extension of 5 min at 72 °C. RT-PCR amplification products were analysed on 2% agarose gel stained with ethidium bromide.

Quantitative Real-Time PCR analysis

For ARHGAP25, SERPINB4 and PRR4, changes in mRNA levels detected in microarray experiments were evaluated by quantitative real-time PCR analysis, using the iCycler (BioRad, Munich, Germany). One µg of total RNA was converted to cDNA using Superscript II (Invitrogen Corporation) and oligo(dT) primer according to manufacturer’s specifications. PCR reaction mixtures consisted of 12.5 µl of 2× iQ SYBER Green Supermix (Abgene, UK), 0.5 µl of each 10 µM target primer, 1 µl (1:10) cDNA template in a final scaled down reaction volume of 25 µl. Thermal cycling conditions comprised an initial denaturation step at 95 °C for 15 min, 40 cycles at 95 °C for 30 sec and variable annealing temperatures for 30 sec depending on the respective set of target primers. Cumulative fluorescence was measured at the end of the extension phase of each cycle. Specific amplicon formation with each primer pair was confirmed by dissociation curve analysis and by visualization of a single band on a 2% agarose gel. To define the relative expression of the genes, the PCR product results from each tumour sample were compared with the results from normal tissue or lymph node metastasis from the same patient. The relative expression ratio (R) of target gene is calculated using the equation: Ratio = (Etarget)ΔCp target(control-sample)/(Eref)ΔCpref(control-sample) based on its real-time PCR efficiencies (E) and the crossing point (CP) difference of an unknown sample versus a control and expressed relative to the non-regulated housekeeping gene glyceraldehyde-3-phosphate dehydrogenase gene (GAPDH), as described86,88.

Target preparation and hybridization for Affymetrix GeneChip Arrays

Total RNA (5 µg) was used to prepare biotinylated cRNAs for hybridization, following the guidelines given in the Affymetrix GeneChip Expression Analysis Technical Manual. cRNA clean-up was performed on RNeasy Mini filters (Qiagen). In all, 10 µg of fragmented, labelled cRNA were hybridized to Affymetrix HG-U133A arrays (Affymetrix, Santa Clara, CA, USA) using standard conditions (16 h, 45 °C). Arrays were washed and stained in a Fluidics Station 400 (Affymetrix) and scanned on a Gene Array Scanner 2500 (Agilent), as recommended by Affymetrix.

Statistical Analysis

Analysis of the data was performed using R 2.15.2 with the limma package 3.14.4. Raw fluorescence intensities from all hybridizations were normalized, applying robust multichip average (RMA) normalization for the. CEL data, followed by a quantile normalization to compare expression results across specimens89,90. For the comparison of primary tumours to normal mucosa (PTvs.N), metastasis to normal mucosa (Mvs.N) and metastasis to primary tumours (Mvs.PT) of differentially expressed genes, we also utilized the limma package and adjusted our model for each tissue type (supervised clustering). For multiple testing a Bonferroni correction was performed. Due to the large number of tests applied in this study, p-values have to be interpreted with caution.

For comparative analysis of our microarray analysis to NGS amplification data34 and PARADIGM prediction data (including RNA-Seq)34, we used genomic amplification percentages (AMP). AMP specifies the percentage of samples for which amplification of a respective gene was published by The Cancer Genome Atlas (TCGA) project36. It should be noted that in our study we just compared data of HPV-negative tumours.

Ingenuity analysis

The Ingenuity Pathway Analysis (IPA) tool (Ingenuity Systems) was used to identify pathways related to enriched genes in primary tumours (PTvs.N) or metastasis (Mvs.N), as well as primary tumours compared to metastasis (Mvs.PT). Data was filtered to meet the criteria p < 0,0001 and a log2FC > |1,5| threshold in our comparison analysis. For each probeset, its Affymetrix probeset ID and its p-value from the comparison analysis were loaded into the analysis tool and mapped to its corresponding gene object in the Ingenuity Pathways Knowledge Base.

Data availability statement

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.