Introduction

The worldwide annual incidence of head and neck cancers is almost 700,000, and 380,000 patients succumb to their disease annually.1 Oropharyngeal squamous cell carcinoma (OPSCC) accounts for ~20% of all new head and neck cancers, and the incidence is expected to rise over the following decades.1,2,3 This increase is mainly due to the cancers related to the human papillomavirus (HPV), and particularly due to its high-risk genotype HPV-16.2, 4

Traditionally, the main risk factors for OPSCC have been smoking and heavy alcohol consumption.5 Patients diagnosed with HPV-related OPSCC tend to be younger, and the consumption of alcohol and tobacco is often lower or even absent.6 HPV-related tumours have a better prognosis, a lower risk of secondary malignancies and the disease responds better to (chemo)radiotherapy.5, 7 It is also of note that HPV-associated OPSCCs in tobacco users behave like classical tobacco-associated OPSCCs.8 While the de-escalation of HPV-positive OPSCC patients’ treatment is under investigation,7 patients with HPV-negative OPSCC still require heavy treatment and the prognosis remains poor.6 At the moment, the only way to improve the prognosis of patients with HPV-negative tumours would be to diagnose them earlier.

Currently, there are no diagnostic biomarkers for OPSCC to enhance its detection at an earlier stage. Brush samples, used successfully for cervical cancer screening, have been shown to be ineffective in screening HPV-positive OPSCCs, and no diagnostic biomarkers from standard bio-fluids exist.9 HPV vaccinations could eventually decrease the epidemic of HPV-related OPSCC; however, even if effective vaccination programmes were launched, the decrease in incidence would only be seen after a couple of decades.10

Protein expression levels in both tumour tissue and serum samples of patients with OPSCC have been studied, showing some alterations, compared with those of healthy controls.11,12,13,14,15,16 However, these studies have often been targeted to recognised proteins, based on earlier studies on other cancers. Discovery-driven mass spectrometry proteomics offers the possibility to discover novel biomarkers and pathways, as well as to associate the findings with clinical aspects.

Our objective was to compare the serum protein profiles of patients with early-stage OPSCC and of healthy controls, to promote early cancer diagnostics. For early-stage tumours, we chose stage I and stage II tumours (eighth edition of TNM classification of malignant tumours, 2016). Protein p16, i.e. cyclin-dependent kinase inhibitor 2A, is used as a surrogate marker for HPV status at our department and also in this study. The protein was first presented for OPSCC by Klussmann et al. and is now an established immunohistological marker, widely used instead of the arduous and expensive HPV detection and typing.17 We analysed the serum samples in ultra-definition MSE (UDMSE) mode. Of the three data-independent acquisition methods available in the Synapt G2-S (MSE, high-definition MSE (HDMSE) and UDMSE), the last one was chosen as it gives the best protein coverage of the sample.18 Based on the proteomic changes revealed, we aimed to find a set of proteins that could be used as a biomarker panel for early-stage OPSCC.

Materials and methods

Patients and serum samples

Serum samples from 25 patients diagnosed with stage I–II OPSCC were collected prior to treatment between the years 2012 and 2015 at the Department of Otorhinolaryngology—Head and Neck Surgery, Helsinki University Hospital, Helsinki, Finland. After collection, the samples were allowed to clot at room temperature (RT) before they were centrifuged at 4 °C (1000 × g) to separate serum. Sera were stored at –70 °C until all were assayed at the same time. The inclusion strategy by the TNM status was based on the eighth edition of TNM classification of malignant tumours, dividing HPV-positive and HPV-negative OPSCCs as separate entities,8 and protein p16 status was used as a surrogate marker for HPV. Twelve serum samples from age-matched and gender-matched control patients were received from the Finnish Red Cross Blood Service.

Written informed consent was obtained from all patients. The study plan was approved by the institutional Research Ethics Board at the Helsinki University Hospital (DNr. 51/13/03/02/2013).

Reagents

Reagents for serum pre-processing (Pierce Swell Gel Blue Albumin Removal Discs, Pierce Centrifuge columns and Pierce C18 Spin Columns) were acquired from Thermo Scientific (Rockford, IL, USA), solvents and high-purity HPLC reagents from Waters (Milford, MA, USA) and other reagents from Sigma-Aldrich (St Louis, MO, USA).

Serum treatment and protein digestion

The workflow has been described previously in detail.19 In brief, the samples were thawed, and after the depletion of the top 12 proteins with Pierce Top 12 protein depletion columns, the total protein concentration was measured with a bicinchoninic acid assay kit (Pierce, Thermo Scientific, Rockford, IL, USA). Top 12 protein-depleted serum samples corresponding to 350 µg of total protein were dried in a speed vacuum (Savant, Thermo Fisher), and then dissolved in 6 M urea and 100 mM Tris-HCl (pH 7.4). Reduction of disulphide bonds was performed with 10 mM of dithiothreitol (DTT) for 60 min at RT, and thereafter 30 mM iodoacetamide was used for alkylating the proteins for 60 min in the dark at RT. Protein digestion was performed with trypsin (Promega, Madison, WI) for 18 h at +37 °C after the consumption of excess iodoacetamide by adding DTT again (30 mM DTT, 60 min at RT). Samples were diluted 1:10 with high-purity Milli-Q water (Millipore, Billerica, MA, USA) before addition of trypsin. Finally, the samples were purified in C18 spin columns, dried in a speed vacuum and dissolved in 0.1% formic acid containing 12.5 fmol Hi3 peptide mixture (Waters) per µl. All of the procedures described were performed according to the manufacturer’s instructions, wherever applicable.

Liquid chromatography—ultra-definition MSE

Four-microlitre samples corresponding to 1.4 µg of total protein were injected into the ultra-performance liquid chromatography (UPLC) system (Waters Corporation, Milford, MA, USA).18 A TRIZAIC nanoTile 88-µm × 100-mm HSS-T3u wTRAP was applied as a separating device before mass spectrometry (MS). After loading and trapping, the samples were washed for 2 min at 8.0 µl/min with 1% buffer B. The analytical gradient, run at 450 nl/min, was as follows: 0–1 min 1% B; at 2 min 5% B; at 65 min 30% B; at 78 min 50% B; at 80 min 85% B; at 83 min 85% B; at 84 min 1% B and at 90 min 1% B. Buffer A consisted of 0.1% formic acid in water and buffer B consisted of 0.1% formic acid in acetonitrile (Sigma-Aldrich).
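The gradient programme above can be written down as a small lookup, which makes it easy to check the solvent composition at any retention time. The following is a minimal sketch only: the breakpoints are taken from the text, but the linear ramps between breakpoints and the function name `percent_b` are assumptions of this illustration, not a description of the instrument method file.

```python
# Gradient breakpoints (minute, % buffer B) as listed in the text.
# Linear ramps between breakpoints are an assumption of this sketch.
GRADIENT = [
    (0, 1.0), (1, 1.0), (2, 5.0), (65, 30.0), (78, 50.0),
    (80, 85.0), (83, 85.0), (84, 1.0), (90, 1.0),
]


def percent_b(t_min):
    """Return % buffer B at time t_min by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-90 min programme")
    # Walk over consecutive breakpoint pairs and interpolate in the
    # segment that contains t_min.
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for minute in (0, 2, 33.5, 65, 78, 90):
        print(minute, "min:", round(percent_b(minute), 2), "% B")
```

Under the linear-ramp assumption, the shallow 2 to 65 min segment (5% to 30% B) is where most peptide separation would take place.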

The data were acquired in UDMSE mode with a Synapt G2-S UDMS (Waters Corporation), including ion mobility spectrometry (IMS). The data range was 100–2000 m/z, scan time 1 s, IMS wave velocity 650 m s−1 and collision energy ramped in the trap between 20 and 60 V. Calibration was performed with Glu1-fibrinopeptide B MS2 fragments, and the Glu1-fibrinopeptide B precursor ion was used as a lock mass during the acquisitions. In total, 10% of the samples were acquired as triplicates to validate the results, and further analysis was conducted with Progenesis QI for Proteomics software (Nonlinear Dynamics, Newcastle, UK) (Supplement S2—triplets).

The mass spectrometry proteomics data have been deposited into the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD008445.20

Data analysis

The data analysis has been described previously in detail.21 Briefly, Progenesis QI for Proteomics software (Version 3, Nonlinear Dynamics) was used for processing raw files. Peptide identification was run with UniProt human FASTA sequences (UniProtKB Release 2015_09, 20205 sequence entries), and label-free protein quantification was performed with the Hi-N method (Protein Lynx Global Server).22 The samples were spiked with 12.5 fmol/µl of CLPB_ECOLI (P63285, ClpB protein) peptides (Hi3 Escherichia coli Standard, Waters).
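The idea behind Hi-N quantification can be illustrated with a short calculation: the mean intensity of the N most intense peptides of a protein is scaled against the same quantity for the spiked standard, whose amount is known. This is a hedged sketch only. The function names and all peptide intensities are hypothetical, N = 3 is assumed, and the on-column standard amount is assumed to be the 12.5 fmol/µl spike multiplied by the 4 µl injection volume; the actual Progenesis QI implementation is not reproduced here.

```python
def top_n_mean(intensities, n=3):
    """Mean of the n most intense peptide signals (fewer if fewer exist)."""
    top = sorted(intensities, reverse=True)[:n]
    if not top:
        raise ValueError("at least one peptide intensity is required")
    return sum(top) / len(top)


def hi_n_amount_fmol(protein_peptides, standard_peptides, standard_fmol, n=3):
    """Estimate the protein amount relative to a spiked standard (Hi-N idea)."""
    ratio = top_n_mean(protein_peptides, n) / top_n_mean(standard_peptides, n)
    return ratio * standard_fmol


# Assumed on-column standard amount: 12.5 fmol/µl spike x 4 µl injection.
STANDARD_FMOL_ON_COLUMN = 12.5 * 4.0

if __name__ == "__main__":
    # Hypothetical peptide intensities, for illustration only.
    protein = [9000.0, 6000.0, 3000.0, 500.0]
    standard = [4000.0, 3000.0, 2000.0]
    print(hi_n_amount_fmol(protein, standard, STANDARD_FMOL_ON_COLUMN))
```

Using only the most intense peptides keeps the estimate less sensitive to poorly ionising peptides, which is the usual motivation given for this family of methods.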

The peptide identification parameters were fixed modification of cysteine (carbamidomethyl) and variable modification of methionine (oxidation). The peptide error tolerance was set to a maximum of 10 ppm, the false-discovery rate was limited to less than 2% and default values (in Progenesis QI for Proteomics) were used for the rest of the parameters.
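To make the 10 ppm tolerance concrete, the sketch below computes the relative mass error between an observed and a theoretical m/z value and checks it against the limit. The function names and the example m/z values are hypothetical and chosen only for illustration; they are not taken from the data set.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical values: 0.008 off at m/z 1000 is about 8 ppm,
    # 0.015 off is about 15 ppm.
    print(within_tolerance(1000.008, 1000.0))
    print(within_tolerance(1000.015, 1000.0))
```

Because the tolerance is relative, the same 10 ppm limit allows a larger absolute deviation at high m/z than at low m/z.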

The quantified proteins in all comparisons were compared by ANOVA on a protein-by-protein basis and their expression levels were considered significantly different if the ANOVA p value was <0.05. Principal component analysis (PCA), offering the visualisation of the main axes of variation in the data groups, was performed by Progenesis QI for Proteomics. Supervised OPLS–DA modelling was performed by processing the Progenesis QI data with EZinfo 3.0 software (a statistical tool released in December 2014, Umetrics, Sweden). With a p(corr) cut-off of ± 0.80, a variance versus correlation plot (S-plot) and a list of S-plot proteins were generated from the OPLS–DA data.
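The p(corr) cut-off can be illustrated as follows. In an S-plot, p(corr) is the correlation between each variable (here, a protein's abundance across samples) and the predictive score vector of the OPLS–DA model; proteins with an absolute correlation of at least 0.80 are retained. This is a minimal sketch under stated assumptions: the score vector is taken as given rather than fitted, the protein names and numbers are hypothetical, and the EZinfo implementation is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def s_plot_selection(abundances, scores, cutoff=0.80):
    """Keep proteins whose |p(corr)| with the score vector reaches the cut-off.

    abundances: dict mapping protein name -> abundances per sample
    scores: predictive score per sample (assumed to come from an OPLS-DA fit)
    """
    selected = {}
    for protein, values in abundances.items():
        p_corr = pearson(values, scores)
        if abs(p_corr) >= cutoff:
            selected[protein] = p_corr
    return selected


if __name__ == "__main__":
    # Hypothetical scores and abundances for four samples.
    scores = [-2.0, -1.0, 1.0, 2.0]
    abundances = {
        "PROT_A": [1.0, 2.0, 4.0, 5.0],  # rises with the score
        "PROT_B": [5.0, 4.0, 2.0, 1.0],  # falls with the score
        "PROT_C": [3.0, 1.0, 1.0, 3.0],  # unrelated to the score
    }
    print(s_plot_selection(abundances, scores))
```

The sign of p(corr) indicates the direction of change relative to the score axis; which sign corresponds to cases or controls depends on how the groups were coded in the model.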

Protein–protein interactions, pathways and networks

The STRING 10.5 database, which illustrates known and predicted protein–protein interactions (PPI),23 was used for PPI analyses, giving a detailed view of possible and known interactions between proteins. PPI analyses were conducted to filter the S-plot proteins and project them to connected pathways and/or co-expression. Medium stringency was used for inferring the networks from protein lists on the STRING DB and textmining was excluded as a setting.

The network and canonical pathway overrepresentation analyses were conducted through the use of Ingenuity pathway analysis (IPA; QIAGEN Inc., https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis) with default parameters to identify which networks and pathways were most enriched in our protein list.24 IPA networks differ from PPIs in their way of connecting proteins. In addition to the proteins actually present, they combine the information about possible connector proteins (not present in the user-supplied list). This allows another way of finding the networks the proteins are enriched into. IPA analyses were conducted on the proteins with the ANOVA p value < 0.05 and S-plot proteins were then separately matched to the proteins in enriched networks.

Results

Metadata and workflow

Twenty-five serum samples from patients with stage I and stage II OPSCC, together with 12 samples from healthy controls were studied. Of the 25 patients with stage I–II tumours, 12 had p16-positive and 13 had p16-negative tumours.

The tumour localisation was tonsil in 15 (60%) of the 25 patients, base of the tongue in 8 (32%), the soft palate in 1 (4%) and posterior wall of the oropharynx in 1 (4%). Sixty percent of the patients were male and 40% were female. The age of the patients varied from 36 to 78 years with the median age being 60.85 (average 60.92). More detailed clinical parameters are provided in Supplementary Table 1. The data analysis workflow is presented in Fig. 1.

Fig. 1

Data analysis workflow. Protein quantification data were from ultra-definition MSE, and proteins with two or more unique peptides were approved for identification. An ANOVA cut-off of 0.05 was used. PCA: principal component analysis, used to visualise the variation between groups. OPLS–DA: orthogonal projections to latent structures discriminant analysis, which provides data for the S-plot for an efficient comparison of protein expression profiles. PPI: protein–protein interaction network, which gives the known and predicted functional and physical associations between single proteins in the S-plot. IPA: Ingenuity pathway analysis, an analysis tool revealing pathways and potential networks associated with the given data

All early-stage OPSCCs versus controls

Protein identification and PCA

With the criterion of two or more unique peptides per protein identification, 176 proteins were quantified across all cases and controls analysed. The quantified proteins were compared by ANOVA on a protein-by-protein basis. With the ANOVA cut-off of 0.05, 152 proteins with two or more unique peptides remained (Supplementary Table 2). Based on serum protein expression levels of patients with early-stage OPSCC and healthy controls, the two groups were found to be separated in PCA (Fig. 2).

Fig. 2

Principal component analysis using serum protein expression data of early-stage OPSCC versus controls (two or more unique peptides, ANOVA p value < 0.05). Early-stage tumour samples are marked with red and controls are marked with blue

OPLS–DA

As another group classification method, OPLS–DA modelling was performed, and an S-plot was generated, presenting 96 proteins that most reliably distinguished patients from controls (Fig. 3). These proteins passed the p(corr) cut-off of ± 0.80 and were thus considered significantly different (Table 1). Of the 96 proteins, 13 were expressed at higher levels in early-stage OPSCCs than in controls, and the remaining 83 proteins had lower levels in cases than in controls.

Fig. 3

S-plot obtained from OPLS–DA regression analysis of the serum protein expressions in early-stage OPSCCs versus controls (p(corr) ± 0.80). Proteins downregulated in tumour patients’ serum appear in the upper-right corner, and upregulated proteins in the lower left

Table 1 S-plot proteins obtained from OPLS–DA regression analysis (p(corr) ± 0.80), of all stage I–II OPSCCs versus controls

Protein–protein interactions

To further study our set of S-plot proteins and to try to identify the most relevant proteins, protein–protein interaction (PPI) webs were created using the STRING 10.5 database. Proteins with the most interactions, with connections to other proteins ranging from 9 to 16, were prothrombin (F2), plasminogen (PLG), alpha-2-antiplasmin (SERPINF2), histidine-rich glycoprotein (HRG), beta-2-glycoprotein 1 (APOH), carboxypeptidase B2 (CPB2), inter-alpha-trypsin inhibitor heavy chain H4 (ITIH4) and complement C2, C5, C4-A and C4-B (C2, C5, C4A and C4B).

According to the UNIPROT database,25 these proteins seemed to be associated with complement activation (early and late), extracellular matrix remodelling and lipid metabolism, for example. PPIs of the S-plot proteins are shown in Fig. 4.

Fig. 4

PPI network of S-plot proteins (p(corr) ± 0.80) manifesting in the stage I–II OPSCC. The five serum proteins discussed in the article as possible biomarkers for early-stage OPSCC and suggested for further screening are circled: CFHR2 and MYL12A, upregulated in the tumour patients’ serum are circled with red, and the downregulated C9, FCN3 and C4BPA are circled with green

Pathways and networks

The top six IPA networks where the identified proteins were most enriched were 1. developmental disorder, hereditary disorder and immunological disease; 2. lipid metabolism, molecular transport and small-molecule biochemistry; 3. humoral immune response, inflammatory response, haematological system development and function; 4. cardiovascular disease, organismal injury and abnormalities and tissue morphology; 5. hereditary disorder, ophthalmic disease, organismal injury and abnormalities and 6. cell morphology, cellular development, cellular assembly and organisation. The scores of the top six IPA networks ranged from 21 to 45. There were 13–23 proteins with the ANOVA p value < 0.05 participating in each of the networks and the total number of focus molecules was 108. Of these, 46 were S-plot proteins (p(corr) ± 0.80). The network linked with lipid metabolism, containing 14 S-plot proteins, is illustrated in Fig. 5. The other five networks are presented in Supplementary Table 3. Altogether, among the S-plot proteins present in the top six IPA networks, four were upregulated in cases versus controls: complement factor H-related protein 2 (CFHR2), GREB1-like protein (GREB1L), myosin regulatory light chain 12A (MYL12A) and myotonin-protein kinase (DMPK). CFHR2 and MYL12A were also found to be binding in the PPI clusters. The remaining 42 S-plot proteins presented in the top six IPA networks were downregulated in cases versus controls, and the majority of these were also present in the PPI clusters.

Fig. 5

IPA network 2. Lipid metabolism, molecular transport and small-molecule biochemistry with a score of 45. See text for details

In the canonical pathway analyses conducted with IPA, acute phase response signalling, LXR/RXR activation, FXR/RXR activation and the complement system were among the highest enriched pathways. The top canonical pathways are shown in Supplementary Figure 1.

Comparison between p16-negative and p16-positive tumours

Protein identification and PCA

In a comparison between p16-negative early-stage tumours and controls, 148 proteins were found with different expression levels in the serum samples. In the case of p16-positive early-stage tumours, the number was 152. When comparing the p16-negative and p16-positive groups with each other, 24 proteins were differently expressed. The protein identification tables are presented in Supplementary Table 2 and PCAs are presented in Supplementary Figures 4–6.

OPLS–DA

In the comparison between patients with early-stage p16-negative OPSCC and healthy controls, 103 proteins were presented in the S-plot (p(corr) ± 0.80), and 104 proteins were presented in the comparison of p16-positive tumours versus controls (Supplementary Figures 7 and 8). Of these, 96 were common to the two comparisons, as shown in a Venn diagram (Supplementary Figure 9). It is of note, though, that the fold changes of the S-plot proteins were not identical in the two groups. The lists of S-plot proteins in the two comparisons are presented in Supplementary Table 4.

Protein–protein interactions

PPI networks of p16-positive and p16-negative groups were also studied separately, and the results showed great consistency with those from all early-stage OPSCC samples combined. Minor differences between p16-positive and p16-negative groups were detected, for example, there were slight differences in the protein interactions in the coagulation pathway. The PPI networks are shown in Supplementary Figures 10 and 11.

Pathways and networks

In the network analysis conducted with IPA, most networks were represented in both p16-positive and p16-negative patients’ data. Some differences were found: for example, the network haematological disease, haematological system development and function and organismal functions was enriched solely in the p16-positive group. The results are shown in Supplementary Table 3. In canonical pathway analyses, the top enriched pathways were also almost identical between p16-positive and p16-negative groups with some differences in their order (Supplementary Figures 2 and 3).

Discussion

The mortality of OPSCC ranges from 19 to 86%, the main predictive markers being tumour stage and HPV status.8, 26 At the moment, most tumours are diagnosed at an advanced stage, and thus the best way to improve their prognosis would be to diagnose them at an earlier stage.27 Currently, there are no known biomarkers to detect OPSCCs before clinical signs exist. When diagnosed, tumours are either visible or cause clinical symptoms, for example, dysphagia, pain etc.27 Discovering serum proteins that distinguish patients with early-stage cancer from healthy controls would be of great value from a diagnostic point of view. In order to identify possible proteins to be used as such biomarkers, we analysed serum samples of 25 patients diagnosed with stage I–II OPSCC and 12 healthy controls. Altogether, 176 serum proteins were reliably quantified, and the expression profiles of OPSCC patients differed clearly from those of healthy controls.

The discovery-driven nature of mass spectrometry-based analysis offers a unique chance to discover proteins and pathways that have not previously been studied in OPSCC. In previous serological studies, an association between serum antibodies towards HPV-16 early (E) antigens and HPV-positive OPSCC has been described, and these E antibodies have been studied as potential diagnostic biomarkers for HPV-related OPSCC. Recently, seropositivity for E6 antibodies was described as a highly sensitive (96%) and specific (98%) marker for HPV-positive OPSCC.12 However, that study lacked a control group, and the majority of patients had advanced stage tumours: there were 134 patients with stage IV tumours and 80 patients with stage I–III tumours. In addition, another study presenting an algorithm incorporating information about multiple E antibodies with a high sensitivity (83%) and specificity (99%) has been conducted for the detection of HPV-related OPSCC.11 In that study, age-matched and sex-matched healthy individuals served as healthy controls. However, analyses were made on patients with tumours of all stages and only a few represented early-stage tumours. Thus, the clinical use of the E antigens still remains an open question, as there is no information about their usability in early diagnostics, for example. In addition to E antigens, serum levels of matrix metalloproteinases (MMP1, 2 and 9) have been studied in oropharyngeal, laryngeal and hypopharyngeal carcinoma by Kalfert et al. and found not to serve as suitable prognostic tumour markers in these cancers.14 MMP1 expression was described as being significantly influenced by smoking and p16 expression. There was no control group in the study.

Also, serum levels of IL-10, TNF-α, TGF-β, VEGF, Cyfra21-1, SCCAg, ferritin, CEA, CA19-9 and AFP have been studied in oral and oropharyngeal carcinoma patients.15, 16 In summary, until now, serological studies have not elucidated any biomarkers that will allow detection of oropharyngeal tumours at an early stage.

The OPLS–DA modelling generates a list of the most significant proteins in terms of group separation (S-plot proteins). This level of discrimination is difficult to obtain using other statistical methods. Statistically significant differences in expression of serum proteins between patients with early-stage OPSCC, when compared with controls, included 13 upregulated and 83 downregulated proteins. Of these, IPA networks and PPI analyses revealed interesting clusters of these proteins acting together. In the PPI network of the S-plot proteins of early-stage OPSCCs versus controls, examples of the pathways and biological processes visualised were complement activation (early and late), extracellular matrix remodelling, angiogenesis and possible tumour growth. Among the proteins with most interactions were complement C5, C4-A and C4-B (C5, C4A and C4B), prothrombin (F2), plasminogen (PLG), carboxypeptidase B2 (CPB2), alpha-2-antiplasmin (SERPINF2), histidine-rich glycoprotein (HRG) and insulin-like growth factor-associated proteins (IGFBP3, IGFALS). The complement cascade is one of the most studied biological processes in cancers.28 Dysregulated complement activation in the tumour microenvironment has been recently linked with increased inflammation and thus suppression of antitumour immune responses, leading to tumour cell proliferation, migration and invasive potential.29 The decrease of the plasmic complement C4-A has previously been described by Koifman et al. and Ornellas et al. in HPV-positive squamous cell carcinoma of the penis.30, 31 Also, genetic deficiency of the complement isoforms C4A or C4B may predict improved survival of metastatic renal cell carcinoma.32 In our study, serum levels of complement C4-A were lower in comparison with controls.

A common approach to biomarker signature discovery for any given group of patient samples is to perform a classification analysis such as the one we have done (OPLS–DA). However, by using different approaches to discovery, different molecules that differentiate the disease can be found. Moreover, the biological interpretation is often difficult due to the complicated nature of how gene/protein signatures are found, including the lack of causal relationships between protein expression and disease. The two aforementioned shortcomings are currently preventing biomarkers from becoming standard clinical tools. To circumvent these problems, network-based approaches have been proposed to be integrated with feature-selection algorithms.33 These network-based approaches include protein–protein interactions, canonical pathways and Gene Ontology annotations, which can help interpret the feature selection for various purposes including biomarker discovery.34 However, different approaches to these network-based methods, such as those employed by STRING DB or IPA, lead to slightly different results.35 The methodology in the present work was chosen according to what has been suggested by deep analysis of common network-building software modules, i.e. that at least two different methods should be used for the purpose of network inference.35 To be able to filter our protein set, and to further identify a potential panel of proteins to serve as a diagnostic panel, IPA network analysis was conducted. There were six networks considered significant, having a score of 21 or more and at least 13 focus proteins.36 The first and third of the top six IPA networks with the best scores and focus molecules were developmental disorder, hereditary disorder and immunological disease and humoral immune response, inflammatory response, haematological system development and function.

These networks were associated with complement activation, thus being consistent with the data received from PPI analyses. Proteins found in the second network, lipid metabolism, molecular transport and small-molecule biochemistry, were associated with lipoprotein metabolism and lipid digestion, mobilisation and transport. Most solid tumours tend to become hypoxic and are thus acidic.37 This causes tumour cells to increase their uptake of apolipoproteins, handle fatty acids more rapidly and enhance their cholesterol biosynthesis.37 These functions have been shown to have a strong influence on tumour cell growth.38 Alterations of serum levels of apolipoproteins have previously been reported to be associated with breast, lung and colorectal cancers.39 In our material, most of the apolipoproteins participating in the networks were downregulated in the OPSCC serum compared to controls, except for apolipoprotein C-IV (APOC4), which was upregulated. This seems logical considering the increased uptake of apolipoproteins by tumour cells.

Two S-plot proteins, CFHR2 and MYL12A, upregulated in early OPSCC when compared with controls, were found in both PPI clusters and among the top six IPA networks. Out of the 42 downregulated S-plot proteins presented in the top six IPA networks, complement component C9 (C9), ficolin-3 (FCN3) and C4b-binding protein alpha chain (C4BPA) had the best p(corr), fold change and intensity values, and were also present in the PPI clusters (Fig. 4). In our opinion, together, these five proteins should be further studied as a potential future panel for early OPSCC diagnostics. Being all among S-plot proteins and present in both IPA and PPI networks, they had the best ability to distinguish cases from controls. CFHR2 is a complement factor found to regulate alternative complement pathway activation.40 MYL12A is a myosin regulatory subunit that regulates muscle cell contraction.25 This protein has been thought to potentially participate in DNA damage repair,41 and upregulation of MYL12A mRNA has been associated with non-small-cell lung carcinoma previously.42 C9 is a member of the membrane attack complex, participating in the final component of the complement system.41 FCN3 has a role in the activation of the complement pathway through the activation of the lectin pathway.41 Downregulation of C9, FCN3 and C4BPA mRNAs has previously been associated with liver cancer.43, 44 C4BPA, together with C4BPB, forms a multimeric protein participating in complement activation in the classical pathway.41 It is of note that, owing to the very low abundance of C4BPB, there is little or no utility for this protein, as it will be hard to detect it reliably with classical clinical chemistry settings. However, C4BPA has all the characteristics of being clinically useful due to good abundance in serum samples, high confidence of identification, good fold change and statistical significance (Table 1 and Supplementary Table 2).

The ratio between upregulated and downregulated proteins and the networks in which these proteins were participating made us hypothesise that in the case of early-stage OPSCCs, the main reason for the change in serum proteome could be a tumour-specific response in the host system, not necessarily proteins originating from the actual tumour. When comparing our results with earlier serum proteomics studies on cancer patients, we discovered that 11 proteins out of the 152 quantified proteins in OPSCC serum were also expressed in the serum of patients with pancreatic cancer and 47 proteins were expressed in the serum of oral cavity squamous cell carcinoma (OSCC).45, 46 This finding indicates that changes in the levels of some serum proteins most likely reflect a general response to cancer, with still the largest part being specific to the disease. Even though the networks and functions of the proteins with altered expression levels in OPSCC were quite generalised to cancer, the protein combinations seem to be unique. Interestingly, the differences between OSCC and OPSCC, although smaller than in comparison to pancreatic cancer, were significant. Although cases in the current study represented early tumours, whereas tumours in the OSCC study were of all TNM stages,46 it is likely that this significant difference in the protein expression profiles is also due to tumour-specific changes in serum. In addition to these possible changes due to histological and anatomical differences between OPSCC and OSCC, another possible reason for the OSCC/OPSCC difference is the viral origin in half of the OPSCC tumours studied.8 The role of HPV in tongue cancers is not established.

When serum samples of patients with p16-positive and p16-negative tumours were compared with each other, 24 proteins were differently expressed in the two groups. S-plot proteins resulting from comparing each group with healthy controls were almost exclusively shared between the two groups, although the fold changes of the proteins’ expressions varied. IPA canonical pathway and network analyses and PPI network analyses were performed separately for p16-negative and p16-positive early-stage OPSCCs versus control data. The majority of the interacting proteins were shared by both groups, as expected, as all the cases represent early-stage OPSCC. Some minor differences in protein interactions segregating the two groups were discovered. For example, the network haematological disease, haematological system development and function and organismal functions was only present in the IPA networks of the p16-positive group. All in all, based on serum proteomics, p16-positive and p16-negative early-stage OPSCCs seemed to be mostly similar, although some specific proteins, networks and PPIs were found.

These results strengthen the current knowledge of OPSCC being a disease with versatile altering events in protein expression levels, and further the knowledge in associating networks and interactions. Most probably, the changes seen in serum protein levels reflect the general host response, tumour-specific host response and leaking of tumour-specific proteins into the bloodstream. The expression levels of 96 S-plot proteins were able to reliably distinguish early-stage OPSCCs from healthy controls. Network and PPI analyses provided some additional information on the proteins, with the ability to filter out a smaller set of proteins, putatively representing a potential panel of biomarkers. This is important, as instead of seeking a single protein, the opportunity to form a panel of proteins with both upregulated and downregulated abundances could serve as a more dependable composition for decision making in future diagnostics. We suggest that the panel of five serum proteins identified with these methods (CFHR2, MYL12A, C9, FCN3 and C4BPA) might serve as a diagnostic biomarker for early-stage OPSCC.

To conclude, we have demonstrated how serum proteomics is capable of differentiating patients with early-stage OPSCC from healthy controls. This finding has a great potential to improve the early diagnostics of OPSCC. More importantly, the present study and our earlier work will allow us to further delineate differences between different head and neck cancers in terms of their characteristic serum-biomarker profiles. Further screening of the five above-mentioned proteins in a larger cohort of patients would be necessary to establish their value for clinical use.