Patients with early-stage oropharyngeal cancer can be identified with label-free serum proteomics

Tuhkuri, Anna; Saraswat, Mayank; Mäkitie, Antti; Mattila, Petri; Silén, Robert; Dickinson, Amy; Carpén, Timo; Tohmola, Tiialotta; Joenväärä, Sakari; Renkonen, Suvi

doi:10.1038/s41416-018-0162-2

Article
Open access
Published: 02 July 2018

Molecular Diagnostics

British Journal of Cancer volume 119, pages 200–212 (2018)

Abstract

Background

The increasing incidence of oropharyngeal squamous cell carcinoma (OPSCC) is mainly related to human papillomavirus (HPV) infection. As OPSCCs are often diagnosed at an advanced stage, mortality and morbidity remain high. There are no diagnostic biomarkers for early detection of OPSCC.

Methods

Serum from 25 patients with stage I–II OPSCC, and 12 healthy controls, was studied with quantitative label-free proteomics using ultra-definition MS^E. Statistical analyses were performed to identify the proteins most reliably distinguishing early-stage OPSCCs from controls. P16 was used as a surrogate marker for HPV. P16-positive and P16-negative tumours were analysed separately.

Results

With two or more unique peptides per protein identification, 176 proteins were quantified. A clear separation between patients with early-stage tumours and controls was seen in principal component analysis. Orthogonal projections to latent structures discriminant analysis identified the 96 proteins that most reliably differentiated OPSCC patients from controls, 13 of them upregulated and 83 downregulated in study cases. The set of proteins was studied further with network, pathway and protein–protein interaction analyses, and was found to participate in processes such as lipid metabolism.

Conclusions

We found a set of serum proteins distinguishing early-stage OPSCC from healthy individuals, and suggest a protein set for further evaluation as a diagnostic biomarker panel for OPSCC.

Introduction

The worldwide annual incidence of head and neck cancers is almost 700,000, and 380,000 patients succumb to their disease annually.¹ Oropharyngeal squamous cell carcinoma (OPSCC) accounts for ~20% of all new head and neck cancers, and the incidence is expected to rise over the following decades.^{1,2,3} This increase is mainly due to the cancers related to the human papillomavirus (HPV), and particularly due to its high-risk genotype HPV-16.^{2, 4}

Traditionally, the main risk factors for OPSCC have been smoking and heavy alcohol consumption.⁵ Patients diagnosed with HPV-related OPSCC tend to be younger, and the consumption of alcohol and tobacco is often lower or even absent.⁶ HPV-related tumours have a better prognosis, a lower risk of secondary malignancies and the disease responds better to (chemo)radiotherapy.^{5, 7} It is also of note that HPV-associated OPSCCs in tobacco users behave like classical tobacco-associated OPSCCs.⁸ While the de-escalation of HPV-positive OPSCC patients’ treatment is under investigation,⁷ patients with HPV-negative OPSCC still require heavy treatment and the prognosis remains poor.⁶ At the moment, the only way to improve the prognosis of patients with HPV-negative tumours would be to diagnose them earlier.

Currently, there are no diagnostic biomarkers for OPSCC to enhance its detection at an earlier stage. Brush samples, used successfully for cervical cancer screening, have been shown to be ineffective in screening HPV-positive OPSCCs, and no diagnostic biomarkers from standard bio-fluids exist.⁹ HPV vaccinations could eventually decrease the epidemic of HPV-related OPSCC; however, even if effective vaccination programmes were launched, the decrease in incidence would only be seen after a couple of decades.¹⁰

Protein expression levels in both tumour tissue and serum samples of patients with OPSCC have been studied, showing some alterations, compared with those of healthy controls.^{11,12,13,14,15,16} However, these studies have often been targeted to recognised proteins, based on earlier studies on other cancers. Discovery-driven mass spectrometry proteomics offers the possibility to discover novel biomarkers and pathways, as well as to associate the findings with clinical aspects.

Our objective was to compare the serum protein profiles of patients with early-stage OPSCC and of healthy controls, to promote early cancer diagnostics. As early-stage tumours, we chose stage I and stage II tumours (eighth edition of TNM classification of malignant tumours, 2016). Protein p16, i.e. cyclin-dependent kinase inhibitor 2A, is used as a surrogate marker for HPV status at our department and also in this study. The protein was first presented for OPSCC by Klussmann et al. and is now an established immunohistological marker, widely used instead of the arduous and expensive HPV detection and typing.¹⁷ We analysed the serum samples in ultra-definition MS^E (UDMS^E) mode. Of the three data-independent acquisition methods available in the Synapt G2-S (MS^E, high-definition MS^E (HDMS^E) and UDMS^E), the last was chosen as it gives the best protein coverage of the sample.¹⁸ Based on the proteomic changes revealed, we aimed to find a set of proteins that could serve as a biomarker panel for early-stage OPSCC.

Materials and methods

Patients and serum samples

Serum samples from 25 patients diagnosed with stage I–II OPSCC were collected prior to treatment between the years 2012 and 2015 at the Department of Otorhinolaryngology—Head and Neck Surgery, Helsinki University Hospital, Helsinki, Finland. After collection, the samples were allowed to clot at room temperature (RT) before they were centrifuged at 4 °C (1000 × g) to separate serum. Sera were stored at –70 °C until all were assayed at the same time. The inclusion strategy by the TNM status was based on the eighth edition of TNM classification of malignant tumours, dividing HPV-positive and HPV-negative OPSCCs as separate entities,⁸ and protein p16 status was used as a surrogate marker for HPV. Twelve serum samples from age-matched and gender-matched control patients were received from the Finnish Red Cross Blood Service.

Written informed consent was obtained from all patients. The study plan was approved by the institutional Research Ethics Board at the Helsinki University Hospital (DNr. 51/13/03/02/2013).

Reagents

Reagents for serum pre-processing (Pierce Swell Gel Blue Albumin Removal Discs, Pierce Centrifuge columns and Pierce C18 Spin Columns) were acquired from Thermo Scientific (Rockford, IL, USA), solvents and high-purity HPLC reagents from Waters (Milford, MA, USA) and other reagents from Sigma-Aldrich (St Louis, MO, USA).

Serum treatment and protein digestion

The workflow has been described previously in detail.¹⁹ In brief, the samples were thawed, and after the depletion of the top 12 proteins with Pierce Top 12 protein depletion columns, the total protein concentration was measured with a bicinchoninic acid assay kit (Pierce, Thermo Scientific, Rockford, IL, USA). Top 12 protein-depleted serum samples corresponding to 350 µg of total protein were dried in a speed vacuum (Savant, Thermo Fisher), and then dissolved in 6 M urea and 100 mM Tris-HCl (pH 7.4). Disulphide bonds were reduced with 10 mM dithiothreitol (DTT) for 60 min at RT, and the proteins were then alkylated with 30 mM iodoacetamide for 60 min in the dark at RT. Excess iodoacetamide was consumed by adding DTT again (30 mM DTT, 60 min at RT). The samples were then diluted 1:10 with high-purity Milli-Q water (Millipore, Billerica, MA, USA), and the proteins were digested with trypsin (Promega, Madison, WI) for 18 h at +37 °C. Finally, the samples were purified in C18 spin columns, dried in a speed vacuum and dissolved in 0.1% formic acid containing 12.5 fmol Hi3 peptide mixture (Waters) per µl. All of the procedures described were performed according to the manufacturer’s instructions, wherever applicable.

Liquid chromatography—ultra-definition MS^E

Four-microlitre samples corresponding to 1.4 µg of total protein were injected into the ultra-performance liquid chromatography (UPLC) system (Waters Corporation, Milford, MA, USA).¹⁸ A TRIZAIC nanoTile 88-µm × 100-mm HSS-T3u wTRAP was used as the separating device before mass spectrometry (MS). After loading and trapping, the samples were washed for 2 min at 8.0 µl/min with 1% buffer B. The analytical gradient was as follows: 0–1 min 1% B; at 2 min 5% B; at 65 min 30% B; at 78 min 50% B; at 80 min 85% B; at 83 min 85% B; at 84 min 1% B and at 90 min 1% B, at a flow rate of 450 nl/min. Buffer A consisted of 0.1% formic acid in water and buffer B of 0.1% formic acid in acetonitrile (Sigma-Aldrich).
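As a worked illustration of the gradient programme and injection figures above, the following minimal Python sketch interpolates the percentage of buffer B at a given time and derives the on-column amounts. It assumes linear ramps between the listed breakpoints, which the text does not state explicitly, and the function names are illustrative only.

```python
# Gradient breakpoints (time in min, % buffer B) as listed in the text.
GRADIENT = [(0, 1), (1, 1), (2, 5), (65, 30), (78, 50),
            (80, 85), (83, 85), (84, 1), (90, 1)]


def percent_b(t_min, gradient=GRADIENT):
    """Return % buffer B at t_min, ASSUMING linear ramps between breakpoints."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation inside the segment that contains t_min.
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def on_column_amounts(injection_ul=4.0, protein_ug=1.4, hi3_fmol_per_ul=12.5):
    """Return (protein concentration in µg/µl, Hi3 standard load in fmol)."""
    return protein_ug / injection_ul, hi3_fmol_per_ul * injection_ul


if __name__ == "__main__":
    print(percent_b(33.5))       # midway through the 5% to 30% ramp
    print(on_column_amounts())   # concentration and Hi3 load per injection
```

Under these assumptions, a 4 µl injection carries 0.35 µg/µl of protein and 50 fmol of the Hi3 standard.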

The data were acquired in UDMS^E mode on a Synapt G2-S UDMS (Waters Corporation), including ion mobility spectrometry (IMS). The data range was 100–2000 m/z, scan time 1 s, IMS wave velocity 650 m s⁻¹ and the collision energy was ramped in the trap between 20 and 60 V. Calibration was performed with Glu1-fibrinopeptide B MS2 fragments, and the Glu1-fibrinopeptide B precursor ion was used as a lock mass during the acquisitions. In total, 10% of the samples were acquired as triplicates to validate the results, and further analysis was conducted with Progenesis QI for Proteomics software (Nonlinear Dynamics, Newcastle, UK) (Supplement S2 – triplets).

The mass spectrometry proteomics data have been deposited into the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD008445.²⁰

Data analysis

The data analysis has been described previously in detail.²¹ Briefly, Progenesis QI for Proteomics software (Version 3, Nonlinear Dynamics) was used for processing raw files. Peptide identification was run against UniProt human FASTA sequences (UniProtKB Release 2015_09, 20205 sequence entries), and label-free protein quantification was performed with the Hi-N method (Protein Lynx Global Server).²² The samples were spiked with 12.5 fmol/µl of CLPB_ECOLI (P63285, ClpB protein) peptides (Hi3 Escherichia coli Standard, Waters).
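The general idea behind Hi-N quantification can be sketched as follows: the mean signal of the N most intense peptides of a protein is scaled against the same quantity for a spiked standard of known amount. This is a minimal sketch of that principle only, not the Progenesis QI implementation; the function names and intensity values are hypothetical, and the 50 fmol standard load assumes a 4 µl injection at 12.5 fmol/µl.

```python
def top_n_mean(intensities, n=3):
    """Mean of the n most intense peptide signals (fewer if fewer exist)."""
    top = sorted(intensities, reverse=True)[:n]
    if not top:
        raise ValueError("no peptide intensities supplied")
    return sum(top) / len(top)


def hi_n_amount(protein_peptides, standard_peptides, standard_fmol, n=3):
    """Estimate the on-column amount (fmol) relative to a spiked standard."""
    ratio = top_n_mean(protein_peptides, n) / top_n_mean(standard_peptides, n)
    return ratio * standard_fmol


if __name__ == "__main__":
    # Illustrative intensities only (NOT data from the study).
    standard = [9000.0, 8000.0, 7000.0, 1000.0]  # spiked ClpB standard
    protein = [5000.0, 4000.0, 3000.0, 500.0]    # hypothetical serum protein
    print(hi_n_amount(protein, standard, standard_fmol=50.0))
```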

The peptide identification parameters were fixed modification of cysteine (carbamidomethyl) and variable modification of methionine (oxidation). The peptide error tolerance was set to a maximum of 10 ppm, the false-discovery rate was limited to less than 2% and default values (in Progenesis QI for Proteomics) were used for the rest of the parameters.
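For reference, a mass error in parts per million is the difference between observed and theoretical m/z, divided by the theoretical m/z and multiplied by 10⁶. The short sketch below applies the 10 ppm tolerance quoted above; the m/z values are made up for illustration and the function names are not taken from any software used in the study.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical peptide ion at m/z 800.000 (illustrative values only).
    print(within_tolerance(800.004, 800.000))  # about 5 ppm
    print(within_tolerance(800.012, 800.000))  # about 15 ppm
```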

The quantified proteins in all comparisons were compared by ANOVA on a protein-by-protein basis and their expression levels were considered significantly different if the ANOVA p value was < 0.05. Principal component analysis (PCA), which visualises the main axes of variation in the data groups, was performed with Progenesis QI for Proteomics. The Progenesis QI data were then processed with EZinfo 3.0 software (a statistical tool released in December 2014, Umetrics, Sweden) to perform supervised orthogonal projections to latent structures discriminant analysis (OPLS–DA) modelling. With a p(corr) cut-off of ±0.80, a covariance versus correlation plot (S-plot) and a list of S-plot proteins were generated from the OPLS–DA data.
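To make the protein-by-protein comparison concrete, the sketch below computes the one-way ANOVA F statistic for a single protein across groups. It is a generic textbook calculation, not the Progenesis QI routine, and the abundance values are invented. Obtaining the p value additionally requires the F distribution with (k − 1, N − k) degrees of freedom, which is left to a statistics library here.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and N > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Invented abundances for one protein (NOT data from the study).
    cases = [1.0, 2.0, 3.0]
    controls = [4.0, 5.0, 6.0]
    print(one_way_anova_f(cases, controls))
```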

Protein–protein interactions, pathways and networks

The STRING 10.5 database, which illustrates known and predicted protein–protein interactions (PPI),²³ was used for PPI analyses, giving a detailed view of possible and known interactions between proteins. PPI analyses were conducted to filter the S-plot proteins and project them onto connected pathways and/or co-expression. Medium stringency was used for inferring the networks from protein lists in the STRING database, and textmining was excluded as an evidence source.

The network and canonical pathway overrepresentation analyses were conducted through the use of Ingenuity pathway analysis (IPA; QIAGEN Inc., https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis) with default parameters to identify which networks and pathways were most enriched in our protein list.²⁴ IPA networks differ from PPIs in their way of connecting proteins. In addition to the proteins actually present, they combine the information about possible connector proteins (not present in the user-supplied list). This allows another way of finding the networks the proteins are enriched into. IPA analyses were conducted on the proteins with the ANOVA p value < 0.05 and S-plot proteins were then separately matched to the proteins in enriched networks.

Results

Metadata and workflow

Twenty-five serum samples from patients with stage I and stage II OPSCC, together with 12 samples from healthy controls were studied. Of the 25 patients with stage I–II tumours, 12 had p16-positive and 13 had p16-negative tumours.

The tumour was located in the tonsil in 15 (60%) of the 25 patients, the base of the tongue in 8 (32%), the soft palate in 1 (4%) and the posterior wall of the oropharynx in 1 (4%). Sixty percent of the patients were male and 40% were female. The age of the patients varied from 36 to 78 years with the median age being 60.85 (average 60.92). More detailed clinical parameters are provided in Supplementary Table 1. The data analysis workflow is presented in Fig. 1.

All early-stage OPSCCs versus controls

Protein identification and PCA

With the criterion of two or more unique peptides per protein identification, 176 proteins were quantified across all cases and controls analysed. The identified proteins were compared by ANOVA on a protein-by-protein basis. With the ANOVA p value cut-off of 0.05, 152 proteins with two or more unique peptides remained (Supplementary Table 2). Based on serum protein expression levels of patients with early-stage OPSCC and healthy controls, the two groups were found to be separated in PCA (Fig. 2).

OPLS–DA

As another group classification method, OPLS–DA modelling was performed, and an S-plot was generated, presenting 96 proteins that most reliably distinguished patients from controls (Fig. 3). These proteins passed the p(corr) cut-off of ± 0.80 and were thus considered significantly different (Table 1). Of the 96 proteins, 13 were expressed in higher levels in early-stage OPSCCs when compared to controls, and the remaining 83 proteins had lower levels in cases compared with controls.
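The S-plot selection can be sketched as follows. For each protein, the covariance and the correlation, p(corr), between its abundance and the OPLS–DA predictive score vector are computed, and proteins with |p(corr)| ≥ 0.80 are retained. This sketch assumes the score vector has already been produced by an OPLS–DA model, that positive scores correspond to cases, and that the cut-off is inclusive; all values and names are hypothetical.

```python
from math import sqrt


def s_plot_coordinates(scores, abundances):
    """Return (covariance, correlation) between scores and one protein."""
    n = len(scores)
    if n != len(abundances) or n < 2:
        raise ValueError("scores and abundances must have equal length >= 2")
    mt, mx = sum(scores) / n, sum(abundances) / n
    cov = sum((t - mt) * (x - mx) for t, x in zip(scores, abundances)) / (n - 1)
    sd_t = sqrt(sum((t - mt) ** 2 for t in scores) / (n - 1))
    sd_x = sqrt(sum((x - mx) ** 2 for x in abundances) / (n - 1))
    if sd_t == 0 or sd_x == 0:
        raise ValueError("zero variance, correlation undefined")
    return cov, cov / (sd_t * sd_x)


def select_s_plot_proteins(scores, table, cutoff=0.80):
    """Keep proteins with |p(corr)| >= cutoff and label their direction."""
    selected = {}
    for name, abundances in table.items():
        _, p_corr = s_plot_coordinates(scores, abundances)
        if abs(p_corr) >= cutoff:
            # ASSUMPTION: positive scores correspond to cases.
            selected[name] = "up" if p_corr > 0 else "down"
    return selected


if __name__ == "__main__":
    # Toy scores and abundances (NOT data from the study).
    scores = [-2.0, -1.0, 1.0, 2.0]
    table = {
        "protein_A": [1.0, 2.0, 3.0, 4.0],
        "protein_B": [4.0, 3.0, 2.0, 1.0],
        "protein_C": [2.0, 1.0, 1.0, 2.0],
    }
    print(select_s_plot_proteins(scores, table))
```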

Table 1 S-plot proteins obtained from OPLS–DA regression analysis (p(corr) ± 0.80), of all stage I–II OPSCCs versus controls

Protein–protein interactions

To further study our set of S-plot proteins and to try to identify the most relevant proteins, protein–protein interaction (PPI) webs were created using the STRING 10.5 database. Proteins with the most interactions, with connections to other proteins ranging from 9 to 16, were prothrombin (F2), plasminogen (PLG), alpha-2-antiplasmin (SERPINF2), histidine-rich glycoprotein (HRG), beta-2-glycoprotein 1 (APOH), carboxypeptidase B2 (CPB2), inter-alpha-trypsin inhibitor heavy chain H4 (ITIH4) and complement C2, C5, C4-A and C4-B (C2, C5, C4A and C4B).

According to the UNIPROT database,²⁵ these proteins seemed to be associated with complement activation (early and late), extracellular matrix remodelling and lipid metabolism, for example. PPIs of the S-plot proteins are shown in Fig. 4.

Pathways and networks

The top six IPA networks where the identified proteins were most enriched were 1. developmental disorder, hereditary disorder and immunological disease; 2. lipid metabolism, molecular transport and small-molecule biochemistry; 3. humoral immune response, inflammatory response, haematological system development and function; 4. cardiovascular disease, organismal injury and abnormalities and tissue morphology; 5. hereditary disorder, ophthalmic disease, organismal injury and abnormalities and 6. cell morphology, cellular development, cellular assembly and organisation. The score of the top six IPA networks ranged from 21 to 45. There were 13–23 proteins with the ANOVA p value < 0.05 participating in each of the networks and the total number of focus molecules was 108. Of these, 46 were S-plot proteins (p(corr) ± 0.80). The network linked with lipid metabolism, containing 14 S-plot proteins, is illustrated in Fig. 5. The other five networks are presented in Supplementary Table 3. Altogether, among the S-plot proteins present in the top six IPA networks, four were upregulated in cases versus controls: complement factor H-related protein 2 (CFHR2), GREB1-like protein (GREB1L), myosin regulatory light chain 12A (MYL12A) and myotonin-protein kinase (DMPK). CFHR2 and MYL12A were also connected within the PPI clusters. The remaining 42 S-plot proteins present in the top six IPA networks were downregulated in cases versus controls, and the majority of these were also present in the PPI clusters.

In the canonical pathway analyses conducted with IPA, acute phase response signalling, LXR/RXR activation, FXR/RXR activation and the complement system were among the highest enriched pathways. The top canonical pathways are shown in Supplementary Figure 1.

Comparison between P16-negative and P16-positive tumours