Possible proteomic biomarkers for the detection of pancreatic cancer in oral fluids

Pancreatic cancer (PC) has an 80% mortality rate, and its early diagnosis remains a challenge. Oral fluids (OF) may be considered the ultimate body fluid for non-invasive examinations. We have developed techniques to improve visualization of minor OF proteins, thereby overcoming major barriers to using OF as a diagnostic fluid. The aim of this study was to establish a short discriminative panel of OF biomarkers for the detection of PC. Unstimulated OF were collected from PC patients and controls (n = 30). High-abundance proteins were depleted and the remaining proteins were analyzed by two-dimensional gel electrophoresis and quantitative dimethylation liquid chromatography-tandem mass spectrometry. Label-free quantitative mass spectrometry (qMS) analysis was performed on 20 individual samples (n = 20). More than 100 biomarker candidates were identified in OF samples, and 21 had a highly differential expression profile. qMS analysis yielded a ROC-plot AUC value of 0.91 with 90.0% sensitivity and specificity for a combination of five biomarker candidates. Most of these proteins are known to be related to PC or other gastric cancers, but have never been detected in OF. This study demonstrates the importance of novel OF depletion methodologies for increased protein visibility and highlights the clinical applicability of OF as a diagnostic fluid.

OF collection, patients and healthy volunteers. Unstimulated OF flow was collected for 5 min using the spitting method 18 into pre-calibrated tubes. All participants refrained from eating, drinking and brushing their teeth 1 h prior to saliva collection. Patients did not take their medications, including sialagogues, before saliva collection.
Volunteers rested for 10 min before saliva collection, sitting in an upright position and in a quiet room and were asked not to speak or leave the room until after the saliva was collected. Saliva samples were immediately placed on ice and then centrifuged at 14,000 g for 20 min at 4 °C to remove insoluble materials, cell debris and food remnants. The supernatant of each sample was collected and protein concentration was determined using the Bio-Rad Bradford protein assay (Bio-Rad, Hercules, CA, USA) as previously described 19 .
OF were collected from 31 males: 15 PC patients and 16 healthy, age-matched controls. Controls did not take any medications known to cause xerostomia (supplementary data A), had no complaints of oral dryness, and showed no evidence of oral mucosal disease on examination. Two patients in the PC group were undergoing chemotherapy at the time of collection and were therefore excluded from the OF pool. Salivary flow rate was calculated. OF samples were divided into two groups: (1) for 2DE and dimethylation MS analysis (described below), samples were pooled according to the amount of total protein in each individual sample; (2) for label-free qMS, individual samples were used.
sAA affinity removal. Amylase was removed from the pooled OF using an amylase removing device. 600 µL of water was hand pressed (20 s) through the device to moisturize the substrate. Thereafter, 1 mL of pooled OF (in two aliquots of 500 µL) was hand pressed and filtered (120 s) through the amylase removing device. The resultant 1 mL of filtrated OF was amylase-free, as previously described 14 .
Alb and IgGs removal, capturing and elution. To remove alb and IgGs, the ProteoPrep Immunoaffinity alb and IgG Depletion Kit (Sigma-Aldrich, St Louis, MO, USA) was used as previously described 15 . Protein concentration was measured again as before, using the Bio-Rad Bradford protein assay (Bio-Rad, Hercules, CA, USA) 19 .
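As a rough illustration of the flow-rate calculation and the protein-based pooling mentioned above, the following Python sketch assumes that flow rate is the collected volume divided by the 5 min collection time, and that each sample contributes an equal mass of protein to the pool. The equal-mass rule, the function names and the example numbers are assumptions made for illustration; the text does not specify how the pooling volumes were derived.

```python
def flow_rate_ml_per_min(volume_ml, minutes=5.0):
    """Unstimulated flow rate: collected volume divided by collection time."""
    if minutes <= 0:
        raise ValueError("collection time must be positive")
    return volume_ml / minutes


def pooling_volumes_ul(conc_ug_per_ul, protein_per_sample_ug):
    """Volume of each sample needed to contribute the same protein mass.

    conc_ug_per_ul maps a sample id to its protein concentration
    (for example, from a Bradford assay). Equal-mass pooling is an
    assumption of this sketch, not a rule stated in the text.
    """
    volumes = {}
    for sample_id, conc in conc_ug_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample_id}")
        volumes[sample_id] = protein_per_sample_ug / conc
    return volumes


# Hypothetical values, not measurements from the study.
rate = flow_rate_ml_per_min(2.5)
volumes = pooling_volumes_ul({"PC_01": 2.0, "PC_02": 0.5},
                             protein_per_sample_ug=50.0)
```

Under this assumption, a more dilute sample contributes a proportionally larger volume, so that no single subject dominates the pooled protein content.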
The triple-depleted OF were divided into two tubes for 2DE and quantitative MS analysis, frozen at − 80 °C and lyophilized overnight. The lyophilized sediments for 2DE were dissolved in 7 M urea, 2 M thiourea and 4% 3-[(3-cholamidopropyl) dimethylammonio]-1-propane-sulfonate (CHAPS) and stored at − 20 °C until analysis.
Two-dimensional sodium dodecyl sulfate polyacrylamide gel electrophoresis (2DE). For analytical gels, 100 µg of protein were rehydrated then subjected to isoelectrofocusing in 18 cm long second dimension gels, pH 3-10 NL as previously described 20 . To prepare the gel strips for separation in the second dimension they were soaked twice for 15 min in an SDS-PAGE equilibration buffer as previously described 14 . For the second dimension, strips were embedded in 0.5% w/v agarose containing a trace of bromophenol blue and loaded onto hinged spacer plates (20 cm × 20.5 cm; Bio-Rad, Hercules, CA, USA) using 9.5-16.5% SDS polyacrylamide gradient gel electrophoresis. The same running and staining apparatus at a constant current of 30 mA per gel at 10 °C was used for all samples. Gels were silver stained with the SilverQuest kit (Invitrogen, Carlsbad, CA, USA).
Imaging and statistical analysis. Gels were scanned using a GS-800 calibrated densitometer (Bio-Rad, Hercules, CA, USA) and spots were detected and quantified using PDQuest software V 6.2.0 (Bio-Rad, Hercules, CA, USA). To overcome several of the known limitations of 2D gel analysis that result from gel-to-gel variation and variability in staining 14 , all samples were run simultaneously for the first and second dimensions. Normalization with PDQuest was performed using the total density in image method to semi-quantify spot intensities and to minimize staining variation between gels 14 .
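The idea behind total-density normalization can be sketched in a few lines of Python. This is a minimal illustration, assuming that each raw spot quantity is divided by the total density of its gel image and rescaled by a constant; the function name, the scale factor and the example values are hypothetical, and the sketch does not reproduce PDQuest's internal implementation.

```python
def normalize_spots(spot_quantities, total_image_density, scale=1_000_000):
    """Express each spot as a scaled fraction of the total image density.

    spot_quantities maps a spot id to its raw quantity on one gel.
    """
    if total_image_density <= 0:
        raise ValueError("total image density must be positive")
    return {spot: quantity * scale / total_image_density
            for spot, quantity in spot_quantities.items()}


# Hypothetical gels: gel_b is stained twice as intensely as gel_a.
gel_a = normalize_spots({"spot_1": 200.0, "spot_2": 50.0},
                        total_image_density=1000.0)
gel_b = normalize_spots({"spot_1": 400.0, "spot_2": 100.0},
                        total_image_density=2000.0)
```

In this toy case the two gels differ only in overall staining intensity, so the normalized values coincide, which is the kind of between-gel staining variation the normalization is meant to reduce.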
2DE Mass-spectrometry (MS) identification. For 16.5M) to the pooled PC sample to a final concentration of 200 mM. Following 1 h of incubation at room temperature the pH was raised to 8 and the reaction was incubated for another hour at room temperature. Neutralization was done with 25 mM ammonium bicarbonate for 30 min, and equal amounts of the light and heavy peptides were mixed, cleaned on a C18 stage tip, dried and re-suspended in 0.1% formic acid. Peptides were resolved by reverse-phase chromatography on 0.075 × 200-mm fused silica capillaries (J&W) packed with Reprosil reverse phase material (Dr. Maisch GmbH, Germany). The peptides were eluted with linear 215 min gradients of 7 to 40% and then for 8 min at 95% acetonitrile with 0.1% formic acid in water at flow rates of 0.25 μl/min. Mass spectrometry was performed using an ion-trap mass spectrometer (Orbitrap, Thermo) in positive mode, using repetitive full MS scans followed by collision induced dissociation (CID) of the 7 most dominant ions selected from the first MS scan.
The MS data were analyzed using Sequest 3.31 software (J. Eng and J. Yates, University of Washington and Finnigan, San Jose) searching the human part of the NCBI-NR database. Quantitation was performed using the PepQuant algorithm of Bioworks and "in house" software.
Label free MS analysis. 20 individual samples (from 10 PC patients and 10 healthy volunteers) were analyzed using Label free analysis following the depletion of high abundance proteins. The tryptic peptides were desalted using C18 tips, dried and re-suspended in 0.1% formic acid. The peptides were resolved by reverse-phase chromatography on 0.075 × 200-mm fused silica capillaries (J&W) packed with Reprosil reversed phase material (Dr Maisch GmbH, Germany). The peptides were eluted as described above. A wash run and one blank injection were performed between the samples to make sure there was no cross contamination 7 .
The MS data were analyzed using MaxQuant 1.2.2.5 software (Matthias Mann's group) searching against the human section of the Uniprot database and quantified by label-free analysis using the same software. Statistical analysis was done using Perseus software (Matthias Mann's group).

Scientific Reports | (2020) 10:21995 | https://doi.org/10.1038/s41598-020-78922-x
Bio-statistical analysis. Dr. Yoav Smith (Head of the Genomic Data Analysis Unit, The Hebrew University, Jerusalem) was our consultant for the analysis. Briefly, label-free qMS results were initially analyzed utilizing Matlab software R2013a (The MathWorks, Inc. USA). Data were then presented in a volcano plot using the vertical axis for the p-values and the horizontal axis for the log2 ratio values. By using a threshold of less than 0.05 for the p-values, and a fold-change cutoff of ± 2 on the log2 ratios, proteins with the largest statistically significant expression change were chosen. Furthermore, for the combined protein group the predicted probability for each subject was obtained and was used to construct receiver operating characteristic (ROC) curves. The standard error of the area under the curve (AUC) value and the 95% confidence interval (CI) for the ROC curve were computed as previously described 22 . The sensitivity and specificity for the combined biomarkers were estimated by identifying the cutoff-point of the predicted probability that yielded the highest sum of sensitivity and specificity.
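The volcano-plot selection described above amounts to a two-condition filter on each protein, which the following Python sketch illustrates. It is not the authors' Matlab code. The p-values are taken as precomputed inputs, the function and protein names are hypothetical, and the default cutoff of 1.0 on the absolute log2 ratio (a twofold change) is an assumption, since the wording of the fold-change threshold in the text can be read in more than one way.

```python
import math


def volcano_select(proteins, p_max=0.05, min_abs_log2=1.0):
    """Keep proteins that pass both the p-value and the fold-change cutoff.

    proteins: iterable of (name, mean_pc, mean_control, p_value).
    Returns (name, log2_ratio, minus_log10_p) tuples, i.e. the
    horizontal and vertical volcano-plot coordinates of each hit.
    """
    selected = []
    for name, mean_pc, mean_control, p_value in proteins:
        log2_ratio = math.log2(mean_pc / mean_control)
        if p_value < p_max and abs(log2_ratio) >= min_abs_log2:
            selected.append((name, log2_ratio, -math.log10(p_value)))
    return selected


# Hypothetical illustrative values, not data from the study.
example = [
    ("protein_A", 400.0, 100.0, 0.01),  # fourfold up, significant
    ("protein_B", 120.0, 100.0, 0.01),  # significant, but small change
    ("protein_C", 25.0, 100.0, 0.20),   # fourfold down, not significant
    ("protein_D", 25.0, 100.0, 0.03),   # fourfold down, significant
]
hits = volcano_select(example)
```

Only proteins that are both statistically significant and strongly changed survive the filter; passing a different `min_abs_log2` switches to a stricter or looser reading of the cutoff.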

Results
The mean age of the 15 PC patients was 65.7 ± 13.24 years, and the mean age of the 16 healthy age-matched controls was 56.5 ± 3.3 years. The average time from PC diagnosis to OF collection was ~ 7 months. 72% of the patients were diagnosed with stage IV and the rest with stage III. All the PC patients took medications regularly, and the tendency of these medications to cause xerostomia was checked (supplementary data A); only 2 patients used medicines known to cause dry mouth in more than 10% of individuals.
The study was divided into two sections: (1) proteomic analysis of pooled samples using 2DE and dimethylation qMS; (2) analysis of individual samples using label-free qMS.

Dimethylation MS analysis of pooled PC and control samples. Dimethylation followed by LC-MS/MS of PC and control OF samples revealed 182 proteins (supplementary data B). Twenty-one proteins showed an extended differential profile with a 3- to 50-fold change in expression, and 37 proteins had a two- to threefold expression change (see Table 1 for details). Table 1A refers to publications implicating 19 of our 21 identified proteins as biomarker candidates for PC or other cancers. None of these proteins has ever been detected in OF of PC patients.
Label free qMS on individual samples. This extensive examination led to the identification of 480 proteins. MS results show the relative expression profile of the proteins in each sample. An average expression ratio was calculated for each protein. Seventy-one proteins were downregulated by more than twofold in PC samples, among them 34 by more than threefold. Ninety-two proteins were upregulated by more than twofold, of which 46 by more than threefold. The subsequent statistical analysis (t test, p value < 0.05) showed 39 proteins with an average change in expression profile of more than twofold. The proteins were grouped according to the number of subjects in which they were found: fewer than 6 subjects and more than 6 subjects. For example, S100-A9 was found in OF samples of all subjects, and decreased significantly (p < 0.05) by more than threefold in PC patients (Table 2, Fig. 2A).
Interestingly, Zinc-alpha-2-glycoprotein (P25311) showed an increased expression profile in the individual qMS, whereas in the qMS of pooled samples it showed the opposite trend. Another protein with conflicting results was Lipocalin-1 (P31025), for which the individual MS supported the results of the 2DE, showing an average increase of more than 3.5-fold in PC patients, but the changes in the MS were not statistically significant.
Bio-statistical analysis. To determine a short panel of discriminative biomarkers, label-free qMS results were analyzed bio-statistically utilizing Matlab software R2013a (The MathWorks, Inc, USA). Data were presented in a volcano plot with the p-values on the vertical axis and the log2 ratio values on the horizontal axis (Fig. 2B).
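The combination of five biomarker candidates yielded a ROC AUC of 0.91 with 90.0% sensitivity and specificity (Fig. 2C). In other words, 18 out of 20 OF samples showed true positive or true negative results, based on the combined biomarker examination.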
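The cutoff rule used for the combined biomarkers (the predicted probability that gives the highest sum of sensitivity and specificity) can be illustrated with a short Python sketch. The predicted probabilities below are hypothetical and were chosen only so that 9 of 10 patients and 9 of 10 controls are classified correctly; they are not the study's data, and the function names are illustrative. The AUC is computed here as the fraction of patient-control pairs in which the patient has the higher score, which is one standard way to obtain it and not necessarily the procedure the authors used.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each observed score.

    A subject is called positive when score >= threshold; labels use
    1 for patients and 0 for controls.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def best_cutoff(scores, labels):
    """Cutoff with the highest sum of sensitivity and specificity."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2])


def auc(scores, labels):
    """Fraction of (patient, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for 10 patients and 10 controls.
pc_scores = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.52, 0.30]
ctrl_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.40, 0.45, 0.48, 0.72]
scores = pc_scores + ctrl_scores
labels = [1] * 10 + [0] * 10

cutoff, sensitivity, specificity = best_cutoff(scores, labels)
area = auc(scores, labels)
```

With 10 subjects per group, each misclassified subject costs 10 percentage points of sensitivity or specificity, so estimates of this kind are coarse at this sample size.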

Discussion
Pancreatic cancer (PC) is an aggressive cancer and ranks third in cancer mortality in Israel and 8th worldwide 2,23,24 . Most PC are diagnosed at a late stage, demonstrating the need to establish a simpler, non-invasive, cost-effective screening tool for PC, such as one based on oral fluids (OF). Table 1A summarizes 19 proteins out of 21 with more than threefold changes in expression that were considered as potential biomarkers. Details of seven of these proteins are presented below:
i. Histones (P62805, P33778, Q96A08) are strongly alkaline proteins which package and organize the DNA into structural units called nucleosomes. Autoantibodies to these proteins found in the serum of PC patients have been suggested as potential biomarkers 25,26 .
ii. Apolipoprotein A-I precursor has a specific role in lipid metabolism. It is the major component of high-density lipoprotein in plasma and has recently been patented for early diagnosis, screening, therapeutic follow-up and prognosis, as well as diagnosis of relapse of colorectal cancer 27 .
iii. Myeloperoxidase is an important factor influencing oxygen dependent mechanisms of pathogen destruction. A significant decrease in the activity of myeloperoxidase has been found in the neutrophils of PC patients 28 .
iv. Transthyretin precursor is a serum and cerebrospinal fluid carrier of the thyroid hormone thyroxine (T4) and retinol. Its expression was significantly lower (7.9-fold) in the serum of PC patients 29 .
v. Lipocalin-1 and Protein S100-A8 were downregulated in PC versus non-neoplastic ductal cells, as shown by stable isotope labeling with amino acids in cell culture 30 .
vi. Transketolase is upregulated in PC cells compared to healthy pancreatic ducts (3.66-fold increase compared to the 3.18-fold increase we found in OF) 31 .
vii. Hemopexin is the highest affinity heme binding protein, protecting the body from the oxidative damage that free heme can cause. This protein has been consistently associated with tumors 30 .
Table 2. Individual sample analysis (n = 20) by label free qMS. A. Proteins identified in at least 6 control and PC subjects with an average differential expression (P < 0.05). B. Proteins with an average differential expression (P < 0.05), with no minimum number of subjects.
The partial overlap between the two proteomic screening approaches, 2DE and dimethylation qMS, demonstrated the importance of employing different proteomic strategies to maximize identification abilities. The disadvantages of 2DE as a proteomic method have been previously described 30 and include spots containing more than one protein, the limited dynamic range imposed by the gel method, difficulty with hydrophobic proteins, and the inability to detect proteins with extreme molecular weights and pI values. To overcome these limitations, multiple detection methods were used. Furthermore, when a discrepancy was noted between the methods, the label-free qMS on individual samples supported the results of the 2DE rather than those of the dimethylation qMS. Nevertheless, the need for extensive individual proteomic analyses and validation is clear.

Bioinformatic analysis.
Up- and downregulated biomarker candidates were analyzed and clustered according to their molecular and biological functions using David-Kegg Bioinformatics Resources 32 . The expression of 32 proteins increased and 65 had lower levels (> twofold change). The main functional and molecular groups included signal peptides, glycosylation processes and protease activity (Fig. 3A). These findings are in accordance with extensive bioinformatic analysis of PC biomarker candidates from tumor tissue or patient serum samples 33 . Further analysis utilizing the "String" bioinformatics website (http://string-db.org/) to explore protein-protein interaction strength revealed four clustered functional groups: tissue homeostasis, regulation of biological quality, peptidase regulation activity and extracellular exosome (Fig. 3B).
In this study 25 out of 32 candidate biomarkers were exosomal proteins. This, most interestingly, is in full agreement with a study by Lau et al. discussing the role of tumor-derived exosomes in OF biomarker development 34 . The authors, however, focused on the influence of pancreatic exosomes on OF biomarker development, while the role of the exosomes in the targeted organs remained ambiguous. A partial explanation may be that exosomes not only transport messenger molecules from the pancreas to the salivary glands, but also deliver biomarkers to OF. Whether these are the original pancreatic exosomes or newly secreted vesicles from the salivary glands, should be examined further.
Similarly, an in vitro examination showed that breast cancer derived exosomes interact with the salivary glands and alter the composition of salivary gland cell-derived exosome-like microvesicles in the transcriptome and proteome 35 .
Because a solitary biomarker is unlikely to detect a particular cancer with high specificity and sensitivity, we evaluated combinations of the identified biomarkers using an ROC analysis. We calculated high ROC AUC values, indicating that the predictive utility increased substantially and enabling the identification of a group of five biomarker candidates. Three cytokeratin types (14, 16 and 17), involved in the regulation of cellular properties and functions, including apico-basal polarization, motility, cell size, protein synthesis and membrane traffic and signaling, were selected. In many cases, their presence or absence has prognostic significance for cancer patients 36 . The role of cytokeratins in pancreatic cancer and the ability to utilize them as biomarkers is widely discussed in the literature 37,38 . For example, keratin 17 was reported to be a novel negative prognostic biomarker for pancreatic cancer 39 .
The remaining two proteins with elevated levels in OF of PC patients that were included in our biomarker combination were Lactoperoxidase and Peptidyl-prolyl cis-trans isomerase B. The latter is also called Cyclophilin B (CypB) and is a 21-kDa protein belonging to the cyclophilin family of peptidyl-prolyl cis-trans isomerases. It promotes alterations in protein conformation and influences cell growth, proliferation, and motility 40 .
Enhanced expression of CypB in malignant breast epithelium may contribute to the pathogenesis of the disease 41 . Moreover, elevated levels of CypB have been found in sera of PC patients and this protein has been suggested as a serum biomarker for PC 42 .
The comparison of pooled sample results to individual qMS analysis showed partial overlap. Approximately 33% of the proteins with the highest expression fold change and lowest p-value identified in the individual samples presented similar expression trends in pooled samples.
Furthermore, CypB, one of the five discriminative biomarkers found in the individual qMS analysis, was related to the down regulation of two S100 proteins. Both the pooled and individual qMS analysis showed decreased expression levels in these proteins. It was previously claimed that pooling serum samples may cause a ~ 50% loss of potential biomarkers 43 . The results of the current study support this argument; yet also show the advantages of the pooling strategy as an initial step before performing extensive examinations on individual samples. Pooled sample analysis enabled a relatively low-cost and rapid "proof of concept" examination. Clearly, validation using individual samples is required to understand the diagnostic potential of the biomarker combination.

Concluding remarks
Enhanced proteomic characterization of the oral fluids of PC patients revealed a profile of differentially expressed proteins. Bioinformatic analysis of OF was in accordance with previous studies of proteins expressed in PC in tissues, pancreatic juice or serum. Moreover, an extensive label-free qMS analysis revealed a group of proteins that may be used as a highly specific and sensitive OF-based test for PC. A larger study is required for:
A. Exploring the accuracy of the combined five biomarkers that were found in this study, utilizing different proteomic technologies (e.g. ELISA, Western blot, lateral flow immunoassay).
B. Validating the biomarkers and identifying high-risk groups in order to enable early diagnosis, screening, therapeutic follow-up, prognosis and diagnosis of relapse in relation to PC using OF.