Serum biomarkers identification by iTRAQ and verification by MRM: S100A8/S100A9 levels predict tumor-stroma involvement and prognosis in Glioblastoma

Despite advances in biology and treatment modalities, the prognosis of glioblastoma (GBM) remains poor. Serum reflects the disease macroenvironment and thus provides a less invasive means to diagnose and monitor a diseased condition. By employing 4-plex iTRAQ methodology, we identified 40 proteins with differential abundance in GBM sera. The high abundance of serum S100A8/S100A9 was verified by multiple reaction monitoring (MRM). ELISA and MRM-based quantitation showed a significant positive correlation. Further, an integrated investigation using stromal, tumor purity and cell type scores demonstrated an enrichment of myeloid cell lineage in the GBM tumor microenvironment. Transcript levels of S100A8/S100A9 were found to be independent poor prognostic indicators in GBM. Medium levels of pre-operative and three-month post-operative follow-up serum S100A8 predicted poor prognosis in GBM patients who lived beyond median survival. In vitro experiments showed that recombinant S100A8/S100A9 proteins promoted integrin signalling-dependent glioma cell migration and invasion up to a threshold concentration. Thus, we have discovered a GBM serum marker by iTRAQ and verified it by MRM. We also demonstrate an interplay between the tumor micro- and macroenvironment and identify S100A8 as a potential marker with diagnostic and prognostic value in GBM.


Tumor/Serum sample collection and patient cohort characteristics:
Our prospective study included a total of 154 patients who underwent surgical treatment. Survival was noted as the duration between surgery and death of the patient due to the disease. Of these 154 patients, serum samples were available for a subset, which was considered for further analysis in our study.
For serum preparation, blood samples were allowed to clot at 4 °C overnight. Serum was then collected as the upper phase after centrifugation for 5 min at 1000 rpm at 4 °C and stored at -80 °C until used.

Sample preparation, iTRAQ labelling and Off-gel fractionation
A pool of 10 serum samples from GBM patients (GBM pooled sera) and a pool of 10 serum samples from healthy individuals (control pooled sera), all age and gender matched (average control age = 49.5 years, average GBM age = 53.1 years, p = 0.11, see below), were examined for differential expression of proteins. To remove high-abundance proteins (HAP) from sera, which could mask the detection of low-abundance proteins (LAP), the control pooled sera and the GBM pooled sera were separately subjected to depletion by affinity chromatography using a MARS HU-14 column (4.6×100 mm; Agilent Technologies, Santa Clara, CA, USA). The term LAP here refers to serum depleted of only the fourteen high-abundance proteins against which antibodies are present in the column. The flow-through fraction was collected as LAP (Supplementary Figure 1A). Buffer exchange and desalting were performed for LAP using 500 mM TEAB buffer (Sigma) and a 3 kDa cut-off Centricon (Millipore) as per the manufacturer's instructions. Silver staining was performed to visualize the depletion of HAP (Supplementary Figure 1B).
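The age matching quoted above can be re-derived from the ages listed in the sample table of this section. The sketch below is only an illustration: the text does not say which test produced p = 0.11, so Welch's unequal-variance t statistic is used here as an assumption, and the function name `welch_t` is hypothetical.

```python
import math
import statistics

# Ages transcribed from the sample table in this section.
gbm_ages = [46, 60, 51, 54, 55, 55, 60, 50, 55, 45]
control_ages = [48, 57.5, 50, 41, 51, 50, 54, 48, 45, 51]


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: the source does not name the test behind p = 0.11;
    Welch's t-test is used here purely as an illustration.
    """
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


mean_gbm = statistics.mean(gbm_ages)
mean_control = statistics.mean(control_ages)
t_stat, dof = welch_t(gbm_ages, control_ages)

print(f"mean GBM age     = {mean_gbm:.2f} years")
print(f"mean control age = {mean_control:.2f} years")
print(f"Welch t = {t_stat:.3f}, df = {dof:.1f}")
```

The two-sided p value would then be read from a t distribution with the reported degrees of freedom; that step is left out to keep the sketch free of third-party dependencies.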
To perform iTRAQ labelling, equal amounts of LAP protein extract from GBM and control samples were separately subjected to tryptic digestion in duplicate as per the manufacturer's protocol (Applied Biosystems iTRAQ Reagent Four-Plex Kit protocol). Briefly, 50 µg of LAP from control pooled sera and from GBM pooled sera were reduced, alkylated and digested with trypsin for 18 hours at 37 °C. iTRAQ reagents 114 and 115 were used to label the two control duplicate samples and iTRAQ reagents 116 and 117 were used to label the two GBM duplicate samples.
iTRAQ-labelled peptides from all four samples were mixed together, vacuum dried and subjected to isoelectric point (pI)-based fractionation using the 3100 OFFGEL Fractionator kit (Agilent Technologies, Böblingen, Germany) with a 24-well setup and a 24 cm IPG strip with a linear pH range of 3-10, following the manufacturer's protocol. Before electrofocusing, the mixed iTRAQ sample was desalted using C18 spin columns (89873, Pierce). The 24 fractions obtained after off-gel fractionation were pooled into 12 fractions such that each pooled fraction contained an approximately equal number of peptides. To guide this pooling, a theoretical digestion of the 1929 proteins reported in the human plasma proteome 1 was performed and the theoretical pI of the resulting peptides was calculated using the ExPASy tool (http://web.expasy.org/peptide_mass/). The peptides were then binned into the 24 fractions in which they are theoretically expected according to their pI (Supplementary Figure 2A). Fractions with fewer peptides were pooled together, and fractions with a higher number of peptides were left as single fractions (Supplementary Figure 2B). This exercise helped us to bring down the number of samples to be

No.   Sample   Gender   Age
1     GBM      M        46
2     GBM      M        60
3     GBM      M        51
4     GBM      M        54
5     GBM      M        55
6     GBM      F        55
7     GBM      F        60
8     GBM      F        50
9     GBM      F        55
10    GBM      F        45
11    Normal   M        48
12    Normal   M        57.5
13    Normal   M        50
14    Normal   M        41
15    Normal   M        51
16    Normal   F        50
17    Normal   F        54
18    Normal   F        48
19    Normal   F        45
20    Normal   F        51
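The pI-based binning and pooling described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the 24 wells are assumed to split the pH 3-10 range into equal-width bins, pooling is assumed to merge only adjacent wells, the greedy merge rule is one possible way to balance peptide numbers, and the peptide counts are hypothetical placeholders rather than values from Supplementary Figure 2.

```python
def assign_well(pi, ph_min=3.0, ph_max=10.0, n_wells=24):
    """Map a theoretical pI to a 0-based OFFGEL well index.

    Assumption: wells are equal-width pH bins across the strip.
    """
    width = (ph_max - ph_min) / n_wells
    index = int((pi - ph_min) / width)
    return min(max(index, 0), n_wells - 1)  # clamp values at or beyond the ends


def pool_fractions(counts, n_pools=12):
    """Greedily merge adjacent wells until n_pools remain.

    At each step the adjacent pair with the smallest combined peptide
    count is merged, so sparse wells are pooled and dense wells stay single.
    Returns a list of (well_indices, peptide_count) tuples.
    """
    pools = [([i], c) for i, c in enumerate(counts)]
    while len(pools) > n_pools:
        j = min(range(len(pools) - 1),
                key=lambda k: pools[k][1] + pools[k + 1][1])
        merged = (pools[j][0] + pools[j + 1][0], pools[j][1] + pools[j + 1][1])
        pools[j:j + 2] = [merged]
    return pools


# Hypothetical peptide counts per well (illustrative only).
counts = [5, 8, 12, 40, 85, 120, 95, 60, 30, 20, 15, 10,
          8, 6, 5, 12, 25, 50, 70, 45, 20, 10, 6, 3]

pools = pool_fractions(counts)
for wells, total in pools:
    print(f"wells {[w + 1 for w in wells]}: {total} peptides")
```

Other balancing rules (for example, targeting a fixed peptide count per pool) would serve the same purpose; the greedy rule is chosen here only because it is short and deterministic.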

LC-MS/MS
After off-gel fractionation and pooling, samples were desalted using C18 spin columns,

Protein identification and quantification
For protein identification and quantitation, individual and merged analysis with raw files of 12 fractions was performed using Proteome Discoverer version 1.4.0.288 (Thermo Scientific).
Database search was carried out against the NCBI human RefSeq database using the SEQUEST search algorithm. The search parameters used were as follows: 1) precursor mass

Figure 3A, 3B). All stocks and dilutions were prepared using 0.5% acetonitrile containing 0.1% formic acid as diluent. Stocks of the standards (10 mg/ml) and of the internal standards (5 mg/ml) were prepared by reconstituting the lyophilized powder in a suitable volume of diluent. Aliquots were made and stored at -80 °C. From these, peptide I and peptide IV were diluted to 10 µg/ml each, while peptide II and peptide III were diluted to 100 µg/ml each. Likewise, peptide I* and peptide IV* were diluted to 10 µg/ml each, while peptide II* and peptide III* were diluted to 100 µg/ml each. Dilutions were decided on the basis of the lower detection limit of these peptides in test runs.
Next, a mixture of all the light synthetic peptides was prepared in diluent, containing 2.5 µg/ml each of peptide I and peptide IV and 25 µg/ml each of peptide II and peptide III. This was the highest standard and was diluted serially to construct an 8-point calibration curve in which the SIS peptide concentration was held constant and the light peptide concentration was varied by appropriately diluting the light standard peptides. The SIS peptide mix of peptide I*, peptide IV*, peptide II* and peptide III* was prepared in the diluent to achieve concentrations of 1.25, 1.25, 12.5 and 12.5 µg/ml, respectively. 10 µl of each standard mix was diluted to 200 µl of 75% acetonitrile, dried and reconstituted in 40 µl of diluent. 10 µl of internal standard mix was spiked in and mixed well, and 10 µl was injected into the LC-MS. The final amount of the highest standard on column was as follows: peptide I and peptide IV (5 ng each) and peptide II and peptide III (50 ng each). The amount of internal standard on column was as follows: peptide I* and peptide IV* (2.5 ng each) and peptide II* and peptide III* (25 ng each). Because the concentration of the internal standard peptide is accurately known, the concentration of the protein to be measured in the unknown sample is determined from the peak area ratio. The MRM assay was developed by optimizing parameters such as collision energy and retention time for three transitions per peptide, which were selected on the basis of product ion intensities obtained after performing MRM of the light peptides. For each peptide, three transitions were measured, of which one was used as the quantifier and the other two as qualifiers to confirm the retention time and identity. To assess the reproducibility and sensitivity of the developed assay, the limit of quantitation (LOQ), lower quality control (LQC), middle quality control (MQC) and higher quality control (HQC) were determined, and nine repeats of the calibration curve were performed over three days.
The coefficient of variation and the accuracy were calculated from the inter-day means.
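The on-column amounts and the peak-area-ratio quantitation described above can be checked with a few lines of arithmetic. The sketch below assumes that the dried standard is reconstituted in 40 µl, that the 10 µl internal standard spike brings the volume to 50 µl, and that 10 µl of this is injected; this reading reproduces the amounts stated in the text. The function names, the single-point ratio formula and the example peak areas are illustrative assumptions, not the authors' software or data.

```python
import statistics


def on_column_ng(stock_ug_per_ml, aliquot_ul=10.0,
                 final_volume_ul=50.0, injection_ul=10.0):
    """Peptide mass (ng) reaching the column.

    µg/ml multiplied by µl gives ng; the injected fraction of the
    final volume (assumed 40 µl diluent + 10 µl spike) is then applied.
    """
    mass_ng = stock_ug_per_ml * aliquot_ul
    return mass_ng * injection_ul / final_volume_ul


def amount_from_ratio(light_area, heavy_area, heavy_ng):
    """Single-point estimate: endogenous amount = area ratio × SIS amount."""
    return light_area / heavy_area * heavy_ng


def cv_percent(values):
    """Coefficient of variation (%) of replicate measurements."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def accuracy_percent(measured_mean, nominal):
    """Accuracy (%) as measured mean relative to the nominal value."""
    return measured_mean / nominal * 100.0


# Highest light standard: peptides I/IV at 2.5 µg/ml, II/III at 25 µg/ml.
print(on_column_ng(2.5), on_column_ng(25.0))
# SIS mix: peptides I*/IV* at 1.25 µg/ml, II*/III* at 12.5 µg/ml.
print(on_column_ng(1.25), on_column_ng(12.5))

# Hypothetical peak areas, for illustration only.
print(amount_from_ratio(light_area=2000.0, heavy_area=1000.0, heavy_ng=2.5))
```

In practice the area ratio would be read against the 8-point calibration curve rather than converted by a single-point ratio; the single-point form is shown only because it makes the role of the internal standard explicit.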
After method development, a cohort of control (n=4) and GBM (n=36) serum samples was subjected to MRM. After protein estimation, 3 mg of each sample was loaded on an 18% SDS-PAGE gel. After Coomassie staining, the area between 10 and 16 kDa was subjected to in-gel tryptic digestion. Briefly, after excising the gel piece between 10 and 16 kDa for all the samples, the Coomassie stain was removed. The gel piece was then reduced, alkylated and digested with trypsin for 18 hours. Tryptic digestion was stopped with 1% formic acid. The peptides were extracted from the gel piece with 60% acetonitrile followed by 90% acetonitrile. The extracted peptides were vacuum dried and stored at -80 °C until used. Before performing MRM, samples were reconstituted in 40 µl of diluent, and 10 µl of the heavy standard peptide mix of peptide I*, peptide IV*, peptide II* and peptide III* was spiked in.
An Agilent 1290 Infinity UHPLC system with an analytical column (Agilent SB -C18,

Protein-Protein Interaction (PPI) Network Analysis
To identify any reported interaction among the differentially regulated proteins, specifically with S100A8 and S100A9, we performed PPI analysis on 42 differential serum proteins using the NetworkAnalyst tool (http://www.networkanalyst.ca/) 4. The STRING Interactome with a confidence score cut-off of 900 was chosen as the PPI database.
Further, two types of network analysis were done: 1) zero-order network, where only the seed proteins were assessed for interactions among themselves; and 2) first-order network, where reported interactions with any other proteins were also displayed. The importance of a node is decided by 1) degree centrality, that is, the number of connections a node has; and 2) betweenness centrality, that is, the number of shortest paths going through the node.
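The two node-importance measures defined above can be illustrated on a toy graph. The sketch below is not how NetworkAnalyst computes them internally; it uses Brandes' algorithm for unnormalised betweenness on an unweighted, undirected graph, and the five-node graph with nodes H, A, B, C and D is entirely hypothetical and unrelated to the STRING data.

```python
from collections import deque


def degree_centrality(adj):
    """Number of connections of each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy network: hub H linked to A, B, C; C also linked to D.
adj = {
    "H": ["A", "B", "C"],
    "A": ["H"],
    "B": ["H"],
    "C": ["H", "D"],
    "D": ["C"],
}

degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
for node in adj:
    print(node, degree[node], betweenness[node])
```

In this toy graph the hub H has both the highest degree and the highest betweenness, while C has a lower degree but still lies on every shortest path to D, which is the kind of distinction the two measures are meant to capture.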

Transcript data analysis for S100A8 and S100A9
Transcript data for S100A8 and S100A9 were obtained from various publicly available microarray datasets, namely: control samples (TCGA Agilent n=10, TCGA Affymetrix n=10, GSE22866 n=6, REMBRANDT n=28) and GBM samples (TCGA Agilent n=572, TCGA Affymetrix n=528, GSE22866 n=40, REMBRANDT Grade II n=65, Grade III n=58, GBM n=227). These data were used for three main purposes: 1) to represent the differential regulation of S100A8 and S100A9 transcript levels compared with control samples by plotting a scatter graph; 2) to analyse the significance of S100A8 and S100A9 transcript levels in the prognosis of GBM, for which the expression data from TCGA with clinical information (Affymetrix platform, n=518) were subjected to survival analysis; and 3) to perform sample-wise correlation analysis of ESTIMATE scores, xCell microenvironment score, cell-type-specific gene signatures and tumor purity score with the expression data of S100A8 and S100A9 (TCGA Agilent (n=318), Affymetrix platform (n=416) and TCGA RNAseq (n=151)).
We utilized expression data provided by TCGA, as well as microenvironment scores on the same TCGA samples provided by the following three studies: Tumor Purity Score (Nat Biotechnology 2012) 5 Tumor purity score is derived from a computational method called ABSOLUTE. This

Bacterial purification of S100A8 and S100A9
S100A8 and S100A9 genes were tagged with GST (glutathione S-transferase) by cloning the full-length cDNA into the pGEX4T1 vector (GE Healthcare). Briefly, the clones obtained were transformed into the BL21(DE3)pLysS expression strain and grown to an OD600 of 0.8, and induction was carried out with 0.5 mM IPTG for six hours. The cultures were then processed by sonication in lysis buffer, and the soluble fraction was obtained by centrifugation at 30,000 g for 30 minutes. This fraction was incubated with glutathione agarose beads (Novagen) and the tagged protein was eluted with 10 mM reduced glutathione. Purity was confirmed by Coomassie staining and by immunoblotting with antibodies against S100A8 and S100A9 (Abcam) and an anti-GST antibody (data not shown). The purified protein concentration was quantified on a NanoDrop instrument using the molar extinction coefficients of the tagged proteins.
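Quantifying protein from absorbance and a molar extinction coefficient follows the Beer-Lambert law, A = ε · c · l. The sketch below shows the conversion under stated assumptions: absorbance is taken at 280 nm and normalised to a 1 cm path, and the extinction coefficient and molecular weight are hypothetical round numbers, not the values the authors used for the GST-tagged proteins.

```python
def molar_concentration(absorbance, ext_coeff_per_m_cm, path_cm=1.0):
    """Beer-Lambert law: c (mol/L) = A / (epsilon * l)."""
    return absorbance / (ext_coeff_per_m_cm * path_cm)


def mg_per_ml(absorbance, ext_coeff_per_m_cm, mol_weight_da, path_cm=1.0):
    """Mass concentration: mol/L × g/mol = g/L, which equals mg/ml."""
    return molar_concentration(absorbance, ext_coeff_per_m_cm, path_cm) * mol_weight_da


# Hypothetical inputs, for illustration only.
a280 = 1.0                 # absorbance at 280 nm, 1 cm path equivalent
epsilon = 50000.0          # M^-1 cm^-1, placeholder value
molecular_weight = 37000.0 # Da, placeholder value

print(f"{molar_concentration(a280, epsilon) * 1e6:.1f} µM")
print(f"{mg_per_ml(a280, epsilon, molecular_weight):.2f} mg/ml")
```

For a fusion protein the extinction coefficient and molecular weight must be those of the full tagged construct, which is why the text specifies the coefficients of the tagged proteins.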

Colony Formation Assay
The colony formation assay was carried out in 6-well culture plates. Briefly, 1000-1500 cells (U251 and T98G) were plated per well in serum-free conditions along with GST or the recombinant proteins rS100A8 and rS100A9 (0.5 µg/ml), as indicated. After incubation at 37 °C for 12-16 hours, the medium was changed to complete medium. After 12-15 days, cells were fixed with methanol for 30 minutes, stained with 0.1% crystal violet for 30 minutes and photographed, and the number of colonies was counted.

Migration and Invasion Assay
Migration and Invasion assays were carried out using trans-well Boyden chambers.

Survival analysis and statistics
The prognostic significance of S100A8 and S100A9 transcript levels was tested in GBM TCGA data (Affymetrix platform, n=518) by univariate and multivariate Cox proportional hazards analysis using SPSS software version 19 (IBM Corp., New York, USA). The Kaplan-Meier method was used to perform survival analysis at the median cut-off of S100A8 and S100A9

A, B) Selected peptides for MRM, Peptide I and Peptide II for S100A8 and Peptide III and Peptide IV for S100A9, highlighted in their protein sequences.

C) Confirmation of the synthesized peptides, both unlabelled light peptides (Peptide I, Peptide II, Peptide III and Peptide IV) and isotope-labelled SIS peptides (Peptide I*, Peptide II*, Peptide III* and Peptide IV*), by MS and MS/MS.

Supplementary Figure 4:
A, B) Calibration curves for S100A8 and S100A9. The calibration equation and the regression coefficient r² are indicated, with weighting (w) set as equal.

C) Representative control and GBM MRM profiles for the four endogenous peptides (Peptide I and Peptide II for S100A8, Peptide III and Peptide IV for S100A9) and the corresponding four SIS peptides (Peptide I* and Peptide II* for S100A8, Peptide III* and Peptide IV* for S100A9) used as internal standards.

Supplementary Figure 5:
A) Heat map depicting the correlation between tumor transcript levels of the 7 high-abundance proteins in GBM sera (which correlated significantly with stromal scores and tumor purity score) and the xCell cell-type-specific signatures obtained by the xCell method using TCGA GBM RNAseq data. Of the 64 cell types provided by the xCell method, only those showing significant correlation with at least one of the 7 genes were used to plot the heat map. * represents significant correlation with p value < 0.05. Spearman correlation values are colour coded: red for positive correlation, black for no correlation and green for negative correlation.

B, C) Correlation plots of transcript levels of the myeloid cell surface antigen CD33 with transcript levels of S100A8 and S100A9. The correlation coefficient and p value, calculated by Spearman's correlation, are indicated; dotted lines represent the 95% confidence interval.

C, D) Transcript levels of S100A8 and S100A9 in TCGA Agilent and Affymetrix datasets, showing higher expression in IDH1 wildtype (IDH1 wt) as compared to IDH1 mutant (IDH1 mut). Unpaired t-test with Welch's correction was performed between IDH1 wt and IDH1 mut GBM samples; p values are indicated, with p < 0.05 represented as *, p < 0.01 as ** and p < 0.001 as ***.

E, G, I) Distribution of ESTIMATE score, xCell microenvironment score and tumor purity score in the GBM subtypes classical (CL), mesenchymal (MES), neural (NE) and proneural (PN). ANOVA was performed as the significance test, and a p value less than 0.05 is considered significant, with *, **, *** representing p values less than 0.05, 0.01 and 0.001, respectively.
F, H, J) Distribution of ESTIMATE score, xCell microenvironment score, tumor purity score, monocyte score and Th1-helper cell score in TCGA Agilent datasets, in IDH1 wildtype (IDH1 wt) as compared to IDH1 mutant (IDH1 mut). Unpaired t-test with Welch's correction was performed between IDH1 wt and IDH1 mut GBM samples; p values are indicated, with p < 0.05 represented as *, p < 0.01 as ** and p < 0.001 as ***.

K) Correlation of transcript levels and serum protein levels of S100A8 and S100A9 in various publicly available datasets and in our cohort. Correlation coefficients (r) and p values calculated by Spearman's correlation are indicated.

Supplementary Figure 7:
A) ROC curve showing that serum levels of S100A9 do not discriminate between healthy controls and GBM; the AUC and p value are indicated.

B, C) Kaplan-Meier survival analysis using pre-operative serum levels of S100A8 and S100A9, respectively, in our GBM patient cohort (n=87), showing no prognostic significance. The log-rank (Mantel-Cox) test was applied and the p value is indicated.

D) Kaplan-Meier survival analysis using pre-operative serum levels of S100A9 in our GBM patients surviving more than the median survival (n=35), showing no prognostic significance. The log-rank (Mantel-Cox) test was applied and the p value is indicated.

Supplementary Figure 8:
A, B) The effect of exogenously added rS100A8 and rS100A9 (recombinant proteins) on the proliferation of glioma cell lines was measured using the colony formation assay. Representative images of duplicate wells after fixing and staining are shown along with the quantitation. A p value less than 0.05 is considered significant; ns = non-significant.

C-F) The effect of exogenously added rS100A8 and rS100A9 (recombinant proteins) on the migratory and invasive properties of glioma cell lines was measured using the trans-well assay. Representative images of T98G, U373, U138 and U87 cells fixed and stained after migration and invasion, respectively, are shown along with the quantitation. A p value less than 0.05 is considered significant, with *, **, *** representing p values less than 0.05, 0.01 and 0.001, respectively.
G) The concentration-dependent effect of exogenously added S100A8 and S100A9 on migratory and invasive properties was measured using the trans-well assay. Representative images of U138 cells fixed and stained after migration and invasion, respectively, are shown.

Supplementary Figure 9:
First-order protein-protein interaction network with 40 differential proteins. This network displays all reported interactions of the seed proteins with any other protein in the database.
The importance of a node is decided by 1) degree centrality, that is, the number of connections a node has, depicted by the size of the node (the larger the size, the higher the degree); and 2) betweenness centrality, that is, the number of shortest paths going through the node, depicted by the colour of the node (red showing the highest and blue the lowest value).