Alterations in protein expression and site-specific N-glycosylation of prostate cancer tissues

Identifying molecular alterations occurring during cancer progression is essential for a deeper understanding of the underlying biological processes. Here we have analyzed cancerous and healthy prostate biopsies using nanoLC-MS(MS) to detect proteins with altered expression and N-glycosylation. We have identified 75 proteins with significantly changing expression during disease progression. The biological processes involved were assigned based on protein–protein interaction networks. These include cellular component organization, metabolic and localization processes. Multiple glycoproteins were identified with aberrant glycosylation in prostate cancer, where differences in glycosite-specific sialylation, fucosylation, and galactosylation were the most substantial. Many of the glycoproteins with altered N-glycosylation were extracellular matrix constituents, and are heavily involved in the establishment of the tumor microenvironment.


Results
Protein expression levels and site-specific N-glycosylation of 95 tissue microarray (TMA) biopsy samples were analyzed: 9 G1, 16 G2, 24 G3, and 46 normal tissues. Digital images of a stained sample from each group are shown as examples in Supplementary Figs. S1-S4. For cancerous samples, both the original image and an annotated image indicating cancerous and non-cancerous tissue areas are shown. The sample preparation consisted of on-surface tryptic digestion 31 followed by C18 SPE cleanup and acetone precipitation for glycopeptide enrichment. After precipitation, the glycopeptide-enriched pellet fraction and the supernatant fraction containing non-glycosylated peptides were analyzed separately 32. The workflow is summarized in Fig. 1; detailed information on each step is given in the "Methods" section.
The "Results" section is divided into three major parts: (i) the molecular differences between healthy and cancerous prostate tissue; (ii) the molecular changes with PCa grade progression, and the differences between distinct grades and healthy tissue; and (iii) the biological processes altered in PCa. While the first two parts are based on data from both the proteomics dataset (containing protein intensities) and the glycoproteomics dataset (containing glycopeptide intensities and metrics calculated from them), the third is based on proteomics data only. Before the results of these three parts are described, a general characterization of the two datasets (proteomics and glycoproteomics) is provided.
MaxQuant quantified 653 proteins altogether in the 95 supernatant samples analyzed. From these, proteins that were found in less than 60% of any of the sample groups were excluded. Missing values were then imputed as described in the "Methods" section.

Differences between healthy and cancerous tissues.
To investigate differences between healthy (normal) and cancerous (PCa) tissues, Student's t-test was performed on the proteomics and glycoproteomics data separately, with the false discovery rate (FDR) controlled at 0.05. Between the normal and PCa groups, 123 proteins were found to be differentially expressed: 72 proteins were overexpressed and 51 proteins were underexpressed in PCa (Supplementary Table S2). Among these, 14 showed a fold-change over 2, while 27 displayed a fold-change under 0.5 (Fig. 2).
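The filtering and FDR steps described above can be illustrated with a minimal sketch. It assumes the literal reading of the filter (a protein is kept only if it was quantified in at least 60% of the samples in every group) and uses the Benjamini-Hochberg procedure as a stand-in, since the text here does not state which FDR method was applied. The function names and the example values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional


def passes_valid_value_filter(
    intensities_by_group: Dict[str, List[Optional[float]]],
    min_fraction: float = 0.60,
) -> bool:
    """Keep a protein only if it was quantified (not None) in at least
    `min_fraction` of the samples of every group (assumed reading)."""
    for values in intensities_by_group.values():
        observed = sum(v is not None for v in values)
        if observed / len(values) < min_fraction:
            return False
    return True


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag p-values as significant while controlling the FDR at `fdr`.
    Benjamini-Hochberg is an assumption, not the documented method."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical intensities (None marks a missing value), not study data.
protein = {
    "normal": [1.0, None, 2.0, 3.0, 2.5],  # 4 of 5 quantified
    "PCa": [None, None, 1.0, 2.0, None],   # 2 of 5 quantified
}
keep = passes_valid_value_filter(protein)
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.20])
print(keep, flags)
```

In this toy input the protein is dropped because only 40% of the PCa samples contain a value, and only the smallest p-value survives the FDR correction.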
In the glycoproteomics dataset, 7 glycopeptides were found with significantly different abundances between the normal and PCa groups (Supplementary Fig. S5), each carrying biantennary, fucosylated complex-type glycans with different levels of galactosylation and sialylation. In five cases, glycopeptide expression was lower in PCa tissues: four glycoforms of Immunoglobulin gamma-1 heavy chain (IGG1) N299, and one glycoform of Prothrombin (THRB) N121. The other two showed higher expression levels in PCa: one glycoform of Microfibril-associated glycoprotein 4 (MFAP4) N137 and one glycoform of Biglycan (PGS1) N270.
Significant differences were also detected between normal and PCa tissues when comparing the levels of sialylation, fucosylation, and galactosylation at distinct glycosites. The differences in glycosite-specific sialylation, fucosylation, and galactosylation are summarized in Fig. 3.
All but one of the eight differentially sialylated glycosites showed lower sialylation in PCa (Fig. 3A). The differences in sialylation were below 10% for most glycosites, except for Periostin (POSTN) N599 and Prostatic acid phosphatase (PPAP) N94, with a 38.6% and 15.1% decrease, respectively, and N785 of collagen alpha-2(VI) chain (CO6A2), with a 10.3% increase in sialylation. Although the absolute difference was only 4.6%, THRB N121 showed the greatest relative change, with a degree of sialylation almost 3.5 times lower in PCa than in normal tissues. In contrast, all four differentially fucosylated glycosites showed higher fucosylation in PCa, with the biggest differences on CO6A2 N785, POSTN N599, and PPAP N94, with a 27.7%, 47.6%, and 35.9% increase in fucosylation, respectively (Fig. 3B). The significant differences in galactosylation levels found on five glycosites (Fig. 3C) were much smaller than the changes in fucosylation or sialylation, the two major ones being the increase of galactosylation at MFAP4 N137 by 10.6% and the decrease of galactosylation of Immunoglobulin heavy constant gamma 2 (IGHG2) N176 by 8.7% in cancerous samples. Interestingly, while fucosylation always increased in PCa samples (Fig. 3D), sialylation and galactosylation did not change in a consistent direction.

Differences among various grades of PCa and Normal tissue.
To uncover molecular alterations among pathological grades and normal tissue, Analysis of Variance (ANOVA) was performed (FDR controlled at 0.05) on both proteomics and glycoproteomics data separately. For exact parameters see the "Methods" section.
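As an illustration of the test named above, the following minimal sketch computes the one-way ANOVA F statistic for a single protein across the four groups. It is a simplified stand-in for the actual analysis, whose exact parameters are given in the "Methods" section; the function name and the intensity values are hypothetical, and converting F into a p-value (which needs the F distribution) is left out.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic:
    (between-group mean square) / (within-group mean square)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical log-intensities of one protein, not data from the study.
normal = [1.0, 2.0, 3.0]
g1 = [2.0, 3.0, 4.0]
g2 = [5.0, 6.0, 7.0]
g3 = [6.0, 7.0, 8.0]
f_stat = one_way_anova_f([normal, g1, g2, g3])
print(f_stat)
```

A large F indicates that the differences between group means are large relative to the scatter within groups; the FDR correction is then applied to the resulting p-values across all proteins.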
In the proteomics dataset, 75 proteins were identified with significant changes (Supplementary Table S3) among the various PCa grades and healthy tissue. Hierarchical clustering in Perseus with Spearman's correlation revealed two distinct groups among these proteins: in 40 cases the proteins were upregulated (Fig. 4A), while in 35 cases they were downregulated (Fig. 4B) in cancer.
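To make the clustering criterion concrete, the sketch below computes Spearman's rank correlation between protein abundance profiles and turns it into a distance (1 minus the correlation). This is only an assumption about how a correlation-based distance is typically formed; it does not reproduce Perseus, and the profiles are hypothetical group means for Normal, G1, G2, and G3.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's correlation is Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical group-mean profiles (Normal, G1, G2, G3), not study data.
up_a = [1.0, 1.2, 2.5, 3.1]
up_b = [10.0, 11.0, 30.0, 45.0]
down = [5.0, 4.8, 2.0, 1.1]

rho_same = spearman(up_a, up_b)      # both rise with grade
rho_opposite = spearman(up_a, down)  # opposite trends
print(1 - rho_same, 1 - rho_opposite)
```

Profiles that rise together end up at distance close to 0 and opposite trends at distance close to 2, which is why a rank-based correlation separates upregulated from downregulated proteins regardless of their absolute intensities.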
Afterward, a post-hoc test (Tukey's Honest Significant Difference test) was performed on the 75 ANOVA-significant proteins. This revealed that most of the proteins were differentially expressed between the normal and the two high-grade groups (G2 and G3), while there were only 3 such proteins between G2 and G3, 8 proteins between G1 and G3, and 14 between the normal and G1 groups. The list of these proteins is included in Supplementary Table S3, broken down into six groups corresponding to all group-wise comparison combinations. Furthermore, most of them (more than 85%) showed differential expression in multiple group comparisons rather than in only one (Supplementary Fig. S6).
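The six group-wise comparisons follow directly from pairing the four sample groups, as the short sketch below shows. The group labels are taken from the text; the variable names are illustrative only.

```python
from itertools import combinations

# The four sample groups compared in the post-hoc test.
groups = ["Normal", "G1", "G2", "G3"]

# Every unordered pair of groups is one pairwise comparison.
pairs = list(combinations(groups, 2))

for first, second in pairs:
    print(f"{first} vs {second}")
print(len(pairs))
```

With four groups there are 4 × 3 / 2 = 6 pairs, which matches the six groups of proteins in Supplementary Table S3.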
In the glycoproteomics dataset, ANOVA and the subsequent post-hoc test (Tukey's HSD) revealed 4 glycopeptides with significantly different abundances among the different grades and healthy tissue. Three of them correspond to the same glycosite, N299 of IGG1, and carry biantennary complex-type glycans. In all three cases, the significant differences were between the Normal and Grade 2 groups and between the Normal and Grade 3 groups, and the observed trends were similar (average correlation coefficient of 0.980). The overall amount of IGG1 glycopeptides did not change significantly with PCa progression. The fourth glycopeptide corresponds to glycosite N137 of MFAP4 and also carries a biantennary complex-type glycan. In this case, the significant difference is between the Normal and Grade 1 groups (Fig. 5).
Furthermore, regarding glycosites, ANOVA identified that the degree of fucosylation on CO6A2 N785 was different between the three Grade groups and Normal tissue. Interestingly, fucosylation increases monotonically up to G2 and then decreases in G3 (Fig. 6A). This tendency is opposite to the changes in protein expression levels of the 3 identified CO6 subunits A1, A2, and A3 (Fig. 6B), apart from both being nearly constant between the Normal and G1 groups.
In addition to pathological grades, alterations between Gleason grades and healthy tissue were investigated as well. The number of samples analyzed in the different GG groups was as follows: 7 in GG2, 12 in GG3, 15 in GG4, and 15 in GG5. The data analysis was carried out similarly to that of pathological grades.
The results of the analysis based on GG groups showed great similarity to those based on pathological grades. In the glycoproteomics dataset, the same glycosylated features were identified with significant changes, regarding both glycopeptides and glycosites. In the proteomics dataset, 60 proteins were identified as differentially expressed, as opposed to 75 in the analysis based on pathological grades, with 57 common to both. The overlap between these two sets of proteins and the group classifications of the 49 PCa samples are summarized in Fig. 7A,B, respectively. The correlation was also calculated for the 57 common proteins for the two datasets. Gleason grades were grouped based on the amount of overlap with pathological grades (Fig. 7B) in the following manner: GG2; GG3 and GG4; and GG5. The correlation coefficients between the GG2 -G1; GG3 and   Table 1. The complete lists are presented in Supplementary Tables S4-S7.
Most of the underexpressed proteins were associated with cellular component organization (34 out of 51), while the overexpressed proteins were predominantly affiliated with metabolic processes (60 out of 72).

Discussion
As the focus of this paper is on finding potential biomarkers through exploring alterations in the glycosylation between healthy and PCa tissues combined with proteomics data, only glycoproteins displaying significant changes are discussed individually. For these, the differences in protein expression and glycosylation are both reported, and they are compared to relevant previous studies on PCa or cancer in general. Furthermore, the most significant biological processes are also discussed.
The PPI network analysis provides information about biological processes that are altered in PCa. The underexpressed proteins were mostly associated with cellular component organization (34 out of 51 proteins) and with various processes connected to adhesion (e.g., the KEGG term "Focal adhesion" and the GO term "cell adhesion") and muscle contraction (e.g., the KEGG term "Vascular smooth muscle contraction" and the GO term "muscle contraction"). Focal adhesion has been confirmed to be heavily involved in cancer progression 35, while smooth muscle cells have been reported to be involved in PCa and BPH 36. The overexpressed proteins, on the other hand, were primarily associated with metabolic processes (60 out of 72 proteins), with the GO terms "localization" and "regulation of gene expression" involving the most proteins. While altered localization of macromolecules in a cell (e.g., proteins 37) can reportedly drive tumor development and metastasis, aberrant gene expression is known to be the principal cause of cancer 38.
All glycoproteins with significant glycosylation changes were quantified in the proteomics part of the study by MaxQuant, but not all of them showed differential expression between Normal and PCa tissues. This suggests that altered glycosylation does not necessarily indicate differential expression of the glycoprotein itself. Furthermore, none of the metrics used for the characterization of glycosylation (listed in Supplementary Table S1) showed significant overall changes between PCa and healthy tissues. Regarding cellular localization, all the glycoproteins with significant glycosylation changes were primarily of extracellular origin. Most of them were associated with the Extracellular Matrix (ECM) and, consequently, the Tumor Microenvironment (TME), which is known to heavily influence cancer initiation, progression, and invasion 39.
There are several changes in glycosylation that are known to widely occur in cancer. These include increased and altered sialylation, increased branched-glycan structures, and fucosylation 40,41. Many PCa glycome-specific changes have also been reported before 42, e.g., the expression of oligomannosidic glycans in the tumor region in late-stage PCa 43. These changes, however, reflect only overall tendencies; they are not necessarily true for all of the glycosylation sites, as our results clearly demonstrate.
In previous studies, serum sialylation has been linked to pathological grade and elevated sialic acid levels to bone metastasis 44. In tissues, however, overall sialylation levels have been reported to be constant across different grades of cancer 22. Our results suggest the same: the average sialylation levels were very similar throughout the different sample groups, but significant differences were detected on several glycosylation sites. Most of them showed a decrease in sialylation, except for CO6A2 N785, which showed an overall increase and significant differences between the different pathological grade groups. Proteomics results also revealed that CO6A1, CO6A2, and CO6A3 expression levels changed significantly with PCa progression in a similar manner. This is highlighted by the fact that CO6A1 has been reported to have an important role in tumor growth and in the molecular etiology of Castration-Resistant Prostate Cancer 45.
Apart from serum, PCa cell lines have also been used before to identify diagnostic markers, and site-specific changes in fucosylation have been reported in PC3 and LNCaP cell lines 46. This aligns with our findings, as we have also found that fucosylation increased in PCa on multiple glycosites. PPAP has also been demonstrated to have a significant effect on PCa cell growth 47, and it has been hypothesized to have higher site-specific fucosylation levels in PCa patients 46. This is supported by our data: the average fucosylation level of PPAP N94 increased from 47 to 83% in PCa.
POSTN has been reported to be upregulated in aggressive PCa 48, but significant changes in its glycosylation have not been reported yet. Our proteomics results reaffirmed that POSTN is overexpressed in PCa, and we also detected significant changes in both fucosylation and sialylation on POSTN N599 (an increase from 24 to 72% and a decrease from 83 to 44%, respectively), highlighting its possible importance.
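Because the text reports both absolute differences (in percentage points) and relative changes, a small worked example may help keep the two apart. The sketch below uses the rounded before/after values quoted for POSTN N599 and PPAP N94; the helper name `describe_change` is hypothetical, and the results are simple arithmetic on those rounded values rather than re-derived study results.

```python
from typing import Dict


def describe_change(before_pct: float, after_pct: float) -> Dict[str, float]:
    """Compare two glycosylation levels given in percent.

    percentage_points: absolute difference (after - before)
    fold: ratio after / before (values below 1 mean a decrease)
    """
    return {
        "percentage_points": after_pct - before_pct,
        "fold": after_pct / before_pct,
    }


# Rounded normal -> PCa values quoted in the text.
postn_fuc = describe_change(24, 72)   # POSTN N599 fucosylation
postn_sia = describe_change(83, 44)   # POSTN N599 sialylation
ppap_fuc = describe_change(47, 83)    # PPAP N94 fucosylation

for name, change in [
    ("POSTN N599 fucosylation", postn_fuc),
    ("POSTN N599 sialylation", postn_sia),
    ("PPAP N94 fucosylation", ppap_fuc),
]:
    print(f"{name}: {change['percentage_points']:+.0f} points, "
          f"{change['fold']:.2f}-fold")
```

The rounded differences of 48, 39, and 36 percentage points agree with the 47.6%, 38.6%, and 35.9% reported in the "Results" section, while the fold values show that a site with a low starting level can have a large relative change despite a modest absolute one.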
Prostate tissue is known to be a rich reservoir of Prothrombin 49, the precursor of Thrombin, which has been reported to promote prostate tumor growth, increase tumor cell seeding, and stimulate angiogenesis 50,51. We have found that the sialylation of THRB N121 was significantly lower in PCa and, moreover, showed the largest relative difference.
Alterations of serum IgG glycosylation have been reported in many diseases, including PCa 52, and IgG1 has been suggested as a potential target for PCa treatment 53. We found that both IGG1 N299 and IGHG2 N176 show decreased overall galactosylation, by 6.3% and 8.7%, respectively. This is in line with previous studies, where one of the major differences reported was the decrease of terminal galactosylation in PCa compared to either healthy or benign prostatic disease patients 54. Our data also show reduced sialylation on both IGG1 N299 and IGHG2 N176 by 2.1% (corresponding to a relative change of 21.3% and 26.8%, respectively), which is also in agreement with the literature, as reduced sialylation has been described as a major alteration in PCa compared to healthy individuals 55.
Another glycoprotein with significant site-specific glycosylation changes was MFAP4, which has been reported to be involved in several cancers and may function as a tumor suppressor in PCa 56 . MFAP4 has been documented to have altered glycosylation in pancreatic adenocarcinoma 57 , however, not in PCa. Our results revealed that both sites of MFAP4 showed modified glycosylation in PCa: decreased sialylation on N87 and increased expression of the glycan N4H5S1F1 on N137. The latter glycoform might be a useful indicator in detecting PCa at an early stage, as this increased expression was detected between normal and G1 samples.
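Glycan compositions such as N4H5S1F1 can be read mechanically, as in the sketch below. The letter-to-monosaccharide mapping (N = HexNAc, H = hexose, S = sialic acid, F = fucose) is a common shorthand and is assumed here rather than taken from the text; the function name `parse_composition` is hypothetical.

```python
import re
from typing import Dict

# Assumed shorthand: N = HexNAc, H = Hex, S = sialic acid, F = fucose.
MONOSACCHARIDES = {"N": "HexNAc", "H": "Hex", "S": "NeuAc", "F": "Fuc"}


def parse_composition(code: str) -> Dict[str, int]:
    """Turn a composition code such as 'N4H5S1F1' into residue counts."""
    if not re.fullmatch(r"(?:[A-Z]\d+)+", code):
        raise ValueError(f"Not a composition code: {code!r}")
    counts = {name: 0 for name in MONOSACCHARIDES.values()}
    for letter, number in re.findall(r"([A-Z])(\d+)", code):
        if letter not in MONOSACCHARIDES:
            raise ValueError(f"Unknown residue letter: {letter!r}")
        counts[MONOSACCHARIDES[letter]] += int(number)
    return counts


glycan = parse_composition("N4H5S1F1")
is_fucosylated = glycan["Fuc"] > 0
is_sialylated = glycan["NeuAc"] > 0
print(glycan, is_fucosylated, is_sialylated)
```

Under this reading, N4H5S1F1 is a fucosylated, monosialylated glycan with four HexNAc and five hexose residues, which is consistent with the biantennary complex-type structures described in the "Results" section.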
Most of the glycoproteins discussed above can be found in the Human Protein Atlas 58 (apart from IGG1 and IGHG2) and are categorized in the Pathology Atlas based on Prognostic summary and Cancer specificity. Apart from PPAP, which is a protein specific to PCa, all of them are unfavorable prognostic markers in certain types of cancer (in most cases renal cancer) which suggests that these glycoproteins are heavily involved in cancer progression. This information is summarized in Supplementary Table S8 supplemented by their Secretome annotation.
It is also important to note that these glycoproteins have been detected in biofluids previously. All glycoproteins discussed above, with the exception of POSTN, have been detected in urine 59, while POSTN has been detected in serum samples 60 of PCa patients. This suggests their potential usefulness as clinical markers. Whether the alterations in the glycosylation of these proteins are PCa-specific needs further investigation, especially in the context of their biomarker status.
In conclusion, our results indicate that alterations between PCa and Normal tissue glycosylation occur primarily on the glycosite level, while overall glycosylation may be unaffected. Furthermore, altered glycosylation does not necessarily indicate differential expression on the protein level. The glycoproteins with significant differences in glycosylation were all secreted either to blood or to the ECM, and most of them are characterized as unfavorable prognostic cancer markers by the Pathology Atlas. As altered protein glycosylation in cancer has been proven to be nonrandom, this suggests that further investigation of the glycosylation and cancer specificity of these potential prognostic markers, together with the identification of their exact roles, is reasonable and could lead to further advancement in understanding the function of glycosylation in cancer development and PCa prognosis.

Methods
On-surface digestion. First, the TMA slides were baked at 60 °C for 2 h following the supplier's instructions to prevent tissue detachment. Next, de-paraffinization was carried out by incubating the slides in different solvents/solutions sequentially as follows: xylene for 2 × 3 min, ethanol for 2 × 5 min, 90% ethanol-10% water for 3 min, 70% ethanol-30% water for 3 min, 10 mM NH4HCO3 (water) for 5 min, and finally water for 1 min. After dewaxing, the slides were placed in antigen retrieval buffer (20 mM Tris-HCl, pH 9.0) for 30 min at 90 °C.
Following the preparation steps, the proteins in the TMA cores were reduced using RapiGest and DTT in 1 µL of 20% glycerol for 20 min at 55 °C, then alkylated using IAA in 1 µL of 25 mM ammonium bicarbonate (ABC) buffer and 20% glycerol for 20 min at room temperature in the dark. The digestion was carried out in 5 cycles, each lasting 40 min at 37 °C in a humidified box. In the first two cycles, LysC-Trypsin mixture was added in a 1:25 ratio, in 1 µL of 50 mM ABC and 20% glycerol. In the last three cycles, Trypsin was added in a 1:10 ratio, in 1 µL of 50 mM ABC and 20% glycerol. After the digestion steps, the protein digest was extracted by repeatedly pipetting 1 µL of 10% acetic acid extraction solvent five times on the cores. Peptide extracts were then dried down, and clean-up was performed using C18 spin columns (Thermo Scientific) following the manufacturer's protocol. The resulting samples were dried down and stored at -20 °C for further usage.
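The cyclic digestion described above can be written out as a simple schedule, which makes the total incubation time explicit. This is only a bookkeeping sketch: the class and field names are hypothetical, and the ratios are copied as stated in the text without interpreting what they are relative to.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DigestionCycle:
    number: int
    enzyme: str
    ratio: str          # as stated in the text
    minutes: int
    temperature_c: int


def build_schedule() -> List[DigestionCycle]:
    """Five 40-min cycles at 37 °C: LysC-Trypsin for the first two,
    Trypsin for the last three."""
    cycles = []
    for number in range(1, 6):
        if number <= 2:
            enzyme, ratio = "LysC-Trypsin", "1:25"
        else:
            enzyme, ratio = "Trypsin", "1:10"
        cycles.append(DigestionCycle(number, enzyme, ratio, 40, 37))
    return cycles


schedule = build_schedule()
total_minutes = sum(cycle.minutes for cycle in schedule)

for cycle in schedule:
    print(f"Cycle {cycle.number}: {cycle.enzyme} ({cycle.ratio}), "
          f"{cycle.minutes} min at {cycle.temperature_c} °C")
print(f"Total digestion time: {total_minutes} min")
```

Summed over the five cycles, the on-surface digestion amounts to 200 min of incubation, not counting the time needed to apply the enzyme between cycles.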
Acetone precipitation. Samples were reconstituted in 15 µL of 1% FA, 150 µL of ice-cold acetone was added, and the solution was stored at -20 °C overnight. The samples were then centrifuged at 13,000 × g for 10 min, and the supernatants were removed, dried down, and stored at -20 °C. The pellet fractions were also dried down, then resuspended in 10 µL of injection solvent and stored in the autosampler unit until analysis.
For proteomics, DDA measurements were used. The cycle time was set at 2.5 s, with a dynamic MS/MS exclusion of the same precursor ion for 2 min, or if its intensity is at least 3 times larger than previously. Preferred charge states were set between +2 and +5. MS spectra were acquired at 3 Hz in the 150-2200 m/z range, while MS/MS spectra were acquired at 4 or 16 Hz depending on the intensity of the precursor. For glycoproteomics MS/MS measurements, the experimental settings were similar, except for collision energies. Mixed-energy spectra were collected at 100% collision energy for 80% of the cycle time and at 50% collision energy for 20% of the cycle time. For single-stage MS measurements, spectra were recorded over the mass range of 300-3000 m/z at 1 Hz. Following each run, raw data were recalibrated using the Compass DataAnalysis