DEAD-box RNA helicase protein DDX21 as a prognosis marker for early stage colorectal cancer with microsatellite instability

DEAD-box RNA helicase DDX21 (also named nucleolar RNA helicase 2) is a nuclear autoantigen with undefined roles in cancer. To explore possible roles of autoimmune recognition in cancer immunity, we examined DDX21 protein expression in colorectal cancer tissue and its association with patient clinical outcomes. Unbiased deep proteomic profiling of two independent colorectal cancer cohorts using mass spectrometry showed that DDX21 protein was significantly upregulated in cancer relative to benign mucosa. We then examined DDX21 protein expression in a validation group of 710 patients, 619 with early stage and 91 with late stage colorectal cancer. DDX21 was detected mostly in the tumor cell nuclei, with high expression in some mitotic cells. High levels of DDX21 protein were found in 28% of stage I, 21% of stage II, 30% of stage III, and 32% of stage IV colorectal cancer cases. DDX21 expression levels correlated with non-mucinous histology in early stage cancers but not with other clinicopathological features such as patient gender, age, tumor location, tumor grade, or mismatch repair status in any cancer stage. Kaplan–Meier analyses revealed that high DDX21 protein levels were associated with longer survival in patients with early stage colorectal cancer, especially longer disease-free survival in patients with microsatellite instability (MSI) cancers, but no such correlations were found for the microsatellite stable subtype or late stage colorectal cancer. Univariate and multivariate analyses also identified high DDX21 protein expression as an independent favorable prognostic marker for early stage MSI colorectal cancer.

Fresh frozen tissue selection. For the initial proteomic discovery of protein biomarkers, we selected and studied two independent cohorts of fresh frozen tissues: one cohort of 22 CRC cases and a second cohort of 15 CRC cases. All tissue samples fulfilled the sample criteria of high tumor content (> 50%) or benign normal mucosa (for matched normal samples), minimal gross and microscopic necrosis (< 5%), and low blood contamination (< 5%). Matched pairs of frozen tumor tissue and benign colonic mucosa harvested away from the cancer (carefully stripped without muscularis propria) were retrieved from the vapor phase liquid nitrogen repository.
Tissue proteome extraction. Similar to our prior work 7,8,11 , samples of 5 mg of frozen tissue were thawed on ice and lysed with 200 μl lysis buffer containing 8 M urea, 0.1 M ammonium bicarbonate, phosphatase inhibitors 2 and 3 (Sigma), and protease inhibitors (Roche). The tissue mixture was homogenized with 12 cycles of 1-min sonication at 120 W power (FB120, Fisher Scientific) and intermittent cooling. After centrifugation at 14,000g for 30 min at 4 °C, the supernatant containing all soluble proteins was collected. The protein concentration was determined by a BCA assay (Pierce), and extracted proteomes were stored at − 80 °C until further analysis.
In-solution protein digestion. Aliquots of 50 µg of proteome lysate were reduced with 5 mM dithiothreitol at 56 °C for 30 min and then cooled to room temperature 7,8,11. The reduced proteins were alkylated with 11 mM iodoacetamide at room temperature for 30 min in the dark. The protein solution was diluted sixfold with 50 mM ammonium bicarbonate and digested with trypsin and Lys-C (0.2 μg/μl, both from Promega) at 1:50 (w/w) at 37 °C for 12 h. The digestion was stopped by the addition of trifluoroacetic acid to a final concentration of 1%. The mixture was centrifuged at 14,000g for 10 min at room temperature. The clear supernatant was collected and desalted on a C18 StageTip (lab-made). Desalted peptides were dried in a SpeedVac vacuum concentrator, re-dissolved in 10-15 μl of 3% acetonitrile/0.1% formic acid, and stored at − 20 °C.
Proteomic analysis. Desalted peptides, approximately 1 μg, were injected into a 50-cm C18 capillary column mounted to an Easy-nLC 1200 system coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific) 7,8,11. Peptides were eluted over a 200-min gradient in 2-35% buffer B (0.1% (v/v) formic acid, 100% acetonitrile) and buffer A (0.1% formic acid, 100% HPLC-grade water) at a flow rate of 300 nl/min. MS data were acquired with an automatic switch between a full scan and 10 data-dependent MS/MS scans. The target value for full-scan MS spectra was 1 × 10^6 charges in the 375-1500 m/z range with a maximum injection time of 50 ms and a resolution of 60,000 at 200 m/z in profile mode. Isolation of precursors was performed with a window of 1.4 m/z. Precursors were fragmented by higher-energy C-trap dissociation with a normalized collision energy of 30 eV. MS/MS scans were acquired at a resolution of 15,000 at 200 m/z with an ion target value of 5 × 10^4, maximum injection time of 100 ms, and dynamic exclusion for 15 s in centroid mode.
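The reagent amounts implied by the in-solution digestion step can be checked with a short calculation. This is only an illustrative sketch: it assumes that the 1:50 (w/w) ratio applies to each enzyme separately and that the aliquot is still in the 8 M urea lysis buffer before the sixfold dilution, neither of which is stated explicitly in the text.

```python
# Quantities taken from the protocol text
protein_ug = 50.0               # proteome lysate per aliquot
enzyme_to_protein = 1 / 50      # 1:50 (w/w); assumed to apply per enzyme
enzyme_stock_ug_per_ul = 0.2    # trypsin / Lys-C stock concentration
dilution_factor = 6             # sixfold dilution with ammonium bicarbonate

# Assumption: the aliquot starts in the 8 M urea lysis buffer
urea_start_molar = 8.0

enzyme_ug = protein_ug * enzyme_to_protein            # mass of enzyme needed
enzyme_ul = enzyme_ug / enzyme_stock_ug_per_ul        # volume of enzyme stock
urea_final_molar = urea_start_molar / dilution_factor # urea after dilution

print(f"enzyme: {enzyme_ug:.1f} ug = {enzyme_ul:.1f} ul of stock")
print(f"urea after dilution: {urea_final_molar:.2f} M")
```

Under these assumptions the dilution brings urea to roughly 1.3 M before the enzymes are added.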
Protein sequencing data analysis. We applied an overall data analysis strategy that is based on prior work from our laboratory 7-11. Briefly, label-free protein quantification was carried out with MaxQuant (version 1.6.4) and the Andromeda search engine 12,13. The first and the main maximum precursor mass tolerances were set to 20 and 6 ppm, respectively. The reference human proteome database was downloaded from UniProt. The search assumed trypsin and Lys-C digestions with up to 2 missed cleavages. A minimum of 1 peptide was required for protein identification, but 2 peptides were required to calculate a protein level ratio. The modifications used as variable modifications for protein identification and quantification included oxidation of methionine, acetylation of the protein N-terminus, phosphorylation of serine, threonine, and tyrosine residues, and deamidation of glutamine and asparagine. Significantly up-regulated and down-regulated proteins were identified with Perseus software (version 1.6.5) 14,15. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD019103 and PXD019504.
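To give a sense of scale for the 20 ppm and 6 ppm precursor tolerances, the following minimal sketch converts a ppm tolerance into an absolute window. The function name `ppm_window` and the example value of 1000 are hypothetical and chosen only for illustration; this is not MaxQuant code.

```python
def ppm_window(mass, tol_ppm):
    """Return (low, high) bounds for a symmetric tolerance given in ppm."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta

# Hypothetical precursor value of 1000, used only to illustrate the scale
first_search = ppm_window(1000.0, 20)  # first-search tolerance from the text
main_search = ppm_window(1000.0, 6)    # main-search tolerance from the text

print("first search:", first_search)
print("main search: ", main_search)
```

At this example value, 20 ppm corresponds to ± 0.02 and 6 ppm to ± 0.006, which shows how much narrower the main search window is.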
Tissue microarrays. For tissue microarrays 7,8,11 , formalin fixed and paraffin embedded tissue blocks from 710 colorectal cancer patients were selected (with no patient overlap with the two frozen tissue cohorts used for mass spectrometry). Three separate 2-mm tissue cores from each tumor case were drilled out from each donor paraffin block and transferred to tissue array blocks using a robotic TMA arrayer (TMA Grand Master, 3DHistech). Tumor and normal areas were selected based on rigorous review of individual histologic slides for each donor block and electronic image-based coring target area selection in the TMA Grand Master software.

Immunohistochemistry (IHC).
Formalin-fixed paraffin-embedded tissues were cut into 4-μm sections 7,8,11. Paraffin was removed with xylene, and antigens were retrieved by heat-mediated epitope retrieval (pH 6.0). DDX21 expression was determined with DDX21-specific polyclonal antibodies (HPA036593, 1:200 dilution, Atlas Antibodies). IHC staining was conducted with Leica BOND-MAX automation. We assessed DDX21 nuclear staining as positive if 10% or more tumor cells showed nuclear staining. Assessment of all tissue samples was independently performed by two pathologists without any clinical information. In cases of discrepancies in immunohistochemical assessment between the two pathologists, the cases were reviewed by them together and a consensus score was determined.
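The scoring rule can be sketched as follows. This is an illustrative sketch only: it assumes that each pathologist records the percentage of tumor cells with nuclear staining, and the function names and the `joint_review` argument are hypothetical, not part of the study's workflow.

```python
POSITIVE_CUTOFF = 10.0  # percent of tumor cells with nuclear staining (from the text)

def ddx21_status(percent_nuclear_positive):
    """Classify one reading as 'positive' (>= 10%) or 'negative' (< 10%)."""
    return "positive" if percent_nuclear_positive >= POSITIVE_CUTOFF else "negative"

def consensus_call(reader_a, reader_b, joint_review=None):
    """Combine two independent readings; discordant calls need a joint review score."""
    call_a, call_b = ddx21_status(reader_a), ddx21_status(reader_b)
    if call_a == call_b:
        return call_a
    if joint_review is None:
        raise ValueError("discordant calls require a joint review score")
    return ddx21_status(joint_review)

# Hypothetical readings, for illustration only
print(consensus_call(0, 5))                    # both readers below the cutoff
print(consensus_call(5, 40, joint_review=15))  # discordant, resolved jointly
```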
TCGA data. We analyzed mRNA sequencing data and clinical information from the TCGA (cohort of 244 colorectal cancers 16) by accessing the cBioPortal for Cancer Genomics (https://www.cbioportal.org/). We also studied a 597-case TCGA colorectal cancer cohort 17 that lacks comprehensive clinical stage, KRAS mutation status, and mismatch repair status annotation.
Statistical analyses. Similar to prior work from our laboratory 7,8,11 , categorical variables were compared using Fisher's exact test. Numerical values were analyzed by the Mann-Whitney U test. Survival analyses were performed using the Kaplan-Meier method and compared by a log-rank test. Multivariate analyses of prognostic factors were performed with logistic regression models by using factors that showed significant differences in univariate analyses (p < 0.05). Statistical analyses were performed with the JMP Pro 14 software (SAS). All statistical analyses were considered significant with p < 0.05.
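For readers unfamiliar with the Kaplan-Meier method, the estimator can be sketched in a few lines. This is a minimal illustration, not the JMP Pro implementation used in the study, and the follow-up times and event indicators below are hypothetical toy values.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up times (e.g. months)
    events: 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # events plus censored patients at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical toy data: five patients, two of them censored
toy_times = [5, 8, 8, 12, 20]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t = {t:>2} months, S(t) = {s:.2f}")
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a step in the curve, which is why incomplete follow-up can still be used.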

Results
Proteomic analysis of colorectal cancer and identification of DDX21. We first examined a group of 22 patients whose primary colorectal cancer and matched benign mucosa had been freshly frozen in liquid nitrogen. Using deep unbiased tissue proteomics by Fourier transform mass spectrometry 7-10 , we found that proteomic profiling robustly separates colorectal cancer from benign mucosa (Fig. 1a,b). We then searched specifically for proteins that (i) were upregulated in cancer vs. benign mucosa and that (ii) are also known human autoantigens based on literature searches. This approach led us to the identification of DDX21 (Fig. 1c,d).
Although DDX21, also termed nucleolar RNA helicase 2, is a known autoantigen, with autoantibodies found in patients with connective tissue diseases and gastric antral vascular ectasia (watermelon stomach disease) 18-20, its clinical significance in colorectal cancer, including effects on outcome, is unknown. This motivated us to study DDX21 in colorectal cancer. For validation, we repeated unbiased proteomic profiling using a second (independent) cohort of 15 patients. DDX21 protein was again found to be significantly overexpressed in cancer relative to matched benign mucosa (Fig. 1e,f).

DDX21 protein expression patterns in colorectal cancer tissues.
We investigated DDX21 protein expression in 710 colorectal cancer cases using tissue microarrays, which included a large cohort of 619 patients with early stage (stages I and II) cancer (Table 1) and a cohort of 91 patients with late stage (stages III and IV) cancer (Table 2). Because prognostic markers of early stage cancers are especially valuable for clinical cancer management, we carefully assembled this large cohort of early stage cancer cases with a representative distribution of colorectal cancers, comprising 319 male and 300 female patients. The late stage cancer cohort is smaller and included 45 male and 46 female patients. Expression of DDX21 protein in tissue was examined by immunohistochemical detection. In normal benign colonic mucosa, DDX21 protein was barely detectable, with at most weak cytoplasmic staining and weak to undetectable nuclear staining (restricted to the nucleolus when present) (Fig. 2a-c). Cancers fell into two categories (with little heterogeneity within a case): (i) essentially negative or very weak cytoplasmic DDX21 (examples of an early stage and a late stage cancer in Fig. 2d,e,h,i, respectively) or (ii) strong (predominantly nuclear) staining for DDX21 (examples of an early stage and a late stage cancer in Fig. 2f,g,j,k, respectively). DDX21 overexpression was detected in cancer cells in about 20-30% of colorectal cancer cases. Stromal cells of the lamina propria were essentially negative for detectable DDX21 protein (Fig. 2). Interestingly, DDX21 protein expression was particularly high in some mitotic cells (Fig. 3), a feature shared with other intrinsic autoantigenic proteins 21.

DDX21 protein expression and clinicopathological features of colorectal cancer.
To explore associations between DDX21 protein expression and clinicopathological features of colorectal cancer, we evaluated the expression level of DDX21 in each tissue in a semi-quantitative fashion. Each tissue sample was scored independently by two expert pathologists to obtain unbiased readings. Based on this scoring, the cancer cohort was then divided into two groups: a positive (or high expression) group that showed positive nuclear DDX21 staining in ≥ 10% of cancer cells, and a negative (or low expression) group that showed negative staining or staining in < 10% of cancer cells.
In the early stage colorectal cancer cohort, positive DDX21 expression was detected in 28.1% (63/224) of stage I and 20.5% (81/395) of stage II patients (Table 1). DDX21 expression appeared to be more prevalent in stage I than in stage II cancer tissues, with the difference being statistically significant (Table 1). There appeared to be no difference in DDX21 expression in the late stage colorectal cancer cohort, as positive DDX21 expression was detected in 30.3% (20/66) of stage III and 32.0% (8/25) of stage IV patients (Table 2). DDX21 expression levels were then compared with various clinicopathological features (Tables 1, 2). In the early stage cohort, DDX21 expression levels did not differ significantly with regard to patient gender, patient age, tumor differentiation, tumor location, or mismatch repair status. However, positive DDX21 expression was significantly more prominent in non-mucinous carcinoma (32.2%) vs. mucinous carcinoma (13.0%) (Table 1). In the late stage cohort, DDX21 expression showed no significant association with any of the examined clinicopathological features, including patient gender or age, tumor differentiation or stage, mucinous histology, tumor location, or mismatch repair status (Table 2).
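The stage-wise comparison can be reproduced in outline from the counts quoted above. The sketch below implements a two-sided Fisher's exact test with the Python standard library; it is an independent illustration rather than the JMP Pro calculation behind Tables 1 and 2, so the resulting p values may differ slightly from the published ones.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in (prob(x) for x in range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))

# Counts from the text: DDX21-positive / total per stage
pct_stage1 = 100 * 63 / 224
pct_stage2 = 100 * 81 / 395
p_early = fisher_exact_two_sided(63, 224 - 63, 81, 395 - 81)  # stage I vs II
p_late = fisher_exact_two_sided(20, 66 - 20, 8, 25 - 8)       # stage III vs IV

print(f"stage I {pct_stage1:.1f}% vs stage II {pct_stage2:.1f}%: p = {p_early:.3f}")
print(f"stage III vs stage IV: p = {p_late:.3f}")
```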
As DDX21 is a known nuclear autoantigen 2,19,20, we investigated whether DDX21 expression correlates with the density (cell count per 10 high-power fields) of tumor infiltrating lymphocytes (TILs). Among 710 cases of the TMA cohort, 230 cases had available TIL data (113 early-stage CRCs and 117 late-stage CRCs). Neither MSS CRCs nor MSI CRCs showed significant TIL differences between DDX21 high and low groups (Supplementary Fig. 1).

Correlation between DDX21 protein expression and patient survival.
To evaluate the prognostic potential of DDX21 protein expression for colorectal cancer, we investigated the relationship between patient survival times and DDX21 using Kaplan-Meier analyses. Both the overall survival time and the disease-free survival time were analyzed. The stage I and II patients of this study had been followed for a range of 0.2-392.5 months, with a mean follow-up time of 80.6 months and a median follow-up time of 72.5 months (Fig. 4). The stage III and IV patients of this study had been followed for a range of 0.4-140 months, with a mean follow-up time of 51.2 months and a median follow-up time of 53.3 months (Fig. 5).
Of the 619 early stage cases examined by immunohistochemistry, 564 cases had evaluable survival follow-up data available in the medical records. Importantly, these 564 cases had no adjuvant chemotherapy history, and thus this group serves as an ideal unbiased cohort for prognostic outcome studies. When all 564 early stage cases were analyzed together, patients with positive DDX21 protein expression had both significantly longer overall survival and disease-free survival times than patients with negative DDX21 protein expression (Fig. 4a,b). However, when only the subtype of microsatellite stable (MSS) cases was analyzed, the patients with positive or negative DDX21 protein expression did not display significant differences in either overall survival or disease-free survival times (Fig. 4c,d). In contrast, among patients with the microsatellite instability (MSI) subtype of early stage colorectal cancer, disease-free survival times (but not overall survival times) were significantly longer for the DDX21 positive group (Fig. 4e,f). Among the 91 late stage patients with evaluable survival data, no significant differences in overall survival or disease-free survival times were observed between the positive and negative DDX21 protein expression groups (Fig. 5a,b). When the MSS subtype was analyzed separately, there were also no significant differences in survival times between the positive and negative DDX21 groups (Fig. 5c,d). As there were only 12 patients with late stage MSI cancer, separate statistical survival analyses were not meaningful for this subtype, but visual inspection of survival curves did not reveal any difference.
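The group comparisons described above rely on the log-rank test. A minimal two-group version is sketched below, assuming no software beyond the Python standard library; the function name and the toy follow-up data are hypothetical and do not come from the study cohorts.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p value) with 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A at time t
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B at time t
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical toy data (months, event indicator); not study data
pos_times, pos_events = [30, 45, 60, 72, 80], [0, 0, 1, 0, 0]
neg_times, neg_events = [5, 10, 15, 40, 50], [1, 1, 1, 1, 0]
toy_chi2, toy_p = logrank_test(pos_times, pos_events, neg_times, neg_events)
print(f"chi-square = {toy_chi2:.2f}, p = {toy_p:.4f}")
```

At each event time the test compares the observed number of events in one group with the number expected if both groups shared the same hazard, which is why it uses the whole curve rather than survival at a single time point.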
We next asked whether there may be an association between DDX21 expression level and KRAS gene status that might contribute to survival differences. Using the TCGA CRC dataset 16 , we found no differences in DDX21 expression between early stage patients that had cancers with wild type KRAS vs. mutated KRAS (Fig. 6).
The survival analyses thus far indicated that positive DDX21 expression may be prognostic for better disease-free survival in early stage colorectal cancer of the MSI subtype. To further explore this conclusion, we performed univariate and multivariate analyses for this group of patients (Table 3). As expected, patient age (≤ 65 years) was found to be a favorable factor for disease-free survival. In addition to patient age, positive DDX21 protein expression in cancer tissue was also an independent favorable prognostic factor for disease-free survival time in the MSI subtype of early stage colorectal cancer.

Discussion
In this study, we first used unbiased deep proteomics to identify proteome signatures that quantitatively differentiate colorectal cancer from benign colonic mucosa (Fig. 1). Specifically focusing on candidate proteins that have known antigenic properties, we selected DDX21 and investigated the protein expression of DDX21 in a large cohort of 619 patients with early stage colorectal cancer and a small cohort of 91 patients with late stage colorectal cancer (Figs. 2, 3). We found high DDX21 protein expression in about 20-30% of colorectal cancer cases, regardless of cancer stage (Tables 1, 2). Our findings are compatible with previous reports of increased DDX21 expression in colorectal tumors at both the mRNA and protein levels 25,26. The DDX21 gene has also been found to be overexpressed in other malignancies such as breast cancer 27,28 and lymphomas 29. In breast cancer, DDX21 gene expression levels have been reported to be associated with longer overall and disease-free survival 30. In contrast, based on data in the Human Protein Atlas database (www.proteinatlas.org) and DDX21 mRNA expression levels in a 597-case TCGA colorectal cancer cohort 17, reduced DDX21 mRNA transcript expression correlated with higher probability of survival, although this did not appear to be an independent prognostic marker for colorectal cancer. To reconcile this with our findings, several facts need to be considered. First, the TCGA cohort study examined the mRNA expression of DDX21, whereas our study looked directly at the protein expression of DDX21. Second, quantitative levels of an mRNA transcript and its translated protein product often do not correlate, as has been observed for multiple tumor types in recent proteogenomic studies by the NCI's Clinical Proteomic Tumor Analysis Consortium (CPTAC) 31-35. Third, the TCGA cohort comprises a mixed cohort of colorectal cancer cases, whereas our outcome analyses are based on more focused cohorts, such as early stage cancer and the MSI subtype.
Cancer and the immune defense system are in a constant battle. The rising success of immunotherapy has changed the paradigm of cancer treatment from killing tumors directly to manipulating the immune system to target tumors. Most notably, immune checkpoint inhibitors have been successful in treating various malignancies, including melanoma, lung cancer, renal cell carcinoma, bladder cancer, head and neck squamous cell carcinoma, gastric cancer, ovarian cancer, Hodgkin lymphoma, and colorectal cancer 36,37. Immune checkpoint inhibitors are effective in eliminating cancers because they can unleash the immune system to launch a broad spectrum of autoimmune-like attacks against tumors, or they may boost cancer-specific autoimmunity that is pre-existing but weak in cancer patients. Identification of cancer-specific autoantigens, which serve as targets of cancer-specific immunity, is instrumental for better understanding of cancer immunotherapy and therapy risk-stratification 9,10. DDX21, also termed nucleolar RNA helicase 2, is a known autoantigen, with autoantibodies found in patients with connective tissue diseases and gastric antral vascular ectasia (watermelon stomach disease) 18-20. DDX21 plays multifaceted roles in multiple steps of ribosome biogenesis and coordinates transcription and ribosomal RNA processing 38-41. DDX21 can efficiently unwind R-loops (three-stranded nucleic acid complexes consisting of an RNA:DNA heteroduplex and a displaced DNA strand) and prevent R-loop-mediated stalling of RNA polymerases, whereas depletion of DDX21 leads to accumulation of cellular R-loops and DNA damage 42,43. These observations suggest that DDX21 plays important roles in cancer cell biology, although the mechanisms are yet to be defined.
Our study found that high DDX21 protein expression in cancer tissue predicts better survival for early stage colorectal cancer patients of the MSI subtype, but not for the MSS subtype or late stage cancers (Figs. 4, 5, Table 3). DDX21 expression appears to be uncoupled from KRAS mutation status (Fig. 6). Among colorectal cancers, only the MSI subset has shown positive response to immune checkpoint inhibitor therapy 44-46. However, a reliable protein predictor of response to immunotherapy is still lacking. High expression of DDX21, a known nuclear autoantigen 2,19,20, in MSI tumors may induce DDX21-specific autoimmunity against tumor cells that contributes to better clinical outcomes in cancer patients and perhaps better response to immune checkpoint inhibitor therapy. Although the latter will require further clinical investigation, a routine DDX21 IHC test of early stage CRCs may provide better risk prediction for MSI patients and possibly stratification of high-risk patients for adjuvant immunotherapy. Interestingly, in another recent study of ours, we found that high expression of Maspin, an autoantigen, was also associated with better clinical outcomes in MSI subtype colorectal cancer 11. It is possible that high expression of autoantigens in MSI cancer tissue helps induce cancer-specific autoimmunity, which consequently leads to better patient survival. It is also possible that this natural cancer-specific autoimmunity is boosted by immune checkpoint inhibitor therapies, which may lead to improved therapeutic responses.
In summary, we identified the DDX21 autoantigen as a potential prognostic marker for the MSI subtype of early stage colorectal cancer. The mechanistic roles of DDX21 in cancers remain poorly understood and merit further investigation. Identification of DDX21 and other autoantigen markers in cancer tissues may pave the way for future development of more specific and more effective immunotherapy strategies against cancer.