DCBLD1 is associated with the integrin signaling pathway and has prognostic value in non-small cell lung and invasive breast carcinoma

Germline single nucleotide polymorphisms in the promoter region of the DCBLD1 gene are associated with non-smoking cases of both non-small cell lung carcinoma (NSCLC) and human papillomavirus-negative head and neck cancer. However, the clinical relevance and function of DCBLD1 remain unclear. This multicenter retrospective study was designed to evaluate the prognostic value and function of DCBLD1 in the four main solid cancers: NSCLC, invasive breast carcinoma, colorectal adenocarcinoma and prostate adenocarcinoma. We included the following cohorts: GSE81089 NSCLC, METABRIC invasive breast carcinoma, GSE14333 colorectal adenocarcinoma, GSE70770 prostate adenocarcinoma and The Cancer Genome Atlas (TCGA) Firehose Legacy cohorts of all four cancers. DCBLD1 gene expression was associated with a worse overall survival in multivariate analyses for both NSCLC cohorts (TCGA: P = 0.03 and GSE81089: P = 0.04) and both invasive breast carcinoma cohorts (TCGA: P = 0.02 and METABRIC: P < 0.001). Patients with high DCBLD1 expression showed an upregulation of the integrin signaling pathway in comparison to those with low DCBLD1 expression in the TCGA NSCLC cohort (FDR = 5.16 × 10⁻¹⁴) and TCGA invasive breast carcinoma cohort (FDR = 1.94 × 10⁻⁵).

DCBLD1 gene expression and cancer outcome. We previously evaluated the role of the DCBLD1 gene in association with patient outcome in head and neck squamous cell carcinoma (HNSCC) 2. For NSCLC, this association was only tested in univariate analysis on one cohort 5, and nothing is yet known for breast, colorectal and prostate cancers. We examined whether DCBLD1 gene expression had prognostic value in the eight cohorts of this study, using multivariate analysis with age, sex (when appropriate) and stage. The hazard ratio (HR) was based on the range of DCBLD1 expression levels, which was analyzed as a numerical variable. A higher HR indicates a higher risk for patients with high DCBLD1 gene expression. This type of analysis is less biased and more stringent than a cut-off based analysis 17. It also shows how the risk increases continuously across the distribution of the variable. Age was also analyzed as a continuous variable. Patients were not subdivided by sex for prostate and breast cancers, as there were only 12 males in the TCGA breast cohort and no males in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort. Stages were grouped as stage 1 and 2 versus stage 3 and 4 to prevent bias from the low frequency of some stages for certain cancers. With these stage groupings, the smallest group was n = 36 (stage 3-4, NSCLC GSE81089 cohort).
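The range-based HR described above can be illustrated with a minimal sketch. It assumes the range HR is defined as exp(β × (max − min)), where β is the Cox regression coefficient of the continuous covariate; this is the conventional definition of a range risk ratio, but the text does not state the formula explicitly. The coefficient and expression values below are invented for illustration and are not results from this study, and the function names are hypothetical.

```python
import math

def unit_hazard_ratio(beta):
    """Hazard ratio for a one-unit increase in a continuous covariate."""
    return math.exp(beta)

def range_hazard_ratio(beta, values):
    """Hazard ratio comparing the highest to the lowest observed value.

    Assumption: range HR = exp(beta * (max - min)).
    """
    span = max(values) - min(values)
    return math.exp(beta * span)

# Hypothetical Cox coefficient and z-scored expression values (not study data).
beta = 0.2
z_scores = [-2.0, -0.5, 0.0, 0.7, 3.0]

print("HR per unit:", unit_hazard_ratio(beta))
print("HR over range:", range_hazard_ratio(beta, z_scores))
```

Because the range HR depends on the observed minimum and maximum, it is sensitive to extreme values, which is one reason to report it alongside the per-unit HR.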
For NSCLC, both the TCGA and GSE81089 cohorts had reproducible significant results. High DCBLD1 gene expression and stage were significantly associated with a worse overall survival, while age and sex had no significant effect (Table 2). When the same analysis was performed without stratifying stage into two groups, and histology and smoking status were added to the model, high DCBLD1 gene expression was again associated with a worse overall survival (Supplementary Table 1). Age (Fig. 1A,B), sex (Fig. 1C,D), stage (Fig. 1E,F), smoking history (Fig. 1G,H) and histology (Fig. 1I,J) were not reproducibly associated with DCBLD1 gene expression for NSCLC.
For invasive breast carcinoma, the TCGA and METABRIC cohorts also showed reproducible significant results. High DCBLD1 gene expression, age and stage were significantly associated with a worse overall survival (Table 2). Specifically, DCBLD1 expression was associated with the PAM50 molecular subtypes in both cohorts (Fig. 2A,B). Basal-like and HER2-enriched breast cancers had significantly higher DCBLD1 expression compared to the normal-like and luminal subtypes in the METABRIC cohort. In the TCGA cohort, the HER2-enriched subtype was not significantly different from the normal-like and luminal subtypes, but we observed higher DCBLD1 expression in the basal-like subtype. Since PAM50 subtypes carry prognostic value 18, this may partly explain the association of DCBLD1 with survival. Indeed, adding the PAM50 parameter to the multivariate model in both cohorts weakened the association of DCBLD1 gene expression with overall survival, with P values of 0.05 for the TCGA cohort and 0.02 for the METABRIC cohort (Supplementary Table 2). Age (Fig. 2C,D) and stage (Fig. 2E,F) were not reproducibly associated with DCBLD1 gene expression for invasive breast carcinoma.
For colorectal adenocarcinoma, no association was found between DCBLD1 and overall survival in either the TCGA or the GSE14333 cohort (Table 2). Also, no association was observed between DCBLD1 gene expression and age, sex or stage in colorectal adenocarcinoma (Supplementary Fig. 1).
For prostate adenocarcinoma, biochemical recurrence was evaluated instead of overall survival due to the low number of deceased patients: only 10 of 496 participants were deceased in the TCGA cohort (median follow-up: 30.5 months). Biochemical recurrence in prostate cancer is defined as a rise in the blood level of prostate-specific antigen following surgery, and it precedes clinical disease recurrence 19. No association was found between DCBLD1 and biochemical recurrence in either the TCGA or the GSE70770 cohort (Table 2). Also, no association was observed between DCBLD1 gene expression and age or stage in prostate adenocarcinoma (Supplementary Fig. 2).

DCBLD1 expression in tumor tissues. DCBLD1 gene expression was investigated in paired tumor tissue and normal adjacent tissue in the NSCLC (n = 108), invasive breast carcinoma (n = 111), colorectal adenocarcinoma (n = 32) and prostate adenocarcinoma (n = 52) TCGA cohorts (Fig. 3). Only participants with RNA-seq results for both tissues were considered for this analysis. DCBLD1 expression was significantly higher in tumor tissue for all four cancers, with median fold changes of 1.47 for NSCLC, 1.54 for invasive breast carcinoma, 1.39 for colorectal adenocarcinoma and 1.25 for prostate adenocarcinoma.

DCBLD1 mutations and copy number alterations in cancer. DCBLD1 was investigated within the TCGA PanCancer Atlas. Mutations in the DCBLD1 protein coding region were found in only 109 of 10,953 patients (1.0%), and no single mutation was present in more than four cases (0.04%) (Supplementary Fig. 3). The only cancer with a mutation frequency above 3% was uterine cancer, with 33 cases out of 517 (6.4%) (Supplementary Fig. 4). Copy number alterations were rare with only 36 patients

Upregulated and downregulated genes and pathways in patients with high DCBLD1 expression.
To understand the implications of high DCBLD1 expression, we compared the RNA-seq gene expression profiles of the 50 patients with the highest DCBLD1 expression versus the 50 patients with the lowest DCBLD1 expression in each of the four TCGA cohorts independently. We evaluated pathway enrichment in the four cohorts using the PANTHER pathway database (Table 3). Patients with high DCBLD1 expression had strong upregulation of the integrin signaling pathway in comparison to patients with low DCBLD1 expression for both NSCLC and invasive breast carcinoma. No pathway was upregulated in the colorectal and prostate cancer cohorts. Also, no pathway was downregulated in patients with high DCBLD1 expression in any of the four cohorts. There are three cancers for which high DCBLD1 expression has been associated with a worse overall survival and upregulation of the integrin signaling pathway: NSCLC, invasive breast carcinoma and HNSCC, the last of which was previously published 2. Evaluating the changes common to DCBLD1-high cases in those three cancers should help clarify the role of DCBLD1 in oncology and how DCBLD1 is associated with the integrin signaling pathway. In this study, 37 genes were differentially regulated between patients with high and low DCBLD1 expression in all three cancers (NSCLC, invasive breast carcinoma and HNSCC). All these genes were upregulated in the high DCBLD1 expression group with the exception of STRBP, which was downregulated. Interactions between these genes were evaluated using STRING protein interaction analysis with the highest confidence threshold (0.9) (Fig. 4). This analysis builds a connectivity network of physical and functional interactions between those genes by combining publicly available sources of data 20. A strong association was observed between ITGB1, ACTN1, ACTN4, VCL, PXN, TLN1, PLAU, PLAUR and SRPX2. An association between TIMP2, MMP2 and MMP14 was also observed. The other genes did not integrate into the network.
DCBLD1 itself did not associate in the STRING connectivity network, which was expected as its function is still undetermined. On the other hand, since high DCBLD1 expression is the common point of this analysis, it is likely that DCBLD1 belongs in those pathways. Further in vitro experiments will be necessary to determine whether this association exists, where it occurs and whether it is physical or functional. The other genes that did not associate in the connectivity network are either unrelated to that network or their association has not yet been shown.
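As an illustration of how pathway over-representation of the kind reported in this section can be scored, the following minimal sketch computes a one-sided Fisher's exact (hypergeometric) P value for the overlap between a gene list and a pathway. It is a generic sketch, not the PANTHER implementation; the function name and all gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(overlap, list_size, pathway_size, reference_size):
    """One-sided Fisher's exact P value for over-representation.

    Returns P(X >= overlap), where X is the number of pathway genes found
    in a list of `list_size` genes drawn without replacement from a
    reference set of `reference_size` genes, `pathway_size` of which
    belong to the pathway.
    """
    upper = min(list_size, pathway_size)
    total = comb(reference_size, list_size)
    tail = sum(
        comb(pathway_size, i) * comb(reference_size - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20,000 reference genes, 190 in the pathway,
# 500 upregulated genes, 25 of which belong to the pathway.
p = enrichment_p_value(overlap=25, list_size=500,
                       pathway_size=190, reference_size=20000)
print(p)
```

When many pathways are tested at once, the resulting P values need a multiple-testing correction before interpretation.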

Discussion
In this study, we showed that DCBLD1 gene expression is prognostic of overall survival in NSCLC and breast cancer. For NSCLC and HNSCC, the association of germline SNPs in the DCBLD1 promoter region has been clearly established, especially for patients who are non-smokers or have no classical cancer risk factors [1][2][3][4][5][6][7]. Moreover, DCBLD1 copy number alterations and mutations in the protein coding region are rare. This suggests that high DCBLD1 expression in tumors may arise from SNPs in the promoter region modifying gene regulation, from alterations in transcription factors, or both. These SNPs may have similar implications for invasive breast carcinoma, particularly for subtypes that are more likely to harbor germline mutations. Basal-like cancers are usually triple-negative breast cancers, which harbor more germline mutations of BRCA1 and BRCA2 21,22. In this study, we showed that basal-like cancers had high DCBLD1 expression in comparison to other subtypes. Whether this involves germline SNPs in the DCBLD1 promoter region is unknown, and further study will be needed to determine whether such SNPs are associated with this subtype. In the three cancers (NSCLC, invasive breast carcinoma and HNSCC) for which DCBLD1 had prognostic value, high DCBLD1 expression showed statistically significant upregulation of the integrin signaling pathway. In contrast, high and low DCBLD1 expression showed no pathway difference in the cancers for which DCBLD1 had no prognostic value. We therefore hypothesize that an oncogenic role for DCBLD1 is associated with activation of the integrin signaling pathway.
Using STRING protein interaction analysis, the upregulated genes in patients with high DCBLD1 expression revealed an important network of nine proteins centered on ITGB1. ITGB1 is a transmembrane integrin that interacts with the ECM and stimulates cell-matrix adhesion when bound to a phosphorylated ACTN1 23,24. VCL, PXN and TLN1 are adapter proteins that bind to ITGB1 and ACTN1, forming a link between ITGB1 and actin filaments 25,26. These five proteins are central components of focal adhesions, which allow the intracellular actin cytoskeleton to associate with the ECM 27,28. The remainder of the nine identified proteins includes ACTN4, which shares 86.7% of its amino acid sequence with ACTN1 and also binds to ITGB1, but its role in regulating focal adhesions is less clear 29. PLAU and its receptor PLAUR are involved in the proteolysis of the ECM and mediate cleavage of ITGA6, which forms the heterodimeric laminin receptor with ITGB1 30,31. SRPX2 is another PLAUR ligand 32, but its association with focal adhesions is unclear. Lastly, TIMP2, MMP2 and MMP14 are mediators of ECM degradation associated with tumor metastasis 33. For NSCLC, invasive breast carcinoma and HNSCC, the upregulation of all these proteins in conjunction with high DCBLD1 expression strongly suggests that DCBLD1 is involved in focal adhesions and, therefore, cell migration.
Previous in vitro experiments reveal that the DCBLD1 interactome consists mainly of adaptor proteins and proteins associated with actin dynamics 15, further implicating a role for DCBLD1 in focal adhesions and supporting the idea that DCBLD1 is an NRP-like protein. Both NRP1 and NRP2 are involved in focal adhesions: NRP1 regulates focal adhesion turnover and NRP2 regulates α6β1 integrin association with the cytoskeleton 11,34. The exact role of DCBLD1 in focal adhesion formation is yet to be discovered, but the role of focal adhesion turnover in tumor cell migration has already been established 27 and may provide insight into the poor prognosis among potentially aggressive cancers with high DCBLD1 expression. For colorectal and prostate cancers, upregulation of the integrin signaling pathway was not observed and the prognosis was similar for patients with high or low DCBLD1 expression, suggesting that DCBLD1 activity is cell-type dependent. On the other hand, we also showed that DCBLD1 expression is higher in tumor tissue for all four cancers. The fact that DCBLD1 was also upregulated in cancers for which it has no prognostic value suggests that the factors regulating DCBLD1 might be generally altered in cancer. Identifying how DCBLD1 gene expression is regulated might help explain why it has prognostic value in only some cancers. The association between cancer cell migration and patient survival is well established. For breast cancer more specifically, a migration transcriptomic signature was previously published and shown to predict overall survival for that cohort 35. We hypothesize that the prognostic value of DCBLD1 expression also comes from the association of DCBLD1 with migration through upregulation of the integrin signaling pathway and, perhaps more importantly, regulation of focal adhesions.
Since one study evaluating the oncological role of DCBLD1 using the A549 lung adenocarcinoma cell line showed a decrease in xenograft tumor growth when using a stable DCBLD1 knockdown cell line 7 , it is reasonable to hypothesize that DCBLD1 has a regulating role in those pathways. This also suggests that DCBLD1 could be a potential therapeutic target.
The retrospective design was the main limitation of this study and may have introduced bias and confounding factors. We used multivariate analysis when examining patient outcome to overcome this limitation as much as possible, although it is likely that some confounding factors were not included in the multivariate analysis. To limit censoring bias, we used overall survival and biochemical recurrence as outcomes, as they are well defined. In addition, outcome was evaluated until 95% of the patients were either censored or deceased to prevent potential bias from the few patients with very long follow-up. Immortal time bias was prevented because samples were taken at surgery (day 0) in those cohorts, on the same specimen as the pathology assessment. Another limitation was that analyses focused on RNA expression data and not actual protein levels. The prognostic value of measured DCBLD1 protein levels in NSCLC, invasive breast carcinoma and HNSCC warrants further study.

Conclusion
Using multiple cancer cohorts, this study showed that DCBLD1 is associated with the integrin signaling pathway and focal adhesions, and has prognostic value for NSCLC and invasive breast carcinoma. Given that germline SNPs in DCBLD1 are associated with non-smoking lung and head and neck cancers, and that DCBLD1 expression has prognostic value in other cancers, further studies are needed to evaluate its potential as a therapeutic target.

Materials and methods
Study cohorts. This retrospective study included multiple independent cohorts. NSCLC was represented by the NSCLC TCGA cohort, which combined the TCGA Firehose Legacy LUAD 36 47,48 . TCGA mRNAseq data were extracted from FirehoseR (gdac.broadinstitute.org) 49. Data from the GSE14333, GSE81089 and GSE70770 cohorts were extracted from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus database 50. Every participant of these cohorts with RNA expression and outcome data was included in the study. Analysis of the TCGA PanCancer Atlas 51
Clinical characteristics. DCBLD1 gene expression was measured by RNA-seq for all TCGA cohorts and GSE81089, and by array for the METABRIC, GSE14333 and GSE70770 cohorts. DCBLD1 gene expression was normalized using z-score normalization of the log expression value, except for the tumor versus normal adjacent tissue comparison, where it was analyzed as log2 RSEM. For multivariate analysis, DCBLD1 gene expression and age were analyzed as numerical variables. Stages were subdivided into two groups: stage 1 and 2 versus stage 3 and 4. Patients were subdivided by sex when possible, except for invasive breast carcinoma, which had only 12 males in the TCGA cohort and no males in the METABRIC cohort. Outcome was evaluated until 95% of the patients were either censored or deceased to prevent potential bias from the few patients with very long follow-up. Samples were taken at surgery (day 0) in those cohorts, on the same specimen as the pathology assessment. For NSCLC, tobacco use (ever versus never users) and histology (LUAD versus LSCC) were also assessed. For invasive breast carcinoma, PAM50 subtypes were also evaluated. For prostate adenocarcinoma, biochemical recurrence was used.
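The z-score normalization of log expression values mentioned above can be sketched as follows. The log base (2), the pseudocount of 1 and the use of the sample standard deviation are assumptions made for this illustration; the text does not specify them. The function name and input values are hypothetical.

```python
import math
import statistics

def zscore_log_expression(values, pseudocount=1.0):
    """Z-score normalize log2-transformed expression values.

    Assumptions: log base 2, a pseudocount added before the log to avoid
    log(0), and the sample standard deviation as the scale.
    """
    logs = [math.log2(v + pseudocount) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return [(x - mean) / sd for x in logs]

# Hypothetical raw expression values for one gene across five samples.
raw = [120.0, 85.5, 300.2, 42.0, 150.7]
print(zscore_log_expression(raw))
```

After this transformation, one unit of the covariate corresponds to one standard deviation of log expression within the cohort, which makes coefficients comparable between RNA-seq and array cohorts.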
Figure 4. Differentially expressed genes in patients with high DCBLD1 expression in NSCLC, invasive breast carcinoma and HNSCC. STRING protein interaction analysis of the 37 genes differentially regulated for patients with high DCBLD1 expression (n = 50) in comparison to those with low DCBLD1 expression (n = 50) in the TCGA cohorts for NSCLC, invasive breast carcinoma and HNSCC. The network shows results for the highest confidence threshold (0.9) for interaction scores on STRING v11 (https://string-db.org/). STRBP has lower expression, while the 36 other genes have higher expression in patients with high DCBLD1 expression.
Statistics. HR was evaluated using multivariate Cox proportional hazards analysis for the multivariate survival prediction model. The significance of the gene expression variations was determined by Student's t-test and Tukey's honestly significant difference test for nominal variables. The association between DCBLD1 gene expression and age was evaluated using linear regression. DCBLD1 gene expression comparison for paired normal and tumor tissue was done using a paired Student's t-test. Pathway enrichment was evaluated using the PANTHER pathway database 52. For the PANTHER pathways annotation data set, Fisher's exact test was corrected using FDR. Interactions between proteins were determined using STRING v11 (https://string-db.org/) with the highest confidence threshold (0.9) 20. Tests of statistical significance were two-sided and P values less than 0.05 were considered significant, with *P < 0.05, **P < 0.01 and ***P < 0.001. Statistical analysis was performed using JMP 12.0.1 statistical software (SAS Institute Inc.) with the exception of the whole exome analysis, which was performed using GraphPad Prism 7.04 (GraphPad Software Inc.).
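To illustrate the FDR correction applied to the Fisher's exact test P values, here is a minimal sketch of the Benjamini-Hochberg procedure. The text only states that FDR was used, so the choice of Benjamini-Hochberg is an assumption; the function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest P value), then
    monotonicity is enforced by taking a running minimum from the largest
    P value down to the smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for four pathways.
raw_p = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw_p))
```

A pathway would then be called significant when its adjusted value falls below the chosen FDR threshold, for example 0.05.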