Main

The extracellular matrix (ECM) is a multi-molecular substance that serves functions ranging from cellular adhesion and motility to cell signalling. The extracellular matrix proteins are constructed from a relatively small repertoire of phylogenetically conserved amino-acid domains and genome-wide in silico analysis has led to the categorisation and indexing of the known protein constituents of the ECM. This ECM protein inventory, termed the core matrisome (CM) (Hynes and Naba, 2012), provides a platform for the analysis of physiological and disease-specific patterns of ECM protein expression.

Many solid tumours are characterised by the production of a dense, collagen-rich matrix, the deposition of which is associated with adverse outcome (Erler et al, 2006; Levental et al, 2009; Lu et al, 2012; Naba et al, 2014; Acerbi et al, 2015). The alteration in ECM protein constituents resulting from the development of a cancer causes reciprocal changes in the cancer cell and activates pathways responsible for cell migration, inhibition of apoptosis and proliferation (Pickup et al, 2014). The cancer ECM also promotes the formation of an unstable and chaotic vascular bed, poor oxygen delivery and hypoxia (Gilkes et al, 2014), all of which are factors that favour disease progression and metastasis.

Although excessive ECM deposition is a recognised hallmark of cancer, the specific proteins comprising the cancer ECM and their potential contribution to cancer biology and prognosis are less well studied. Here we address these issues using bioinformatics to compare the expression of CM genes in tumours and their normal tissue counterparts. We identify a nine-gene CM signature common to a range of cancers, the expression of which predicts poor prognosis in several solid cancer types. These data provide an impetus to further study proteins of the ECM in order to gain a greater understanding of cancer biology and develop clinical tools for prognostication.

Materials and methods

Ethics

All bioinformatics data were anonymised and required no ethical approval. Commercially available tissue microarrays were produced by US Biomax Inc. under the highest ethical standards, with donors fully informed and providing consent. No ethical approval was required for this study.

Identification of the CM gene signature

All genes encoding proteins of the CM (http://matrisomeproject.mit.edu/) were included. Gene expression levels were determined in studies comparing lung, breast, ovarian, gastric, oesophageal or colorectal adenocarcinoma with normal tissue in Oncomine (www.oncomine.org/resource/login.html). In Oncomine, gene expression data are normalised across studies, enabling summative gene expression comparisons. Median gene rank (cancer vs normal analysis) was meta-analysed across studies of the same cancer type using Oncomine statistical algorithms. P values of the difference in gene rank were corrected for multiple hypothesis testing using the false discovery rate (FDR) method as described by Storey and Tibshirani (2003). Venn diagrams were generated and analysed using InteractiVenn (Heberle et al, 2015).
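
The last two steps can be illustrated with a minimal sketch: an FDR correction of per-gene P values, followed by an intersection of the significant gene sets across cancer types, which is what the Venn diagram reads off. Note the assumptions: the sketch uses the Benjamini-Hochberg procedure as a simpler stand-in for the Storey and Tibshirani q-value method used in the study, and the gene names, P values and the 0.05 cut-off are hypothetical placeholders rather than study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order.

    Assumption: BH is used here as a stand-in for the Storey-Tibshirani
    q-value method cited in the text.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(p_by_gene, alpha=0.05):
    """Genes whose adjusted P value falls below alpha (hypothetical cut-off)."""
    genes = list(p_by_gene)
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, adjusted) if q < alpha}


def common_signature(p_by_cancer, alpha=0.05):
    """Intersect the significant gene sets of every cancer type."""
    sets = [significant_genes(p, alpha) for p in p_by_cancer.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical P values for three placeholder genes in two cancer types.
example = {
    "lung": {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.60},
    "gastric": {"GENE_A": 0.004, "GENE_B": 0.30, "GENE_C": 0.01},
}
signature = common_signature(example)
print(signature)
```

In this toy input only GENE_A passes the threshold in both cancer types, so it alone would sit in the central region of the corresponding Venn diagram.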

Immunohistochemistry

Immunohistochemistry for ECM proteins was performed on colon cancer and matched normal tissue microarrays from 20 patients (US Biomax Inc). Frozen tissues were immediately fixed in ice-cold acetone and blocked in normal serum of the species in which the secondary antibody was raised. Primary antibodies for col11a1 (ab64883), col1a1 (ab34710), col10a1 (ab58632) and spp1 (ab69498) were all obtained from Abcam. Fluorochrome-conjugated secondary antibodies were obtained from Invitrogen. Tile-scanned images were taken at ×10 magnification using the Nikon Eclipse 90i epifluorescence microscope and were analysed using ImageJ.

Determination of the effect of the nine-gene CM signature on cancer outcome

Survival analysis was performed using KM plotter for ovarian (Győrffy et al, 2012), gastric (Szász et al, 2016) and lung (Győrffy et al, 2013) cancers or GraphPad Prism for colorectal, renal, bladder or prostate cancer data derived from cBioportal (Cerami et al, 2012; Gao et al, 2013) or data set GSE17538. KM plotter is a manually curated, biannually updated database enabling survival analysis across multiple GEO data sets simultaneously. The GEO data sets used here are shown in Table 1. We used JetSet probes throughout (Li et al, 2011) and patients were divided into two groups on the basis of median expression of the nine-gene signature. For the analysis of colorectal cancer data sets, a z-score threshold of +1 for gene expression was used to define patients with high expression of the signature. Kaplan–Meier survival curves were constructed and compared using the log-rank method to generate hazard ratios and P values. Survival curves were generated using GraphPad Prism Version 7. Multivariate analysis was performed in lung, gastric and colorectal cancer data sets using SPSS (2017).
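
The median split and log-rank comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not the KM plotter or GraphPad Prism implementation: each patient is assumed to already have a single signature score (the text does not specify how the nine genes are combined), ties at the median are assigned to the low group, the follow-up data are hypothetical, and the sketch returns only the chi-square statistic and P value, not hazard ratios.

```python
import math
from statistics import median


def split_by_median(scores):
    """Flag patients whose signature score exceeds the cohort median.

    Assumption: scores equal to the median are placed in the low group.
    """
    cutoff = median(scores)
    return [s > cutoff for s in scores]


def log_rank(times, events, high):
    """Two-group log-rank test.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    high   : True for patients in the high-expression group
    Returns (chi-square statistic, P value on 1 degree of freedom).
    """
    observed = expected = variance = 0.0
    # Visit each distinct time at which at least one event occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_high = sum(
            1 for i in at_risk if times[i] == t and events[i] and high[i]
        )
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            # Hypergeometric variance of the event count in the high group.
            variance += (
                d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
            )
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical cohort of six patients (times in months).
times = [5, 8, 12, 20, 24, 30]
events = [1, 1, 1, 0, 1, 0]
scores = [2.1, 1.9, 1.7, 0.4, 0.6, 0.3]
high = split_by_median(scores)
chi2, p_value = log_rank(times, events, high)
print(high, round(chi2, 3), round(p_value, 4))
```

In practice a maintained survival analysis package would be preferable; the hand-written version is shown only to make the grouping and the test statistic explicit.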

Table 1 Details of data sets used for survival analysis

Geneset enrichment analysis

Geneset enrichment analysis (GSEA) (Subramanian et al, 2005) was performed using the Broad Institute desktop application (http://software.broadinstitute.org/gsea/downloads.jsp) on RNA-Seq expression data from TCGA colorectal, gastric, ovarian and lung cancer data sets. Phenotypes were defined on the basis of expression of the nine-gene CM signature, with samples divided into high or low expression, again using a z-score of +1 to define groups. Genesets were identified in the molecular signatures database (http://software.broadinstitute.org/gsea/msigdb/index.jsp), with the exception of the angiogenesis geneset, which was identified in a recent publication describing a meta-analytical approach that identified a transcriptional programme for angiogenesis in human cancers (Masiero et al, 2013), and the EMT geneset (Gröger et al, 2012).
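
As a minimal sketch of how the two phenotype groups might be assigned before running GSEA, the code below converts per-sample signature values to z-scores and labels samples above +1 as high. The assumptions are that z-scores are computed against the mean and standard deviation of the whole cohort, that the threshold is applied as a strict inequality, and that each sample has a single signature value; the input values and the HIGH and LOW labels are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values against the cohort mean and standard deviation.

    Assumption: the reference population is the full cohort.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # No variation in the cohort: every sample sits at the mean.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def phenotype_labels(values, threshold=1.0):
    """Label samples HIGH when their z-score exceeds the threshold."""
    return ["HIGH" if z > threshold else "LOW" for z in z_scores(values)]


# Hypothetical signature values for five samples.
signature_values = [1.0, 1.0, 1.0, 1.0, 6.0]
labels = phenotype_labels(signature_values)
print(labels)
```

With this threshold the high group is typically much smaller than the low group, unlike a median split, which is worth keeping in mind when comparing the two grouping strategies.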

Results

Identification of a CM gene signature expressed by adenocarcinomas

In order to identify ECM proteins important for cancer progression, we compared the expression of genes comprising the CM in adenocarcinomas (ACs) and normal tissues. We focused specifically on ACs because in organs where both squamous cell carcinomas (SCCs) and ACs develop, these tumour types may originate from different cell lineages (Yan et al, 2010; Yuan et al, 2010), are characterised by different genetic landscapes (Contag et al, 2004; Gao et al, 2014) and demonstrate differences in their sensitivity to various treatment modalities (Katanyoo et al, 2012; Chen et al, 2014). These differences may relate in part to differences in the composition of the tumour ECM or regulation of ECM expression, and we did not want this to act as a confounder in the identification of a CM gene signature.

We identified a large number of CM genes expressed at significantly higher levels in ACs compared with their normal tissue counterparts (Figure 1A). Cancers of the oesophagus and lung in particular were highly different from their parent organs, with 110 and 97 of 274 CM genes significantly upregulated in these cancers, respectively. In comparison, ovarian cancers demonstrated less of a difference with only 43 of 274 CM genes significantly upregulated in the tumour (Supplementary Table 1).

Figure 1

Development of a core matrisome gene signature from multiple cancer data sets. (A) CM gene expression based on gene rank for cancer vs normal tissue in various tumour types. Red squares indicate high rank in the cancer relative to the normal tissue. Grey indicates that the gene was not measured. Genes are listed in order of median rank across the analysis of included studies for that particular cancer type. (B) Venn diagram used to identify common CM genes that are significantly overrepresented throughout all cancer types identified in A. (C) The gene signature derived from the Venn diagram in B displaying the nine common, significantly upregulated genes identified across the analyses of all cancer types from A. (D) The nine-gene CM signature showing median gene rank (red=high expression) in cancer compared with normal tissue for each included study and FDR-corrected P values for the meta-analytical comparison. (E) Fluorescence immunohistochemistry for SPP1, Col10a1, Col1a1 and Col11a1 in colon cancers and matched normal colon with quantification of the area (%) of the microarray core demonstrating positive staining (n=20 per analysis). A full colour version of this figure is available at the British Journal of Cancer journal online.

We next identified genes that were significantly upregulated across all cancer types studied and defined a signature of nine such genes (Figure 1B–D). There was a significant correlation in the expression level of most of these genes in TCGA data sets of colon, gastric, lung and ovarian ACs (Supplementary Figure 1A and B), suggesting that the expression of these genes results from a common regulatory element.

Finally, immunohistochemistry demonstrated a significant increase in the expression of col11a1, col10a1 and spp1 proteins in colon cancer compared with matched normal colon tissues (Figure 1E). The expression of col1a1 was increased in cancer tissues compared with normal colon but this did not reach significance. Importantly, within colon cancers, each protein was identified within the stroma indicating deposition within the ECM. In normal colon, col11a1 was virtually undetectable and col10a1 was identified within the cytoplasm of colonic epithelial cells rather than within the stromal tissue compartment.

The nine-gene CM signature predicts long-term outcome in various cancer types

Given the widespread overexpression of the nine-gene CM signature in ACs compared with normal tissues, we hypothesised that the expression of these genes may be a requirement for cancer progression. Combined comparison of normalised gene expression data from multiple GSE data sets supported this hypothesis, as patients with cancers demonstrating overexpression of the nine-gene signature displayed reduced overall and progression-free survival for gastric, lung and ovarian cancers (Figure 2A). In three large colorectal cancer data sets, overexpression of the nine-gene signature was similarly associated with adverse outcome (Figure 2A). It was also associated with reduced progression-free survival in a large TCGA breast cancer data set and in cancers not used to generate the nine-gene signature, such as those of the prostate and bladder (Supplementary Figure 2). Interestingly, there was no correlation between expression of the CM gene signature and survival in squamous cell carcinoma of the lung, head and neck or oesophagus or in oesophageal AC (Supplementary Figure 2). Multivariate analysis in gastric, lung and colorectal data sets identified the nine-gene CM signature as a factor significantly associated with disease-free survival independent of disease stage or grade (Table 2).

Figure 2

Expression of the core matrisome gene signature predicts survival in various cancer types. (A) Overall survival (top row) and recurrence-free survival (bottom row) for cohorts of patients whose tumours demonstrate overexpression (red) or normal expression (blue) of the nine-gene CM signature. Numbers represent hazard ratios (95% confidence intervals). (B) GSEA of colorectal and gastric cancer TCGA data sets analysed for EMT, angiogenesis, hypoxia, inflammation, oxidative phosphorylation, apoptotic regulation and genomic instability geneset enrichment in patients with high or normal expression of the nine-gene CM signature. NES, normalised enrichment score. A full colour version of this figure is available at the British Journal of Cancer journal online.

Table 2 Multivariate analysis of factors relevant for disease-free and overall survival

GSEA identifies biological traits associated with expression of the CM signature

Epithelial–mesenchymal transition (EMT), angiogenesis, hypoxia, inflammation and glycolysis are all features of cancer that have been associated with poor prognosis. Several of these processes have also been linked to ECM deposition (Lu et al, 2012). To gain an insight into the biological mechanisms through which the CM gene signature may define poor prognosis cancers, we performed GSEA to determine whether genes governing these processes are overrepresented in cancers from patients overexpressing the nine-gene CM signature. Strikingly, colorectal, gastric (both Figure 2B), lung and ovarian cancers (both Supplementary Figure 3) expressing the CM gene signature were significantly enriched in EMT, hypoxia, angiogenesis and inflammation genesets, but showed reduced expression of the oxidative phosphorylation geneset. Importantly, several molecular signatures defining other cancer-related processes, including those for apoptosis or genomic instability, were not enriched in cancers expressing the CM signature (Figure 2B and Supplementary Figure 3).

Discussion

Here we present a comprehensive analysis of the difference in expression of CM genes in cancers and normal tissues in order to identify key constituents of the cancer ECM. We have identified commonality in the significant upregulation of nine CM genes across multiple cancer types, suggesting a potential requirement for these CM genes throughout solid tumours. Expression of the nine-gene signature predicted outcome in a broad range of cancers, including those not initially used to generate the gene signature. These proteins are therefore associated with cancer progression and their combination may represent a useful biomarker for prognostication. Interestingly, the CM signature failed to predict overall or disease-free survival in squamous cell cancer data sets, suggesting that the ECM genes in the CM signature may not be relevant to the progression of SCC.

Col11a1 has previously been linked to cancer progression (Fischer et al, 2001; Cheon et al, 2014; Jia et al, 2016; Li et al, 2017) and is expressed by cancer stromal (Galván et al, 2014; Jia et al, 2016) and tumour cells progressed to EMT, where it promotes migration and invasion (Sok et al, 2013; Wu et al, 2014). Expression of secreted phosphoprotein 1 (SPP1, osteopontin) is also reported in cancer (Shevde and Samant, 2014) and is driven by cancer-related signalling pathways including Hedgehog, Wnt/β-catenin and NFκB (Shevde and Samant, 2014). SPP1 is also expressed by tumour-associated macrophages and fibroblasts, where it is linked to angiogenesis (Kale et al, 2014) and the metastatic cascade (Mi et al, 2011), respectively.

The protein products of several genes in our signature have not been thoroughly studied in relation to cancer; however, several are associated with functions of relevance to cancer progression. BGN, for example, interacts with toll-like receptors on the surface of macrophages to promote the synthesis of TNFα and CCL2 (Schaefer et al, 2005; Moreth et al, 2010), both of which are cytokines important for cancer (Balkwill, 2006; Lim et al, 2016). BGN and MXRA5 expression are promoted by the activity of TGFβ1 (Heegaard et al, 2004; Poveda et al, 2017), a key driver of the adverse stromal response in many cancers (Pickup et al, 2013), and COMP binds TGFβ1, enhancing its biological activity (Haudenschild et al, 2011).

In support of these findings, our GSEA results link the nine-gene signature to biological processes common to poor prognosis cancers, including EMT, angiogenesis, hypoxia, inflammation and a shift away from oxidative phosphorylation as a means of energy generation. Interestingly, we failed to demonstrate enrichment of gene signatures linked to other cancer-relevant processes including apoptotic regulation and genomic instability. The association with specific cancer-relevant gene signatures may implicate the nine-gene CM signature in their regulation. Nonetheless, it should be noted that the data presented here are only correlative and from their analysis we cannot provide a mechanistic link between the expression of ECM genes and specific biological processes. Moving forward, it will be important to confirm the prognostic relevance of the gene signature in prospective cancer cohorts and to use preclinical models to investigate potential mechanisms through which these genes might regulate cancer progression.