Cancer-associated Fibroblasts Are Associated With Poor Prognosis in Solid Type of Lung Adenocarcinoma: Machine Learning Approach

Cancer-associated �broblasts (CAFs) participate in critical processes in the tumor microenvironment, such as extracellular matrix remodeling, reciprocal signaling interactions with cancer cells and crosstalk with in�ltrating in�ammatory cells. However, the relationships between CAFs and survival are not well known in lung cancer. The aim of this study was to reveal the correlations of CAFs with survival rates, genetic alterations and immune activities. This study reviewed the histological features of 517 patients with lung adenocarcinoma from The Cancer Genome Atlas (TCGA) database. We performed gene set enrichment analysis (GSEA), network-based analysis and survival analysis based on CAFs in four histological types of lung adenocarcinoma: acinar, papillary, micropapillary and solid. We found four hallmark gene sets, the epithelial-mesenchymal transition, angiogenesis, hypoxia, and in�ammatory response gene sets, that were associated with the presence of CAFs. CAFs were associated with tumor proliferation, elevated memory CD4+ T cells and high CD274 (encoding PD-L1) expression. In the pathway analyses, CAFs were related to blood vessel remodeling, matrix organization, negative regulation of apoptosis and transforming growth factor-b signaling. In the survival analysis of each histological type, CAFs were associated with poor prognosis in the solid type. These results may contribute to the development of therapeutic strategies against lung adenocarcinoma cases in which CAFs are present.


Introduction
Lung cancer is a major cancer and the most common cause of cancer death in the world 1,2 .According to the National Comprehensive Cancer Network (NCCN) guidelines in oncology, early lung cancer requires surgical procedures, but advanced cancer is treated with systemic treatment. 3However, about half of the patients recur, usually within the rst year after starting treatment 4,5 .
Genetic mutation of cells can induce cancer development, but disease progression and treatment sensitivity are affected by nonmutant cells within the tumor microenvironment 6 .One type of nonmutant cell within the dense collagenous stroma is broblast-like cells, so-called cancer-associated broblasts (CAFs) 7 .CAFs can drive cancer metastasis through remodeling of the extracellular matrix (ECM) and the production of growth factors and can affect angiogenesis, and these effects in uence therapy response 8 .
Recently, there has been a growing appreciation of the ability of CAFs to modulate the immune system 6 .
Several studies have reported that survival rates are associated with CAF histological features in different types of malignancy.Previous studies have demonstrated that CAFs and desmoplastic reaction are predictive of poor prognosis in colorectal cancer 9 .Another study suggested that adipocyte-derived broblasts are correlated with poor survival and desmoplastic reaction in breast cancer 7 .However, another study reported that histological type, speci cally the desmoplastic type, is an independent predictor of favorable prognosis in colorectal cancer 10 .
Recently, molecular studies have utilized bioinformatic tools to nd the mechanisms of CAFs.CAFs are a different cell population in terms of origin and pathobiological roles and are derived mainly from mesenchymal stromal cells that are resident in or recruited by the cancer 11 .CAFs are located close to tumor cells and stromal components such as lymphocytes, neutrophils, plasma cells, endothelial cells and ECM 12 .Fibroblasts include CAFs as well as myo broblastic cells, quiescent broblastic cells and pericytic cells.The identi cation of broblasts within the cancer remains challenging due to the lack of speci c biomarkers for known and still unclear subtypes 12 .
In recent years, big data analytics and next-generation sequencing (NGS) have allowed the analyses of genetic biomarkers, the quanti cation of the several types of tumor-in ltrating lymphoid cells and the molecular pathway network-based integration of multiomics data [13][14][15] .Considering the multiple geneenvironment relationships of lung cancer, the clinicopathological application of gene expression data is di cult.For these reasons, we believe that analyses based on gene expression data should focus on identifying a simple, robust, and druggable biomarkers based on high-throughput experimental tools and bioinformatics to achieve accessible therapeutic strategies.The Cancer Genome Atlas (TCGA) has a big database, including digital pathologic slides, clinicopathological information, RNA sequencing, mutation, copy number variable and methylation data 13 .Moreover, the histological features reported in the TCGA database provide data on the presence of CAFs and the tumor microenvironment in lung cancer.
This study aimed to determine whether the presence of CAFs contributes to the clinical outcomes of lung cancer and to evaluate the prognostic value of CAFs 16 .We further aimed to nd the gene sets related to CAFs based on gene set enrichment analysis (GSEA) 14 and molecular pathway network analyses 17,18 .
The relationships between lymphoid cells and CAFs were analyzed 15 .Using the Catalog of Somatic Mutations in Cancer (COSMIC) and Genomics of Drug Sensitivity in Cancer (GDSC) databases, we performed drug sensitivity screening in lung cancer cells according to broblast activation protein-a (FAP) expression (as a marker associated with CAFs) 19,20 .

Patient selection
We obtained a total of 1,053 non-small-cell lung carcinoma (NSCLC) cases comprising 566 lung adenocarcinomas and 487 squamous cell carcinomas with known mRNA expression and mutation data from the TCGA database 13 .The analysis was performed on 517 cases containing both virtual histopathological slides and clinical data (from a total of 566 lung adenocarcinoma samples).

Cancer-associated broblasts
In this review, cells with both immature broblastic proliferation (proportion: > 10 %) at the tumor invasive front and high FAP gene expression were de ned as CAFs (Fig. 1A) 21,22 .To determine the optimal cutoff value for FAP expression, we generated receiver operating characteristic (ROC) curves comparing sensitivity versus 1-speci city.The cutoff value calculated by the ROC curve was used to evaluate the relationship between survival and FAP expression.On the basis of the ROC curve, FAP expression was classi ed as low (mRNA level ≤ 562.9965) or high (mRNA level > 562.9965).Of the 517 cases, CAFs were present in 101 cases (19.5%).

Gene set enrichment analysis and pathway-based network analysis based on TCGA data
To detect signi cant gene sets, GSEA (version 4.1.0)was performed with 31,117 gene sets in the Molecular Signatures Database (MSigDB version 7.2) from the Broad Institute at MIT 14 .Speci c gene sets (50 hallmark gene sets) were tested to determine which were associated with CAFs.For this analysis, 1,000 permutations were utilized to calculate the p values, and the permutation type was set to phenotype.Signi cant gene sets were those with the following characteristics: false discovery rate < 0.001; family wise-error rate ≤ 0.001; and p < 0.001.
We analyzed tumor-in ltrating lymphocytes using deep learning-based lymphocyte classi cation with convolutional neural networks in whole-slide image analysis and identi ed immune subtypes using CIBERSORT and Kallisto software and algorithms 15,[23][24][25] .
Pathway-based network analysis was based on the identi ed genes linked to CAFs using Cytoscape (version 3.8.1)network visualization software.To visualize the biological relevance of the histological subtypes and their relevant elements on the basis of GSEA, we performed functional enrichment analyses using ClueGO, an application within Cytoscape software 17,18 .

Machine Learning algorithm for validation
We integrated CAFs with clinical risk factors (T stage, N stage, age, sex, smoking history) to composite prognostic models for survival prediction by applying machine learning (ML) algorithms in 517 cases (randomization: train set, 70%; validation set, 30%).A learning algorithm was independently applied to select and combine multiple covariates from gradient boosting machines (GBM) based on multivariate Gaussian models.In this step, ''forward" search method, which initiates on a prototype set and selects a feature if and only if the addition of the feature could increase the performance of the prognostic model, is adopted to select optimal features sequentially.Hyperparameters of the ML algorithms, such as learning rate in GBM were optimized for each combination of selected covariates and learning algorithm by grid search cross-validation through a prede ned range.We searched across 81 models with varying learning rates and tree depth.The nal optimal models were trained based on the selected covariates and the optimized hyperparameters 26 .To explore the performances of the GBM method, the receiver operator characteristic (ROC) curve was used.
Data extraction from the GDSC and COSMIC databases Drug screening was performed using datasets from the GDSC and COSMIC databases, which are largescale cancer cell line and drug response databases containing data from 987 cancer cell lines and 365 compounds, respectively.The response of 8 lung cancer cell lines to 316 drugs was measured and reported as the natural log half-maximal inhibitory concentration (LN IC50) value.A drug was de ned as a sensitive drug when the LN IC50 value was lower in lung cell lines with high FAP expression than in those with low FAP expression.

Statistical analysis
Student's t-test was used to evaluate the differences or relationships among continuous paramters.Disease-free survival (DFS) and disease-speci c survival (DSS) were compared using the log rank test.Multivariate analysis was performed to identify independent prognostic markers for DFS and DSS using a Cox multistep regression model.All data were analyzed using R packages.A two-tailed p value < 0.05 was considered statistically signi cant.

Results
Gene set enrichment analysis of cancer-associated broblasts in lung adenocarcinoma FAP was highly expressed in tumors compared with normal tissues (p < 0.001) (Fig. 1B).We performed GSEA to identify various gene sets associated with CAFs.In the analyses of hallmark gene sets, we found four gene sets (the epithelial-mesenchymal transition, angiogenesis, hypoxia and in ammatory response gene sets) that were associated with lung adenocarcinoma (Fig. 2A).
On the basis of the GSEA results, we analyzed the association between CAFs and each gene set-related marker.Vimentin, a biomarker related to epithelial-mesenchymal transition, was highly expressed in the presence of CAFs (p < 0.001).Vascular endothelial growth factor-A (VEGF-A), as a marker related to angiogenesis, was increased in the presence of CAFs (p = 0.022).Hypoxia-inducible factor 1 subunit alpha (HIF1a), which is linked to hypoxia, was elevated in the presence of CAFs (p < 0.001).The lymphocyte in ltration signature score, which is associated with the in ammatory response, showed a tendency to increase in the presence of CAFs, but it was not statistically signi cant (p = 0.209) (Fig. 2B).

Immune pro les, proliferation and biomarkers of cancer-associated broblasts
In the analyses of CAFs, we referred to the immune cell pro les, tumor cell proliferation parameters and biomarkers used in a study by Thorsson et al. and in silico cytometry. 24 comparing the immune cell fractions between samples with and without CAFs, memory B cells were decreased in samples with CAFs (p < 0.001), while activated memory CD4+ T cells were increased in samples with CAFs (p = 0.002).CD274 (encoding PD-L1, programmed death-ligand 1) expression was more elevated in samples with CAFs than in those without CAFs (p < 0.001).CD8+ T cells showed a tendency to be decreased in samples with CAFs, but this trend was not statistically signi cant (p = 0.602) (Fig. 3A).
Pathway-based network analysis of cancer-associated broblast-related genes and gene sets We performed pathway-based network analysis using the genes and gene sets associated with CAFs.The CAFs were linked to 10 functionally enriched Gene Ontology (GO) terms and pathways: 1) blood vessel remodeling; 2) lung brosis; 3) regulation of tissue remodeling; 4) extracellular matrix organization; 5) cell-matrix adhesion; 6) brillar collagen trimer; 7) negative regulation of extrinsic apoptosis signaling pathway; 8) collagen bril organization; 9) morphogenesis of an epithelial sheet; and 10) TGF-b receptor signaling (Fig. 4).

Survival analyses of cancer-associated broblasts and machine learning
In the TCGA data, the distributions of the ve predominant histological types were as follows: 7 lepidic type (1.4%), 115 acinar type (22.2%), 107 papillary type (20.7%), 65 micropapillary type (12.6%) and 223 solid type (43.1%).There was an absence of CAFs in the lepidic cases; thus, they were not included in the survival analyses.
The presence of CAFs was associated with unfavorable DFS and DSS in the acinar type (p = 0.039 and 0.067, respectively), but the relationship between CAFs and DSS was not statistically signi cant.The presence of CAFs was signi cantly related to shorter DFS and DSS than the absence of CAFs in the papillary type (p < 0.001 and 0.019, respectively).The presence of CAFs correlated with worse DFS and DSS in the micropapillary type (p = 0.47 and 0.069, respectively), but the correlations were not statistically signi cant.In the solid type, the presence of CAFs was signi cantly associated with shorter DFS and DSS than the absence of CAFs (p = 0.007 and 0.002, respectively) (Fig. 5A and B).After adjustment for confounders including T stage, N stage, age, sex and smoking history, the presence of CAFs was associated with worse DFS in the papillary type and solid type than the absence of CAFs (p < 0.001 and = 0.008, respectively).There was a relationship between shorter DSS and the presence of CAFs in only solid type samples (p = 0.003) (Table 1).
We compared the performance of the two GBM models in predicting survival rates (Model 1; T stage, N stage, age, sex, smoking history versus Model 2; CAFs, T stage, N stage, age, sex, smoking history).ROC curves were performed (area under the curve: Model 1, 0.642; Model 2, 0.677) (Fig. 5C).We found that the GBM algorithm performed the best while the addition of CAFs to the prediction model improved the prognostic performance.With cross-validation estimates, 15 decision trees were utilized sequentially while the maximum depth of each decision tree was optimized at 1, corresponding to one-way interactions, and the learning rate was optimized at 0.018.

Discussion
This study showed survival differences between patients with and without CAFs and analyzed genetic/molecular alterations in patients with lung adenocarcinoma.In previous studies, genetic/molecular signatures related to CAFs have been shown to correlate with prognosis in colorectal, ovarian and breast cancer [27][28][29] .Our results revealed that the presence of CAFs was associated with a shorter survival rate than the absence of CAFs in lung adenocarcinoma, especially the solid type.In this study, the machine learning model analysis which includes CAFs increased the accuracy of predicting the survival rate.A study by Marcela et al. reported that CAFs were related to increased survival in patients with diffuse large B cell lymphoma 30 .Moreover, another study of colorectal carcinoma demonstrated that the presence of desmoplasia and CAFs was associated with better survival than the absence of desmoplasia 10 .Thus, there is controversy regarding the association between CAFs and the survival of patients with cancer.It is thought that CAFs in the tumor microenvironment are phenotypically heterogeneous and may exhibit both a protumorigenic and antitumorigenic phenotypes 31 .We analyzed hallmark gene sets related to CAFs in lung adenocarcinoma.A total of four gene sets associated with the presence of CAFs were identi ed: the epithelial-mesenchymal transition (EMT), angiogenesis, hypoxia, and in ammatory response gene sets.Subsequently, we determined the correlations of representative biomarkers associated with these gene sets with to the presence or absence of CAFs.First, vimentin is an EMT biomarker and is involved in cell migration, motility and adhesion and associated with metastasis 32 .Second, VEGF-A is an angiogenesis biomarker and induces high microvascular density and permeability and promotes tumor expansion 33 .Third, HIF1a is a hypoxia biomarker and is associated with the upregulation of glycolytic genes related to oxygen deprivation with increased cancer metabolism 34 .Fourth, the lymphocyte in ltration signature score, which is an in ammatory response marker, is related to prognosis and host-tumor immune interactions in different types of malignancies 35 .Some representative markers, such as vimentin, VEGF-A and HIF1a, were elevated in the presence of CAFs compared to the absence of CAFs.There was no signi cant difference in the lymphocyte in ltration signature score between samples with and without CAFs.These results suggest that the presence or absence of CAFs has minimal effect on the host-tumor immune response in lung adenocarcinoma.
In addition, we analyzed the changes in immune cell subtypes according to the presence or absence of CAFs by considering the characteristics of different immune cells.Memory B cells were decreased in the presence of CAFs, but activated memory CD4+ T cells were increased in the presence of CAFs.CD274 was elevated in the presence of CAFs.There was no signi cant difference in CD8+ T cells between samples with or without CAFs.A study by Costa et al. demonstrated that FAP-high broblasts, such as CAFs, are correlated with T reg cell-mediated immunosuppression and poor outcome in breast cancer 36 .
Our results showed that, in lung adenocarcinoma, CAFs have little effect on the immunomodulation associated with CD8+ T cells in the tumor microenvironment.
CAFs can induce increased levels of growth factors, matrix remodeling and increased levels of numerous cytokines related to immunomodulation 6 .In our results, the proliferation index was increased in the presence of CAFs.TGF-b and IL-6 are related to tumor growth and/or immunosuppression and were increased in the presence of CAFs.A representative marker of ECM and cancer invasion, MMP-11, was elevated in the presence of CAFs.The pathway-based network analysis showed biological functions related to CAFs, such as blood vessel remodeling, extracellular matrix organization, negative regulation of the extrinsic apoptotic signaling pathway and TGF-b receptor signaling.
We investigated 316 anticancer drugs to which 8 breast cancer cell lines with high FAP expression were responsive by using pharmacogenomic screens from the GDSC database 20 .We identi ed three anticancer drugs that effectively reduced the growth of lung cancer cells with high FAP expression.Among them, dabrafenib, which showed a high negative correlation between high FAP expression and a promising LN IC50 value, was identi ed as the most sensitive anti-CAF drug for the lung cancer cell lines assessed.The combination of dabrafenib, as a BRAF kinase inhibitor, and trametinib, an inhibitor of mitogen-activated extracellular signal-regulated kinase (MEK), is currently being investigated in advanced malignant tumors 37 .A previous study demonstrated that trametinib produced a potential tumor response against protumorigenic CAFs in neuroblastoma 38 .In the GDSC analysis, trametinib showed a tendency to inhibit lung cancer cells with high FAP expression, but the trend was not statistically signi cant [(r = -0.666,p = 0.103 (Pearson's correlation) and 0.423 (Student's t-test)].Given the lack of vivo studies, dabrafenib-based clinical trials in lung cancer patients with high FAP expression are needed in the future.
This study has several limitations that should be acknowledged.First, because this is a cross-sectional study and the in silico analyses with TCGA did not show sustained relationships over time, it is di cult to reach a de nitive conclusion.Second, experimental data allowing for novel biological insights into CAFs were not obtained in our study.Further in vitro and/or in vivo studies may be necessary to clarify the molecular mechanisms of CAFs in solid lung adenocarcinoma.Third, CAF function may be highly heterogeneous in solid lung adenocarcinoma patients, as many components of signaling pathways are affected by disease status, microenvironment, and immunity.Fourth, the di culty in identifying CAFs results largely from the lack of unique markers 6 .In our study, CAFs were de ned by a combination of the histological features of broblasts and high FAP gene expression.
This study demonstrated that CAFs are associated with increased tumor cell growth, angiogenesis, and ECM remodeling, effects that produce an unfavorable prognosis in patients with solid lung adenocarcinoma.CAFs were found to be associated with enhanced recruitment of activated memory CD4+ T cells with high CD274 expression.The presence of CAFs was related to decreased numbers of CD8+ T cells, but the relationship was not statistically signi cant.Patients with CAFs with high CD274 expression without elevated CD8+ T cells might develop resistance to anti-PD-L1 therapies.Our work ow results regarding CAFs will contribute to designing future clinical and experimental studies for patients with solid lung adenocarcinoma.

Figure 1 please
Figure 1

Figure 3 please
Figure 3

Table 1 .
Univariate and multivariate analyses of disease-free survival and disease-speci c survival based *Adjusted for T stage, N stage, age, sex and smoking history