Identification of molecular signatures associated with early relapse after complete resection of lung adenocarcinomas

The only potentially curative treatment for lung adenocarcinoma patients remains complete resection of early-stage tumors. However, many patients develop recurrence and die of their disease despite curative surgery. The mechanisms underlying the establishment of systemic disease after complete resection are mostly unknown. We therefore aimed to identify molecular signatures of resected lung adenocarcinomas associated with the risk of an early relapse. The study comprised 89 patients with completely resected stage IA–IIIA lung adenocarcinomas. Patients who suffered an early relapse within two years after surgery were compared to patients without a relapse within two years. Patients were characterized clinically and by molecular pathology. Tumor tissues were analyzed immunohistochemically for the expression of Ki67, CD45, CD4, CD8, PD1, PD-L1, PD-L2 and CD34, profiled with the NanoString nCounter PanCancer Immune Profiling Panel, and subjected to comprehensive methylome profiling using the Infinium MethylationEPIC BeadChip. We detected differential DNA methylation patterns as well as significantly differentially expressed genes associated with an early relapse after complete resection. In particular, CD1A was identified as a potential biomarker whose reduced expression is associated with an early relapse. These findings might help to develop biomarkers that improve risk assessment and patient selection for adjuvant therapy and to establish novel targeted therapeutic strategies.

www.nature.com/scientificreports/
a certain amount of risk and thus an accurate selection of those patients who are eligible for complete resection needs to be achieved 4. There seems to be a peak in post-surgical recurrence risk within the first year after resection, whereas other patients present with quite long relapse-free survival 2. Therefore, we compared molecular features of cases with an early relapse ≤ 2 years after curative surgical resection with a group of control cases without a relapse within 2 years after surgery. We aimed at identifying biomarkers associated with a high risk of recurrence in order to improve individualized pre- and post-surgical patient management.

Results
Clinical characterization. 98 tumor tissues of resected pulmonary adenocarcinomas were available. 49 of these patients developed early relapse (relapse ≤ 2 years after surgical resection). 40 patients did not develop relapse within 2 years after surgery (late relapse). For n = 9 patients, follow-up period was too short for classification in one of the two groups and the cases were therefore excluded from further analyses. Descriptive data of the 89 remaining patients including tumor diagnostics, typical confounders, overall survival and relapse-free survival are given in Table 1.
Histological and immunohistochemical characterization. In the H&E stainings the different growth patterns lepidic, acinar, papillary, solid, micropapillary as well as combinations of them were seen. Due to the multitude of different pattern constellations and therefore small patient groups a correlation with the clinical outcome could not be drawn for this analysis.
Employing Ki-67 staining proliferation index of the tumors was determined. Density of vascularization was determined through CD34 analysis. Additionally, CD34 stainings were examined for organized vascularization.

Table 1. Clinical characterization (TNM staging, confounder and survival).

Category                        Early relapse (n = 49)   Late relapse (n = 40)
Stage IA                        7                        9
Stage IB                        10                       5
Stage IIA                       14                       11
Stage IIB                       7                        8
Stage IIIA                      11                       7
V0: without vascular invasion   31

Correlation with the patients' clinical outcome was examined for all immunohistochemically analyzed markers and revealed no association with an early relapse or a long relapse-free survival (data not shown).

Molecular pathological characterization. FISH analyses revealed no rearrangement of the ALK, ROS1 and RET gene loci in any of the examined cases. Additionally, amplification of the PDL1 gene was proven only in a single case (Table 2). Therefore, the FISH results were not included in data integration.
The mutational status of EGFR exons 18, 19, 20 and 21 as well as KRAS exons 2 and 3 was determined for all cases by Sanger sequencing of the corresponding gene sections. In total, 43% of cases were KRAS mutated and 15% were EGFR mutated (Table 2). We observed a tendency towards longer relapse-free survival in the presence of an EGFR mutation. Owing to the small case numbers per group, this correlation could not be confirmed with regression models.
DNA methylation analyses. A differential methylation analysis (DMA) to identify loci differentially methylated between short-term and long-term relapse-free survivors identified 420 CpG loci (p < 4 × 10⁻⁴, σ/σ_max > 0.3) corresponding to 219 genes. 169 CpG loci were not allocated to a distinct gene (Supplementary Table S1). A hierarchical cluster analysis of the methylation data based on these loci resulted in essentially three major branches (Fig. 1a): one preferentially containing cases with early relapses (96%), one containing mainly late relapses (88%) and one mixed arm (71% early relapse). This argues for diverse DNA methylation patterns in the two groups of patients. A gene ontology analysis using the GOrilla tool 11,12 indicated a significant enrichment of genes involved in signaling (GO:0023052, FDR < 7.34 × 10⁻³) in the set of 219 differentially methylated genes.
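The two thresholds quoted for the differential methylation analysis can be read as a joint filter over CpG loci. The following Python sketch is only an illustration of that reading. It assumes that the relative variance of a locus is its standard deviation across all samples divided by the largest standard deviation of any locus; the function name, the data layout and the toy values are hypothetical, and the p-values are taken as given rather than computed.

```python
from statistics import pstdev


def select_differential_loci(beta_values, p_values, p_cut=4e-4, rel_var_cut=0.3):
    """Return locus IDs passing both the p-value and the relative-variance cut.

    beta_values: dict mapping locus ID -> list of methylation values (one per sample)
    p_values:    dict mapping locus ID -> p-value from a two-group comparison
    """
    # Standard deviation of each locus across all samples.
    sigma = {locus: pstdev(values) for locus, values in beta_values.items()}
    sigma_max = max(sigma.values())
    if sigma_max == 0:
        return []  # no locus varies at all, so nothing can pass the filter

    selected = []
    for locus in beta_values:
        relative_variance = sigma[locus] / sigma_max  # assumed meaning of sigma/sigma_max
        if p_values[locus] < p_cut and relative_variance > rel_var_cut:
            selected.append(locus)
    return selected


# Hypothetical toy data (not study data): three loci measured in six samples.
toy_beta = {
    "locus_a": [0.10, 0.12, 0.11, 0.80, 0.85, 0.82],  # variable, small p-value
    "locus_b": [0.50, 0.51, 0.50, 0.51, 0.50, 0.51],  # almost constant
    "locus_c": [0.20, 0.70, 0.25, 0.65, 0.30, 0.60],  # variable, large p-value
}
toy_p = {"locus_a": 1e-5, "locus_b": 1e-5, "locus_c": 0.2}

print(select_differential_loci(toy_beta, toy_p))
```

In this toy example only the locus that is both variable and associated with a small p-value is retained, which mirrors how the two criteria act together.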
A subsequent STRING analysis (string-db.org) revealed numerous putative interactions between the 219 gene products (Supplementary Figure S1 and Supplementary Table S2). Putative interactions were found in particular for components of signaling pathways (e.g. SHANK2, GRIK5, NRXN3).
Most aberrantly methylated genes were represented by a single CpG locus only. However, six CpG loci were located in CACNA1C, five in LDLRAD4, four in ADAMTS17, and three in COL23A1, NTRK2, PHACTR1, ANKS1B and APCDD1L-AS1 each (Supplementary Table S3).
In a second approach, three machine learning algorithms (support vector machine, SVM; k-nearest neighbor, KNN; and Random Tree, RDT) were applied to build a classifier for differentiating short-term from long-term relapse-free survivors (Fig. 1b). All classifiers sorted the samples into the two categories; nevertheless, 100% discrimination could not be reached. The best-performing algorithm (Random Tree) included ten CpG loci (cg20630582, cg23856707, cg01241784, cg10585962, cg14675427, cg10717121, cg14291751, cg10096084, cg04761746, cg05864261) and showed an accuracy of 0.71 for cases with an early relapse and 0.51 for cases with a late relapse. No significant difference between long-term and short-term relapse-free survivors was detected in the epigenetic age of the tumors (difference between chronological age and DNAmAge).
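Because the classifier accuracy is reported separately for the two groups, it may help to spell out how such group-wise values can be obtained. The sketch below assumes that the reported numbers are per-class accuracies, i.e. the fraction of correctly assigned samples within each true class; the function name and the labels are hypothetical and do not reproduce the study data.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified samples within each true class."""
    totals = {}
    correct = {}
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == predicted:
            correct[truth] = correct.get(truth, 0) + 1
    return {label: correct.get(label, 0) / totals[label] for label in totals}


# Hypothetical labels for eight samples (not study data).
truth = ["early", "early", "early", "early", "late", "late", "late", "late"]
predicted = ["early", "early", "early", "late", "late", "late", "early", "early"]

print(per_class_accuracy(truth, predicted))
```

Reporting the two classes separately makes it visible when a classifier recognizes one group considerably better than the other, which a single overall accuracy would hide.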
Information on the respective tumor grade was included in the heat maps (Fig. 1a,b) illustrating that clustering based on methylome profiling is not just reproducing tumor grades but provides additional value for individual risk stratification of patients.
Normalized nanoString mRNA expression data was analyzed in a cluster analysis. As shown in Fig. 2a no underlying global differential expression pattern between the two patient cohorts was identified. Even in random forest and regression analyses as well as after shifting the cut-off no classification into the two clinical groups based on the mRNA expression patterns could be achieved (data not shown).
Although no global expression pattern differentiating between the two patient cohorts could be seen, individual genes that were significantly differentially expressed between the cohorts (p value < 0.05 and fold change > 2) could be identified (Fig. 2b). Accordingly, an early relapse within two years after R0 resection was associated with low expression of CD1A, CD207 and FCER1A and high expression of PRAME. This differential expression was not accompanied by differential DNA methylation of the corresponding gene loci. The correlation of mRNA expression of the four identified candidate genes with the clinical outcome of the patients was examined using the Kaplan-Meier estimator (Fig. 3). For CD1A in particular, an association of low mRNA expression with a worse clinical outcome, meaning the occurrence of an early relapse, could be shown.
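For readers less familiar with the Kaplan-Meier estimator, the product-limit calculation behind such relapse-free survival curves can be sketched in a few lines of Python. This is a generic textbook version, not the software used in the study; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: follow-up time per patient (e.g. months until relapse or censoring)
    events:    1 if a relapse was observed at that time, 0 if the patient was censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        relapses = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - relapses / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data for one expression group (not study data).
months = [6, 10, 10, 14, 20, 24, 24, 24]
relapse = [1, 1, 0, 1, 1, 0, 0, 0]

for time, probability in kaplan_meier(months, relapse):
    print(f"{time:>3} months: {probability:.3f}")
```

Computing one such curve per expression group (for example low versus high CD1A expression) yields the kind of comparison shown in a Kaplan-Meier plot.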
The detected correlation between CD1A mRNA expression and clinical outcome was confirmed at the protein level by immunohistochemistry. This validation revealed that an early relapse is much more frequent in CD1A-negative than in CD1A-positive cases (Fig. 4). Analysis using the chi-square test confirmed a highly significant correlation between loss of CD1A protein expression (< 1%) and occurrence of an early post-surgical relapse (p = 0.0226 for tumor cells and p = 0.0012 for immune cells). As the early relapse group is enriched for high-grade and undifferentiated tumors (Table 1), we also performed a chi-square test for correlation between CD1A protein expression and tumor grading. This analysis revealed that immunohistochemical detection of CD1A expression in tumor cells as well as in immune cells is independent of grading as G1/G2 vs. G3/G4 tumor (p > 0.05).
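The chi-square test on a 2 × 2 table of CD1A status against relapse group can be illustrated as follows. The sketch assumes a Pearson chi-square test without continuity correction, which is not specified in the text, and the counts are hypothetical, since the underlying contingency tables are not given here.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (not study data):
#                 early relapse   late relapse
# CD1A negative        30              12
# CD1A positive        10              20
statistic, p_value = chi_square_2x2(30, 12, 10, 20)
print(f"chi2 = {statistic:.2f}, p = {p_value:.4f}")
```

The same function can be applied to a table of CD1A status against tumor grading (G1/G2 vs. G3/G4) to check independence from grading.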

Discussion
Many patients suffer from early post-surgical recurrence after complete resection of early-stage lung adenocarcinomas, whereas other patients present with quite long relapse-free survival. The underlying molecular mechanisms differentiating these two patient cohorts are mostly unknown. Therefore, we compared molecular features of cases with an early relapse ≤ 2 years after curative surgical resection with a group of control cases without a relapse within 2 years after surgery.
In this study no association with an early relapse or a long relapse-free survival could be seen for any of the analyzed histological and immunohistochemical markers. This result is in line with another early biomarker study failing to show any significant association between survival and the expression of a biomarker panel, including angiogenesis markers 13. In contrast, in a study on the prognostic relevance of angiogenesis in stage III NSCLC by Kreuter et al. microvessel density was a prognostic factor in the subgroup of R0-resected stage IIIA NSCLC patients 14. In this small subgroup our data also indicate a tendency of less tumor vascularization in the early-relapse patient group. Also, the Ki-67 proliferation index was shown to predict the postoperative recurrence in early stage lung adenocarcinoma patients in previous studies 15,16. This association is also visible in our data when comparing only stage I tumors. Although previous studies have demonstrated that the density of various immune cells in the tumor stromal compartment has significant prognostic relevance for NSCLC survival 17,18, the herein presented data do not indicate a prognostic value of expression of immune checkpoint markers for post-surgical outcome of R0-resected lung adenocarcinoma patients.
In our cohort we detected fractions of 43% KRAS mutated and 15% EGFR mutated cases, although being slightly enriched for KRAS mutations, reflecting a typical distribution for a lung adenocarcinoma cohort 19. We observed a tendency for an association of a longer relapse-free survival with the presence of an EGFR mutation. Due to the respectively small case numbers per group this correlation could not be proven with regression models.
For stage III NSCLC with R0 resection it has already been shown that EGFR mutational status is more likely to be a predictive marker for response to treatment with tyrosine kinase inhibitors than a prognostic marker for post-surgical outcome 20 .
The differential methylation analysis (DMA) identified a set of 219 differentially methylated genes between the two patient groups. High DNA methylation of glutathione S-transferase P1 (GSTP1) was shown to be associated with a poor prognosis in NSCLC patients 21. However, in the present study GSTP1 was not among the 219 differentially methylated genes. A STRING analysis revealed numerous putative interactions between the 219 gene products. These included proteins with a putative role in carcinogenesis, including lung cancer (e.g. CACNA1C 22, FGFR2 23, TLR9 24 and CEACAMs 25).
Among the differentially methylated genes CACNA1C was represented by six CpG loci, five loci were located in LDLRAD4, four in ADAMTS17, and three in COL23A1, NTRK2, PHACTR1, ANKS1B and APCDD1L-AS1 each. All of these genes have been reported playing a role in malignancies in humans, including lung cancer. CACNA1C encodes a calcium channel (calcium voltage-gated channel subunit alpha1 C). A copy number gain has been described in squamous cell carcinomas of the esophagus. Genetic alterations in this gene were also shown in lung adenocarcinoma indicating a putative function of CACNA1C in lung cancer 22. The low-density lipoprotein receptor class A domain-containing protein 4 (LDLRAD4) has been described functioning as negative regulator of TGF-beta signaling by binding to SMAD2 and SMAD3 and therefore attenuating SMAD-recruitment to the TGF-beta receptor 26. Liu et al. showed that LDLRAD4 also interacts with the ubiquitin ligase NEDD4 to affect TGF-beta signaling and promotes cell proliferation and migration 27. Expression of ADAMTS17 has been found in fetal lung tissue as well as different adult normal tissues 28. While Collagen XXIII expression has already been suggested as potential biomarker for NSCLC, APCDD1L-AS1 expression is of prognostic value in squamous cell carcinoma and NTRK2 has recently been reported as therapeutic target in combination with tyrosine kinase inhibitors in this tumor entity 29–31. Also, for PHACTR1 and ANKS1B potential roles in lung cancer have been described before 32,33. Interestingly, also two carcinoembryonic antigen genes (CEACAM6 and CEACAM7) showed differential DNA methylation patterns.
The best performing classifier for differentiating short-term from long-term relapse-free survivors used the Random Tree machine learning algorithm. It included ten CpG loci and showed an accuracy of 0.71 in cases with an early relapse and 0.51 for cases with late relapse. These results show the possibility to stratify the patients and samples according to their DNA methylation pattern and this stratification provides additional information to the existing grading system.
A disadvantage of this pilot study is the limited number of samples. As a result, individual epigenetic alterations in single patients can appreciably affect the results. Furthermore, the applied algorithms required the classification of the donors into short-term and long-term relapse-free survivors. This classification was based on clinical experience but not on biological or molecular criteria. Therefore, the classification was, to some extent, arbitrary. Consequently, individual differences in the constitution of the patients in the timespan until the relapse might have affected the outcome (in particular if the 2-year limit was barely reached or just missed).
A further validation of the DNA methylation-based outcome of this study, in particular in the context of supplementary gene expression data (either array or RNA-seq based), would be desirable to link the epigenetic findings to gene regulation and functional data.
mRNA expression analyses using the nCounter PanCancer Immune Profiling Panel revealed no underlying global differential expression pattern between the two patient cohorts. Nevertheless, individual significantly differentially expressed genes were identified. For CD1A in particular, an association of low mRNA expression with a worse clinical outcome, meaning the occurrence of an early relapse within two years after R0 resection, could be shown. This association was also validated at the protein level. CD1A protein expression in tumor cells as well as in immune cells was independent of the corresponding tumor grading but highly correlated with the occurrence of an early post-surgical relapse.
For the first time we could identify CD1A as a potential biomarker, whose reduced expression both on tumor and on tumor-infiltrating immune cells is associated with an early relapse after R0 resection of lung adenocarcinomas. CD1A is an MHC class I-like molecule capable of presenting lipid antigens and known to be expressed on antigen-presenting cells, e.g. dendritic cells. Lipid antigens were shown to be present in cancer cells and lipid-specific T cells play a role in anti-tumoral immune responses 34 . CD1A positive dendritic cell activity is regarded as one of the first steps in this immune response 35 . Additionally, the presence of a high number of antigen-presenting dendritic cells in the tumor was shown to be associated with increased disease-specific survival in resected non-small cell lung cancer 36 .
Given the fact that CD1A immunohistochemistry is already performed routinely in many institutes of pathology in order to detect CD1A expressing Langerhans cells, implementation of CD1A staining in routine diagnostics for resected early-stage lung adenocarcinoma is easily feasible. Therefore, loss of CD1A expression represents a novel potential biomarker to identify patients with an increased risk of post-surgical relapse that might benefit from adjuvant therapy. In addition to that CD1A expression might also be relevant in the context of immunotherapeutic approaches in lung cancer.

Methods
Study design and patient selection. The study was conducted at a single institution as a retrospective, non-interventional case-control study. It was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the ethics committee of the University of Luebeck (14-043).
Initially, 235 patients who underwent complete resection (residual status: R0) of stage I–IIIA NSCLC adenocarcinomas between 2012 and 2014 were identified in the hospital information system of the LungenClinic Grosshansdorf. 179 patients signed the corresponding informed consent form allowing analysis of clinical data and tumor tissue. 100 of these patients were selected for follow-up. In total, 51 patients with early relapse ≤ 2 years after curative surgical resection were identified (case group). In addition, 40 patients without a relapse within 2 years after curative surgical resection were identified (control group). 9 patients were lost to follow-up. Patients were clinically and pathologically characterized (Table 1).
Read-out for Ki-67 staining was the fraction of positive tumor cells. Using CD34 staining, the number of vessels in 3 high power fields was determined. CD45-positive tumor-infiltrating lymphocytes were quantified as a fraction of the tumor area, and by CD4 and CD8 staining the corresponding fractions of infiltrating lymphocytes were determined. Expression of the immune checkpoint markers PD-1, PD-L1 and PD-L2 was quantified as the contribution of positive immune cells to the tumor area as well as the fraction of positive tumor cells (PD-L1 and PD-L2) or immune cells (PD-1), respectively. CD1A protein expression was quantified as the contribution of positive immune cells to the tumor area as well as the fraction of positive tumor cells. To group the cases according to the obtained expression values, they were scored by arbitrary cut-offs as strong positive (≥ 10%), weak positive (≥ 1%, < 10%) or negative (< 1%).
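The three-level scoring of immunohistochemical read-outs can be written down as a small function. The cut-offs are those stated in the text; the function name and the example values are hypothetical.

```python
def score_expression(percent_positive):
    """Map a percentage of positive cells to the three categories used in the text."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive >= 10:
        return "strong positive"
    if percent_positive >= 1:
        return "weak positive"
    return "negative"


# Hypothetical read-outs (not study data).
for value in (0.5, 1, 9.9, 10, 60):
    print(value, "->", score_expression(value))
```

Writing the boundaries explicitly makes clear that exactly 1% counts as weak positive and exactly 10% as strong positive.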

DNA methylation analyses. For stratification of the two clinical patient groups based on the DNA methylation pattern of tumor cells, a differential methylation analysis was performed using the Infinium MethylationEPIC BeadChip (Illumina, San Diego CA, U.S.A.). Array hybridization was performed according to the manufacturer's instructions; raw data collected by the iScan device were analyzed using Illumina's GenomeStudio software applying standard settings. Loci with a detection p-value > 0.01 as well as loci located on gonosomes were excluded from further analysis. Finally, data from 49 short-term relapse-free survivors as well as 39 long-term relapse-free survivors could be included in this study (608,811 loci each).
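The locus-level quality filter described above can be sketched as follows. The sketch assumes that a locus is dropped as soon as its detection p-value exceeds 0.01 in any sample, which the text does not specify; the function name, the data layout and the locus identifiers are hypothetical.

```python
def filter_loci(loci, p_cut=0.01, excluded_chromosomes=("X", "Y")):
    """Keep loci that are reliably detected in every sample and lie on autosomes.

    loci: list of dicts with keys "id", "chromosome" and "detection_p"
          (a list of detection p-values, one per sample).
    """
    kept = []
    for locus in loci:
        if locus["chromosome"] in excluded_chromosomes:
            continue  # locus on a gonosome
        if any(p > p_cut for p in locus["detection_p"]):
            continue  # assumption: one unreliable sample is enough to drop the locus
        kept.append(locus["id"])
    return kept


# Hypothetical annotation and detection p-values (not study data).
toy_loci = [
    {"id": "locus_1", "chromosome": "1", "detection_p": [0.001, 0.002]},
    {"id": "locus_2", "chromosome": "X", "detection_p": [0.001, 0.001]},
    {"id": "locus_3", "chromosome": "7", "detection_p": [0.001, 0.05]},
]

print(filter_loci(toy_loci))
```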
Further in-depth analyses were performed using the Qlucore OMICS Explorer software (Qlucore, Lund, Sweden) as well as R (www.r-project.org) and Perl scripts (www.perl.org). CpG loci with p < 4 × 10⁻⁴ and a relative variance (σ/σ_max) > 0.3 were considered differentially methylated. Machine learning algorithms (k-nearest neighbor, support vector machine and random tree) were applied to build classifiers for short- and long-term survivors. The GOrilla tool (http://cbl-gorilla.cs.technion.ac.il/) and STRING (www.string-db.org) were used for gene ontology analysis and interaction predictions, respectively (accessed in May 2019). The data set is available under GSE132690.
Because RNA concentrations were low in some cases, 60 tumor samples could be examined by nanoString technology (RNA input: 45 ng). From each of the two patient cohorts, 30 patients were analyzed. Readout of the cartridges was performed in the Institute for Pathology, Hannover Medical School. The expression data obtained were normalized using the NanoStringNorm package for the R programming language with default parameters, following the suggested protocol. Code counts were normalized by geometric mean, while background was reduced with 2 standard deviations. From the normalized nanoString mRNA data, genes with differential expression between the two patient cohorts were identified (p value < 0.05 and fold change > 2). Cluster analysis as well as random forest and regression analyses were performed in order to detect global differential expression patterns. The correlation of the mRNA expression of the identified differentially expressed candidate genes with the clinical outcome was examined using the Kaplan-Meier estimator.
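The two normalization steps mentioned for the nanoString data can be illustrated with a simplified Python sketch. It is not the NanoStringNorm implementation. It assumes that each sample is scaled so that the geometric mean of its positive controls matches the average across samples, and that the background is estimated as the mean plus two standard deviations of the scaled negative controls and then subtracted, with negative results set to zero. All names and counts are hypothetical.

```python
from math import exp, log
from statistics import mean, stdev


def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return exp(mean(log(v) for v in values))


def normalize(samples):
    """Scale by positive controls, then subtract a negative-control background.

    samples: dict mapping sample name -> {"positive": [...], "negative": [...],
             "genes": {gene: raw count}}
    """
    geo_means = {name: geometric_mean(s["positive"]) for name, s in samples.items()}
    target = mean(geo_means.values())  # common reference level across samples

    normalized = {}
    for name, sample in samples.items():
        factor = target / geo_means[name]
        scaled_negatives = [count * factor for count in sample["negative"]]
        # Assumed background estimate: mean + 2 SD of the negative controls.
        background = mean(scaled_negatives) + 2 * stdev(scaled_negatives)
        normalized[name] = {
            gene: max(count * factor - background, 0.0)
            for gene, count in sample["genes"].items()
        }
    return normalized


# Hypothetical raw counts for two samples (not study data).
toy_samples = {
    "sample_1": {"positive": [100, 400], "negative": [2, 4, 6],
                 "genes": {"GENE_A": 100, "GENE_B": 4}},
    "sample_2": {"positive": [200, 800], "negative": [4, 8, 12],
                 "genes": {"GENE_A": 200, "GENE_B": 40}},
}

result = normalize(toy_samples)
for name, genes in result.items():
    print(name, {gene: round(value, 2) for gene, value in genes.items()})
```

In the toy data the second sample has twice the raw counts of the first for the controls and for GENE_A, and the scaling step brings GENE_A to the same level in both samples before the background is subtracted.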

Data availability
Upon publication the methylome data set generated and analyzed during the current study will be openly available in Gene Expression Omnibus (GEO) at https://www.ncbi.nlm.nih.gov/geo/, reference number GSE132690. Other datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.