Assessment of copy number in protooncogenes are predictive of poor survival in advanced gastric cancer

The copy number (CN) gain of protooncogenes is a frequent finding in gastric carcinoma (GC), but its prognostic implication remains elusive. This study aimed to characterize the clinicopathological features, including prognosis, of GCs with CN gains in multiple protooncogenes. Three hundred thirty-three patients with advanced GC were analyzed for their gene ratios in EGFR, GATA6, IGF2, and SETDB1 using droplet digital PCR (ddPCR) for an accurate assessment of CN changes in the target genes. The number of GC patients with CN gain in 3 or more genes was 16 (4.8%). Compared with the GCs with CN gain in 2 or fewer genes, the GCs with CN gain in 3 or more genes displayed more frequent venous invasion, a lower density of tumor-infiltrating lymphocytes, and lower methylation levels of L1 or SAT-alpha. Neither microsatellite instability-high tumors nor Epstein–Barr virus-positive tumors were found among the GCs with CN gain in 3 or more genes. Patients in this group also showed the worst clinical outcomes for both overall survival and recurrence-free survival, and this association persisted in the multivariate survival analyses. Our findings suggest that the ddPCR-based detection of CN gain in multiple protooncogenes might help to identify a subset of patients with poor prognosis.

Gastric carcinoma (GC) is one of the most common malignancies in Eastern Asia and one of the leading causes of cancer-related deaths. TNM cancer staging provides prognostic information, but clinical outcomes vary among patients with GC of the same cancer stage. For patients with GC of the same cancer stage, further prognostic information could be gained from biomarkers including pathological parameters, such as lymphovascular invasion, perineural invasion, and tumor-infiltrating lymphocytes (TILs). Molecular markers might provide information about the prognostic features of the tumor. The Cancer Genome Atlas (TCGA) project has defined four subtypes of GC, including GCs with microsatellite instability (MSI), Epstein-Barr virus (EBV), genomic stability (GS), and chromosomal instability (CIN), which have been associated with different prognoses 1,2 . The EBV subtype was associated with the best prognosis, while the GS subtype was associated with the worst prognosis. Although the CIN subtype fell in between the above two subtypes, it demonstrated the greatest survival benefit from adjuvant chemotherapy 1 , which indicates that the molecular subtyping of GCs might provide prognostic and predictive value.
CIN consists of numerical and/or structural aberrations in chromosomes. Numerical abnormality refers to the gain or loss of whole chromosomes, whereas structural abnormalities include the amplification, loss, translocation, and inversion of chromosomal regions of various sizes ranging from a single gene to an arm. Through the TCGA project, many genes have been found to be amplified or undergo copy number gain, including EGFR, FGFR1, GATA6, HER2 (ERBB2), IGF2, MYC, and SETDB1 in GCs 2 . Although copy number gains of these genes are expected to occur mainly in the CIN subtype of GC, the prognostic significance of the copy number gains of these genes has not yet been clarified.
Digital polymerase chain reaction (dPCR) is a method that provides quantitative information about copy number changes in probed genes without the need for standard curves. DNA samples obtained from formalin-fixed archival tissues contain PCR inhibitors and formalin-induced interstrand crosslinks, which can result in errors in the analysis of copy number variation by quantitative PCR (qPCR). However, dPCR can provide more accurate results because it does not rely on the comparison of amplification rates, as qPCR does, but instead determines whether amplification above a threshold has occurred in each partition. In the present study, we aimed to elucidate whether copy number changes in seven genes (EGFR, FGFR1, GATA6, HER2, IGF2, MYC, and SETDB1) are related to the survival of patients with advanced GC and might serve to detect a subset of GC cases with poor prognosis. The genes included in this study are among those most frequently amplified in TCGA. We used droplet dPCR (ddPCR) to evaluate the copy number changes of the seven genes in formalin-fixed, paraffin-embedded tissue samples of advanced GC.
Scientific Reports | (2021) 11:12117 | https://doi.org/10.1038/s41598-021-91652-y www.nature.com/scientificreports/
Pyrosequencing methylation assay of L1 and SAT-alpha. After bisulfite modification of the extracted DNA, the modified DNA was subjected to pyrosequencing methylation assays of L1 and SAT-alpha. The detailed procedures and determination of methylation levels were described in a previous study 4 .
Statistical analysis. Statistical analyses were performed using SPSS software for Windows, version 25.0 (IBM, Armonk, NY, USA). Two-sided P-values of less than 0.05 were considered statistically significant. To determine whether the gene ratios were normally distributed in GC tissue samples, normality tests were performed for the gene ratios, which revealed that the gene ratios were not normally distributed.
The gene ratios were compared between two groups using both Student's t-test and the Mann-Whitney test, and among three or more groups using both ANOVA and the Kruskal-Wallis test. The clinical outcome data were last updated in December 2019. Of the 333 included patients, 14 were lost to follow-up. Recurrence-free survival (RFS) was measured from the date of surgery for advanced GC to the date of the first documented recurrence or the date of death from any cause, whichever occurred first. Overall survival (OS) was calculated from the date of resection to the date of death from any cause or the last clinical follow-up time. Survival curves were estimated using the Kaplan-Meier method and compared using the log rank test. Multivariate survival analyses were performed with the Cox proportional hazards regression model, and baseline characteristics were adjusted using a backward stepwise regression model including covariates of prognostic value.
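For readers unfamiliar with the Kaplan-Meier method named above, the following is a minimal sketch of the product-limit estimator in plain Python. It is an illustration only: the analyses in this study were run in SPSS, the function name `kaplan_meier` and the toy follow-up data are hypothetical, and the sketch covers neither the log rank test nor Cox regression.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0          # patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy data: four patients, the second one censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored patients lower the number at risk without producing a step in the curve, which is why the second patient in the toy data changes the later estimates but adds no point of its own.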

Results
A total of 333 advanced GC patients were analyzed for their gene ratios using ddPCR. The demographic findings are summarized in Supplementary Table 4). To determine whether the gene ratios were normally distributed in the GC tissue samples, a normality test was performed using the Shapiro-Wilk test, which showed that the gene ratios were not normally distributed.
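As background to the gene ratios reported here, the sketch below shows one common way a ddPCR gene ratio can be computed from droplet counts, using the standard Poisson correction λ = −ln(1 − p), where p is the fraction of positive droplets. The function names, the droplet counts, and the definition of the ratio as target over a single reference gene are assumptions made for illustration; the assay details of this study are not reproduced here.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Mean template copies per droplet, from the fraction of positive droplets.

    Uses the Poisson relation lambda = -ln(1 - p); it is undefined when
    every droplet is positive, so that case is rejected.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def gene_ratio(target_positive: int, reference_positive: int, total: int) -> float:
    """Ratio of target to reference copies (assumed definition, for illustration)."""
    reference = copies_per_droplet(reference_positive, total)
    if reference == 0.0:
        raise ValueError("reference gene has no positive droplets")
    return copies_per_droplet(target_positive, total) / reference


# Hypothetical counts: 4000 target-positive and 2000 reference-positive
# droplets out of 20000 accepted droplets.
example_ratio = gene_ratio(4000, 2000, 20000)
```

Because the correction accounts for droplets that contain more than one template copy, the ratio in this toy example is slightly above the naive 4000/2000 = 2.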
Gene ratios and survival. For the survival analysis, the GC patients were grouped into 10 equal-sized subsets (i.e., each group has approximately the same number of patients), from subset 1 to subset 10, according to the increasing order of the gene ratios of the individual genes. With the Kaplan-Meier log rank test, EGFR, FGFR1, GATA6, IGF2, and SETDB1 showed lower survival in subset 10 than in the other subsets (Supplementary Fig. 1 & 2). When the patients were divided into subset 10 and the other subsets, EGFR, FGFR1, GATA6, IGF2, and SETDB1 exhibited significant differences in survival time between subset 10 and the other subsets in the Kaplan-Meier log rank test ( Supplementary Fig. 3 & 4). The clinicopathological and molecular features that were found to be statistically significant in univariate survival analysis included tumor subsite, Lauren classification, T stage, N stage, M stage, venous invasion, lymphatic embolus, and perineural invasion. When the individual genes were included in multivariate survival analysis with clinicopathological factors that were found to be significantly associated with survival, the EGFR and IGF2 gene ratios were independent prognostic parameters associated with poor prognosis in terms of both OS and RFS ( Table 1). The GATA6 gene ratio was found to be a significant risk factor in the multivariate analysis of OS only, and the SETDB1 gene ratio was found to be a significant risk factor in the multivariate analysis of RFS only.
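The partition into 10 equal-sized subsets described above can be sketched as a rank-based assignment. This is a minimal illustration, assuming ties are broken by input order; the function name `decile_subsets` is hypothetical, and the study's own grouping was not necessarily implemented this way.

```python
def decile_subsets(ratios):
    """Assign each gene ratio to subset 1..10 by increasing rank.

    Group sizes differ by at most one patient. Ties are broken by input
    order, which is an assumption of this sketch.
    """
    n = len(ratios)
    order = sorted(range(n), key=lambda i: ratios[i])
    subsets = [0] * n
    for rank, index in enumerate(order):
        subsets[index] = rank * 10 // n + 1
    return subsets


# With 333 patients, each subset holds 33 or 34 patients.
demo = decile_subsets([float(v) for v in range(333)])
sizes = [demo.count(k) for k in range(1, 11)]
```

Patients assigned to subset 10 are those with the highest gene ratios for the gene in question.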
To evaluate the additive effect of copy number gains in four genes (EGFR, GATA6, IGF2, and SETDB1) on prognostication power, a tumor was scored "1" or "0" when the specific gene ratio belonged to subset 10 or the other subsets, respectively. The sum of scores for the four genes ranged from 0 to 4 in each tumor. Although the survival curves of the sum scores were significantly different for OS and RFS (Kaplan-Meier log rank test), the survival curves of sum scores 1 and 2 were similar, and those of sum scores 3 and 4 were similar (Fig. 1a,b). Thus, the GC patients were classified into 3 subsets, including a subset with sum score 0, a subset with sum score 1 or 2, and a subset with sum score 3 or 4 (Fig. 1c,d). The sum scores of 3 and 4 were also independent prognostic factors of OS (HR = 3.805, 95% CI = 2.014-7.188, P < 0.001) and RFS (HR = 3.709, 95% CI = 1.953-7.042, P < 0.001) in GC patients regardless of tumor subsite, Lauren histology, venous invasion, lymphatic invasion, perineural invasion, and T, N, and M categories ( Table 2).
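The scoring rule in this paragraph can be written out as a short sketch: each of the four genes contributes 1 when its gene ratio falls in subset 10, and the total is then collapsed into the three groups used for the survival comparison. The function names and the example tumor are hypothetical.

```python
GENES = ("EGFR", "GATA6", "IGF2", "SETDB1")


def sum_score(subset_by_gene):
    """Number of the four genes whose gene ratio belongs to subset 10."""
    return sum(1 for gene in GENES if subset_by_gene[gene] == 10)


def score_group(score):
    """Collapse the 0-4 sum score into the three groups used in the text."""
    if not 0 <= score <= 4:
        raise ValueError("sum score must be between 0 and 4")
    if score == 0:
        return "0"
    if score <= 2:
        return "1-2"
    return "3-4"


# Hypothetical tumor: three of the four genes fall in subset 10.
tumor = {"EGFR": 10, "GATA6": 10, "IGF2": 3, "SETDB1": 10}
tumor_group = score_group(sum_score(tumor))
```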
To determine whether the CNV measured by ddPCR correlated with mRNA expression of the four genes, we measured their mRNA expression levels by RT-qPCR in 14 gastric cancer cell lines whose CNV in the same four genes had been analyzed using ddPCR. All four genes showed significant correlations between the RT-qPCR and ddPCR values (Fig. 2).
Table 3 summarizes the relationships between the sum scores and clinicopathological features. The sum score was higher in tumors with venous invasion than in tumors without venous invasion. The sum score tended to be higher in tumors with N3b than in tumors without nodal metastasis. However, no differences in the sum score were found in association with age, sex, tumor subsite, Lauren histology, lymphatic emboli, perineural invasion, tumor depth, distant metastasis, or molecular subtype. When TIL density was compared among GCs with different sum scores, CD3 TIL and CD8 TIL densities were highest in tumors with sum scores of 0 and lowest in tumors with sum scores of 3-4 ( Fig. 3a,b). When the methylation levels of repetitive DNA elements, including L1 and SAT-alpha, were compared among the three subsets, the L1 or SAT-alpha methylation level was higher in the subset with a sum score of 0 than in the subsets with a sum score of 1-2 or a sum score of 3-4 (Fig. 3c,d). However, because EBV and MSI subtype tumors were not classified as having copy number gain, the correlation of copy number gain status with TIL densities and L1 or SAT-alpha methylation levels needed to be examined within the non-MSI/non-EBV subtype. In this subgroup as well, CD3 TIL and CD8 TIL densities and L1 or SAT-alpha methylation levels were highest in GCs with a sum score of 0 and lowest in GCs with a sum score of 3-4 (Supplementary Tables 5).

Discussion
In the present study, we analyzed the gene ratios of 7 genes, including MYC, EGFR, ERBB2, FGFR1, GATA6, IGF2, and SETDB1, in advanced GC patients using ddPCR. To determine the cut-off value of the gene ratios with prognostic utility, we partitioned the GC patients into 10 subsets according to the gene ratios and then performed survival analysis, which revealed that subset 10, with the highest gene ratios for EGFR, FGFR1, GATA6, IGF2, and SETDB1, was associated with worse clinical outcomes in patients with GC. Of these five genes, FGFR1 was not found to be an independent prognostic parameter in multivariate analysis. To assess the additive effect of copy number gains in the four genes (EGFR, GATA6, IGF2, and SETDB1), we calculated the sum score; in other words, we counted, in each case, the number of genes for which the gene ratio belonged to subset 10. According to survival curves, the GC cases could be grouped into GCs with a sum score of 0, a sum score of 1 or 2, and a sum score of 3 or 4. Sum scores of 3 or 4 were found to be associated with worse survival in GC patients (OS, hazard ratio of 3.320, 95% CI = 1.756-6.278, P < 0.001; RFS, hazard ratio of 3.285, 95% CI = 1.736-6.217, P < 0.001) in the multivariate analysis, regardless of tumor subsite, Lauren histology, venous invasion, lymphatic invasion, perineural invasion, and T, N, and M categories. Our study demonstrated that the sum score was inversely associated with the CD3 or CD8 TIL density, which indicates that the copy number gain of multiple protooncogenes is associated with decreased infiltration of CD3 or CD8 TILs. Our finding is in line with findings of recent studies in which the amplification of MYC, NOTCH2, and FGFR1 was inversely associated with the expression of genes related to cytotoxic T cell function in pancreatic ductal adenocarcinoma 7,8 .
Not only amplification but also single-nucleotide variants (SNVs) of protooncogenes have been demonstrated to be associated with decreased cytotoxic T cell infiltration in tumor areas. For lung cancers, EGFR mutations have been linked with decreased cytotoxic T cell infiltration 9,10 , whereas for colorectal cancers, KRAS mutations have been associated with increased myeloid-derived suppressor cell infiltration and the subsequent decreased infiltration of cytotoxic T cells 11,12 . Based on the association between the copy number gain of multiple protooncogenes and the decreased infiltration of CD3 and CD8 TILs, it might be questioned whether the prognostic value of the sum score is bestowed by the decreased density of TILs. However, in the multivariate analysis, both the sum score and CD8 TILs were found to be independent prognostic parameters for both OS and RFS (Supplementary Tables 6 & 7).
When we correlated the sum scores with clinicopathological features, GCs with high sum scores showed an association with venous invasion but did not show associations with lymphatic emboli and nodal stage. At present, the reason why GCs with high sum scores are more likely to invade veins rather than lymphatic vessels is unclear. Whether GC cells intravasate into either blood or lymphatic vessels might be related to several factors, including the physical differences between lymphatic and blood vessels, the more favorable conditions for tumor cell survival in lymphatic vessels because of the low-shear system of fluid transport 13 , and the active molecular mechanisms attracting malignant cells more towards blood or lymphatic vessels 13,14 . In the present study, when we correlated the copy number gain of the four individual genes with venous invasion, we found that the SETDB1 gene ratio was significantly higher in GCs with venous invasion than in GCs with no venous invasion (Supplementary Table 8).
The SETDB1 (KMT1E) gene encodes a histone methyltransferase that methylates Lys-9 of histone H3 up to trimethylation. The SETDB1 gene is located on chromosome 1q21, which shows copy number gains in several tissue types of human cancers, including breast cancer 15 , melanoma 16 , lung cancer 17,18 , and liver cancer 19 . An oncogenic role of SETDB1 has been demonstrated in lung cancer and prostate cancer, in which SETDB1 is involved in the positive stimulation of WNT signaling 20,21 . The downregulation of the SETDB1 gene has been found to decrease the migration and invasion of prostate cancer cells and inhibit the growth of prostate cancer cells by inducing G0/G1 cell cycle arrest 22 . Significant relationships between higher SETDB1 protein expression and shorter survival times have been demonstrated in patients with lung cancer 17,23 , liver cancer 19 , and colon cancer 24 . Although the copy number gain in GC can be referred to in the COSMIC and TCGA databases, little information is available in the literature regarding relationships between the higher expression of SETDB1 protein or the copy-number gain of SETDB1 and the clinicopathological features of GC.
Tumoral L1 hypomethylation and SAT-alpha hypomethylation have been shown to be associated with shortened survival in patients with advanced GC 4 . Tumoral L1 and SAT-alpha hypomethylation occurs in the background of diffuse genomic hypomethylation, which is closely associated with chromosomal instability. Thus, the copy number gain of multiple genes is expected to occur in GCs with L1 hypomethylation or SAT-alpha hypomethylation. In a previous study, we determined L1 and SAT-alpha methylation statuses using pyrosequencing methylation assays. We therefore used those data to compare L1 and SAT-alpha methylation levels among different sum scores, which revealed a significant difference between GCs with sum scores of 0 and GCs with sum scores of 1-2 or sum scores of 3-4 (Fig. 3). To determine whether the prognostic significance of the sum score could be affected by L1 and SAT-alpha methylation statuses, we performed multivariate analysis with the inclusion of L1 and SAT-alpha methylation statuses and other prognostic variables that were found to be statistically significant in the univariate survival analysis (Supplementary Table 9). The sum score was found to be an independent prognostic parameter for both OS and RFS.
There are a few limitations to the current study. One is the lack of a validation set. In an attempt to overcome this shortcoming, we applied our scoring system to the same genes in TCGA STAD. However, only a small fraction of samples scored 3 or higher (4/438, 0.9%) in TCGA STAD compared with our study cohort (16/319, 5.0%), and such a small sample size does not suffice for proper validation. Therefore, an independent, external validation set would be essential in future studies.
In conclusion, copy number gains in three or four of the EGFR, GATA6, IGF2, and SETDB1 genes were found to be associated with venous invasion, decreased TIL densities, decreased levels of DNA methylation in L1 or SAT-alpha, and shortened OS and RFS. A high sum score was found to be an independent prognostic parameter associated with poor prognosis in patients with advanced GC. An independent study is needed to validate the prognostic value of high sum scores in the four genes.