Integrated DNA and RNA sequencing reveals early drivers involved in metastasis of gastric cancer

Gastric cancer (GC) is the second cause of cancer-related death and metastasis is an important cause of death. Considering difficulties in searching for metastatic driver mutations, we tried a novel strategy here. We conducted an integrative genomic analysis on GC and identified early drivers lead to metastasis. Whole-exome sequencing (WES), transcriptomes sequencing and targeted-exome sequencing (TES) were performed on tumors and matched normal tissues from 432 Chinese GC patients, especially the comparative analysis between higher metastatic-potential (HMP) group with T1 stage and lymph-node metastasis, and lower metastatic-potential (LMP) group without lymph-nodes or distant metastasis. HMP group presented higher mutation load and heterogeneity, enrichment in immunosuppressive signaling, more immune cell infiltration than LMP group. An integrated mRNA-lncRNA signature based on differentially expressed genes was constructed and its prognostic value was better than traditional TNM stage. We identified 176 candidate prometastatic mutations by WES and selected 8 genes for following TES. Mutated TP53 and MADCAM1 were significantly associated with poor metastasis-free survival. We further demonstrated that mutated MADCAM1 could not only directly promote cancer cells migration, but also could trigger tumor metastasis by establishing immunosuppressive microenvironment, including promoting PD-L1-mediated immune escape and reprogramming tumor-associated macrophages by regulating CCL2 through Akt/mTOR axis. In conclusion, GCs with different metastatic-potential are distinguishable at the genetic level and we revealed a number of potential metastatic driver mutations. Driver mutations in early-onset metastatic GC could promote metastasis by establishing an immunosuppressive microenvironment. This study provided possibility for future target therapy of GC.


INTRODUCTION
Gastric cancer (GC) is the second causes of cancer deaths around the world. In China, it was estimated that 679,100 GC patients were diagnosed each year and most of them were diagnosed at an advanced stage [1]. It is universally accepted that mutations increase in metastasis process, but if tumors gain metastatic ability through accumulation of mutations remains unclear [2]. Some large-scale sequencing did not detect metastatic driver mutations [3][4][5]. With the development of next-generation sequencing, recent articles discovered some genetic drivers that could partially explain the early development of metastasis [6][7][8][9][10].
The metastatic molecular profile of GC is still unclear. Considering early prometastastic drivers established in primary tumor and passenger mutations accumulating in the metastasis process, traditional comparative analysis between primary and metastasis tumors might lead to misidentification of driver mutations [11][12][13]. It is necessary to use more appropriate samples and methods to identify metastasis-associated genomic aberrations. We conceived that comparative analysis between primary tumors with different metastatic-potential will help to identify early drivers in primary lesions.
Tumor progression consist of metastasis to regional lymph nodes and dissemination to distant organs. Recently studies confirmed that cancer cells in lymph node involved in dissemination and seeding distant sites [14,15]. It indicated that patients with lymph node involvement have a tendency to metastasize distantly. Commonly, patients with early GC (EGC) limited to gastric mucosa or submucosa (T1) are less prone to lymph-node metastases [16]. The rate of lymph-node metastasis in EGC was reported from 5.7 to 14% [17][18][19][20], and the 10-year survival rate of EGC patients with or without lymph-node metastasis is around 72 or 92%, separately [21]. So we conceived that EGC with lymph-node metastasis (T 1 N + M any ) had higher metastatic-potential (HMP), since metastatic cancer cells occur early in primary tumors instead of accumulating along with the development of metastasis, and be suitable as a model for screening of early prometastastic drivers. Here, we presented the molecular landscape of 432 GC patients characterized by integrated DNA and RNA sequencing, especially the comparative analysis results between HMP group with EGC and lymph-node metastasis (T 1 N + M any ) and lower metastatic-potential (LMP) group without lymph-nodes or distant metastasis (T any N 0 M 0 ) (Fig. 1A). We aim to reveal special genomic aberrations according to different metastatic-potential and found early prometastastic drivers that lead to metastasis.

MATERIALS AND METHODS Clinical samples
Primary GC tissues and matched adjacent nontumor tissues were collected from GC patients who received gastrectomy in Fudan University Shanghai Cancer Center. All patients had pathologically confirmed gastric adenocarcinoma. The clinical samples from these GC patients were kept in RNAlater at −80°C immediately after the surgical resection. The experiments involving clinical samples from patients was approved by the Ethics Committee of Fudan University Shanghai Cancer Center and informed consent was obtained from all patients. The experiments involving animals were approved by the Ethics Committee of Fudan University Shanghai Cancer Center and the Ethics Committee of Shanghai University of Traditional Chinese Medicine.

Whole-exome sequencing and targeted gene sequencing
The whole-exome sequencing (WES) and targeted gene sequencing (TES) were done and analyzed as described in previous studies of our team [22,23]. TES was performed using eight selected genes, including five genes (ALK, MADCAM1, PREX2, TAF1L and TP53) with significantly higher mutation rate in HMP group than LMP group by Chi-squared test. Considering that the sample size is small and some meaningful mutations, especially low frequency mutations, could be excluded by Chi-squared test, we also included 3 genes (COL12A1, LAMA1, PRG4) with at least 1.5 times higher mutation rate in HMP group than LMP group but no statistical significance.
Genomic DNA from the frozen tissues and peripheral blood was extracted by the DNeasy Blood & Tissue Kit (Qiagen). Genomic DNA from formalin-fixed paraffin embedded (FFPE) tissues was extracted by the GeneRead DNA FFPE Kit (Qiagen). Genomic DNA libraries were prepared using the standardized protocols recommended by Illumina. Whole-exome enrichment was performed using the SeqCap EZ capture kit (Roche). Targeted gene enrichment including eight genes (TP53, COL12A1, ALK, MADCAM1, PRG4, LAMA1, PREX2 and TAF1L) was performed with the xGen Hybridization and Wash Kits (IDT). Briefly, 1 μg of DNA was sheared into short fragments (200~300 bp) using Covaris S220. The DNA fragments were then end-repaired to generate adenylated 3' ends. Adapters with barcode sequences were then ligated to both ends of the fragments, and E-Gels were used to select DNA fragments of the targeted size. Next, ten PCR cycles were performed, and the resulting mixture was purified. After the Illumina sequencing libraries were amplified with ten PCR cycles, capture probes were added, and the mixtures were incubated for 24 h at 65°C. The hybridized mixtures were then amplified with an additional ten PCR cycles. Captured DNA libraries were sequenced with the Illumina HiSeq 2500 Genome Analyzer, yielding 200 (2 × 100) base pairs from the final library fragments.
Sequencing reads were trimmed and filtered with Trimmomatic. The reads were aligned to the hg19 reference genome using Burrows-Wheeler Aligner, and the Genome Analysis Toolkit was used for base quality score recalibration, indel realignment and duplicate removal. VARSCAN software was used to identify somatic single-nucleotide variations and indels in WES data. VARDICT software was used to identify somatic single-nucleotide variations and indels in targeted gene sequencing data. MutsigCV software was used to analyze the significance of identified mutations with default covariate tables.

RNA sequencing
Firstly, we extracted total RNA from GC tissues and matched normal tissues by Trizol® Reagent (Invitrogen, USA). In total, 1 μg RNA was treated with Ribo-off rRNA Depletion Kit (Vazyme) before constructing the RNA-seq libraries. VAHTS Total RNA-seq (H/M/R) Library Prep Kit for Illumina (Vazyme) were utilized to prepare RNA-seq libraries following the manufacturer's instructions. Furthermore, ribosome depleted RNA samples (~100 ng) were fragmented and then used for firstand second-strand cDNA synthesis with random hexamer primers. The ends of the cDNA fragments were repaired by DNA End Repair Kit. Then the cDNA fragments were modified with Klenow to add an A at the 3' end of the DNA fragments, and finally ligated to adapters. We subjected purified dsDNA to 12 cycles of PCR amplification, and sequenced the libraries by Illumina sequencing platform on a 150 bp paired-end run. Sequencing reads from RNA-seq data were aligned using the spliced read aligner HISAT2, which was supplied with the Ensembl human genome assembly (Genome Reference Consortium GRCh38) as the reference genome. Then "combatseq" R package was used to remove batch effects. Gene expression levels were calculated by the TPM (Transcripts Per Million). We utilized the GENCODE (v25) database to get annotations of mRNA in the human genome.

Genomic analysis
We utilized Database for Annotation, Visualization, and Integrated Discovery to perform Gene ontology (GO) analysis. The MATH score is calculated as the percentage ratio of median absolute deviation (MAD) and the median of its mutant-allele fractions at tumor-specific mutated loci, which is based on WES data of tumor and matched normal DNA: MATH = 100 × MAD/median, as described in our previous study [24]. Gene Set Enrichment Analysis (GSEA) was performed to present the gene set enriching using gene sets from Molecular Signatures Database. The scores of 64 different cell types were estimated by xCell algorithm. The relative cellular fraction of 22 immune infiltration was calculated by CIBERSORT.

Construction of metastasis-related signature
In total, 131 genes that were not only differentially expressed between HMP tumors and matched normal tissues (p < 0.05), but also between M1 tumor and HMP (as well as between M1 and LMP) tumors (p < 0.05) were included as candidates. The prognostic signature was developed by LASSO algorithm with 200 bootstrap replications. Eleven mRNAs and long noncoding RNAs were selected for following signature. Patients in the TCGA cohort were divided into low and high riskscore groups according to the cut-off risk score (−0.885) identified by the receiver operating characteristic analysis.

Transfection
The siRNAs and the negative control were obtained from RiboBio (Guangzhou, China). Lipofectamine 2000 (Invitrogen, USA) was used to conduct transfection siRNAs and plasmids following the manufacturer's protocol. Lipofectamine 2000 reagent and the siRNAs or plasmids were diluted in Opti-MEM medium separately. Lipofectamine 2000 reagent was incubated in room temperature for 5 min, and then mixed with diluted transfection reagent and siRNAs or plasmids. After incubating in room temperature for 20 min, the complex was added to GC cells. The media was changed 6-8 h after transfection. The transfected cells were collected for further assays after 48 h.

Cell migration assays
The migration ability of cells was investigated by transwell assays as described previously [25,26]. Cells were resuspended in serum free media and seeded into the chambers (Falcon, USA), while the lower chambers contained 600 μl DMEM with 20% FBS. After 24 h incubated, cells were fixed on the chambers in paraformaldehyde and stained with crystal violet. Then the upper surface was wiped off with cotton swabs and cells on the lower surface were counted under a microscope.

RNA extraction and quantitative real-time RT-PCR
RNA extraction and quantitative real-time RT-PCR were performed as described previously [25,26]. Trizol® Reagent (Invitrogen, USA) was applied to extract total RNA of cells and cDNA was synthesized by using the PrimeScript RT Kit (Takara, China). The mRNA levels were measured by quantitative real-time PCR with SYBR® Premix Ex Taq™ (Takara, China). The primers were synthesis by Sangon (Shanghai, China). Each sample had three repetitions.

Co-culture assay
Cancer cells (4 × 10 5 /ml) were seeded in the upper chamber while M0 macrophages induced by THP-1 using PMA (50 ng/μl, Peprotech, USA) were seeded in the lower chamber of the 6-well transwell chambers with 0.4 um pore size polyester membrane (Corning, USA). After 48 h incubation, cancer cells or TAMs (reprogrammed-TAMs) were harvested for following assays.

Chemotactic migration assays of T cells and macrophages
For the THP-1 and T cell chemotaxis assay, 1 × 10 5 cells resuspended in 200 ul conditioned medium were added into the upper chamber of 5 μm pore transwell inserts (Corning, USA), while the lower chamber contained 600 ul supernatant of cancer cells or TAMs. Cells in the lower chamber were harvested and counted after 24 h incubation.
For the macrophage chemotaxis assay, THP-1 cells treated with PMA were seeded in the upper chamber with the supernatant of cancer cells or TAMs added to the lower chamber, using the 8 μm pore transwell inserts (Corning, USA). In the CCL2 neutralizing assays, the supernatant of treated cancer cells with or without CCL2 neutralizing antibody (R&D Systems, #23007, 1 ug/ml) was added into the lower chamber. Cells gone through the membrane after 48 h were fixed in 4% paraformaldehyde and stained with 0.1% crystal violet, and then quantified using five random fields.

Human protein chemokine array
Treatepid MGC-803 (5 × 10 5 /ml) were seeded in 6-well plates for more than 48 h. Collect the supernatants of the cell cultures and complete the assay using the RayBio Chemokine Antibody Array I (RayBiotech, USA) according to the instruction manual. We firstly block the membranes at room temperature for 30 min and then incubate them in 1 ml supernatants overnight at 4°C. After washing the membranes with the washing buffer, we sequentially incubated the membranes in 1 ml primary biotinconjugated antibodies and 2 ml of 1000-fold diluted horseradish peroxidase-conjugated streptavidin both at room temperature for 2 h. Signals were detected using the ImageQuant LAS 4000 (GE Healthcare) and quantified by ImageJ software. Normalize the results using the positive control contained in the different membranes.

T-cell amplification and tumor-reactive T cell killing assay
Peripheral blood mononuclear cells (PBMCs) purchased from AllCells (China) were activated with anti-CD3 antibody (100 ng/ml, Peprotech, USA), anti-CD28 antibody (100 ng/ml, Peprotech, USA) and IL-2 (10 ng/ml, Peprotech, USA). GC cell were seeded in 96-well plates and co-cultured with activated PBMCs at the ratio of 8:1 for 24 h. To detect if MADCAM1 MUT -reprogrammed-TAM could suppress tumor-reactive T cell killing, we seeded the MGC-803 cells together with reprogrammed-TAMs (1:1) in 96-well plates. After these cells were attached, we distributed activated T-cells (MGC-803: T-cells = 1:8) into the plates and co-cultured for 24 h. The results were tested using the CCK8 kit (Dojindo, Japan) and the OD value of each well was read by BioTEK Micro-Volume Spectrophotometer (Epoch).

PD-1 antibody treatment assay
Treated MGC-803 were seeded into 96-well plates (2 × 10 4 cells/well), and cocultured with activated PBMCs with or without PD-1 antibody (10 ug/ml, Innoventbio) for 24 h. To find whether reprogrammed-TAM could suppress tumor-reactive T cell killing when PD-1 was blockaded, reprogrammed-TAMs and MGC-803 cells were together seeded into the 96-well plates at the ratio of 1:5, and cell survival rates were evaluated after being co-cultured with activated PBMCs (cc:PBMCs = 1:4) and PD-1 antibody for 48 h. The results were tested using the CCK8 kit (Dojindo, Japan) and the OD value of each well was read by BioTEK Micro-Volume Spectrophotometer (Epoch).

Subcutaneous tumor model
TAMs were reprogrammed by coculturing treated MGC-803 with M0 macrophages induced by THP-1 using PMA as above mentioned. In total, 5 × 10 6 reprogrammed TAMs were mixed with 5 × 10 6 MGC-803 cells, and the mixture was injected subcutaneously into the female NOD/SCID mice (6-8 weeks old, 20-25 g) were, 5 mice per group. We sacrificed these mice after 2 months and examined the lungs for metastatic loci by HE staining under microscope. Five random slices from each mouse were used for metastasis area calculation by ImageJ software.

Tail vein metastasis model
In total, 2 × 10 6 treated MGC-803 were injected into the tail vein of the female nude mice (6-8 weeks old, 20-25 g), five mice per group. After 2 months, we sacrificed all mice and examined the lungs for metastatic loci by HE staining under microscope. Five random slices from each mouse were used for metastasis area calculation by ImageJ software.

Statistical analysis
No statistical methods were used to determine the sample size. No randomization was used and the investigators were not blinded to allocation during experiments. Statistical analysis was performed using Pearson chi-square test, Mann-Whitney U test, Log-rank test, Kruskal-Wallis test, pairwise t-test, Student' t test and Pearson correlation test, using R version 3.5.1 and GraphPad Prism Software 7.0. The p value < 0.05 was set as the statistical significance.

Overview of WES
We performed WES on tumor and paired normal DNA from 192 Chinese GC patients (Fig. 1B). Demographics and clinicopathologic characteristics of our cohort was shown in the Supplementary Table 1. In total, 38,641 non-synonymous somatic mutations in 13,191 genes were identified (Fig. 1C). Mutational signatures analysis based on 96 possible somatic substitutions showed that a signature associated with defective DNA mismatch repair is one of the dominant signatures (Fig. 1D). The median number of mutations was 40 per sample.

Comparison of cancer driver mutations among populations
To compare the mutational pattern of FUSCC cohort in this study with other populations, we obtained the TCGA GC cohort (n = 395). The median number of mutations in TCGA cohort was 107/ sample, which is significantly higher than FUSCC cohort. We include 30 genes described as "cancer driver mutations" in previous literatures and almost all of them had higher mutation rate in TCGA cohort (28/30) (Supplementary Table 2). This suggests that Chinese GC patients show different characteristics from TCGA cohort and may have different drivers, which were worth exploring.
The relationship between mutation of DNA damage and repair (DDR) related genes and mutational load We next examined the relationship between mutation status of 35 key DDR related genes and the number of mutations (Supplementary Table 3). Tumors with mutation of any MMR genes have obviously higher mutational load than others (Median: 953.5 vs. 36.5). To reduce the influence of MMR genes status on mutational load, the number of mutations according to DDR genes were calculated in tumors with wild-type MMR genes. We found that mutation status of these DDR genes was all associated with higher mutational load.
Mutational signatures and intra-tumor heterogeneity (ITH) according to metastatic-potential To explore the genomic aberrations according to metastaticpotential, we separated samples into groups based on metastaticpotential. Besides HMP (N = 37) and LMP group (N = 48), all the rest of samples (T 2-4 N + M 0 , N = 107) were classified as middle metastatic-potential (MMP) group. The most frequent transition in HMP group was C > T, and C > A was more common in LMP group ( Supplementary Fig. 1). Median number of mutations was significantly higher in HMP group, compared to LMP group (91 vs. 54) ( Fig. 2A, B). Intra-tumor genetic heterogeneity measured by mutant-allele tumor heterogeneity (MATH) was higher in HMP group (mean value = 49.08) than LMP group (mean value = 44, p < 0.001) (Fig. 2C).
Identification of significant mutations according to metastatic-potential We next identified potential prometastatic mutational genes. The top mutated genes in two groups were shown in Fig. 2D, E. Mutational genes showed enrichments for known oncogenic pathways, while NOTCH and RTK-RAS pathways are tops in HMP and LMP groups respectively (Fig. 2F, G). Interestingly, we analysis some known DDR related genes and found that the mutational rate of most (24/31) were higher in HMP group (Fig. 2H).
We identified 176 genes with significantly higher mutation rate in HMP group (>10%) than LMP group (p < 0.05) (Supplementary Table 4). All these genes had lower mutational rates in MMP group than HMP group, which illustrates the reliability of our grouping method. GO analysis annotated these mutations and the topranking biological process included "Cell Adhesion" (p = 0.008) (Fig. 2I).

Further validation of candidate metastasis-related genes in vitro
We then explore the roles of candidate metastasis-related genes in metastasis by in vitro experiments. We found that knockdown of RB1CC1, DLL1, PRG4, PCLO, and NBPF10 increased, while knockdown of TAF1L, PREX2 and ATXN3 and upregulation of TP53 and MADCAM1 and KRTAP5-5 impaired the migration ability of GC cells, which suggested their potential roles in metastasis of GC ( Supplementary Fig. 2).
Candidate prometastatic mutations associated with distant metastasis by TES We further performed TES on another cohort of 153 GC patients using eight candidate genes with higher mutation rate in HMP group ( Fig. 3A and Supplementary Table 5). All of these eight genes mutated more frequently in patients with distant metastasis at diagnosis, especially than those did not have metastasis within 3 years after surgery. In patients with metastasis within 3 years after surgery, the mutational rates of TP53, MADCAM1, PREX2, PRG4 and TAF1L were nearly twice or more than patients without metastasis and statistical significances were observed in TP53 and MADCAM1 (Fig. 3B). Though with low mutational rate, MADCAM1 mutations were only in patients with metastasis within 3 years after surgery (5.88%) and at diagnosis (9.52%). Consistent with their top ranking in MutsigCV analysis, mutation status of TP53 and MADCAM1 were significantly associated with worse metastasis-free survival (MFS) (Fig.  3C-F). Mutation status of TP53 and MADCAM1 were still independent risk factors for MFS after adjusting possible confounding factors (Supplementary Table 6). Mutation hotspots from WES and TES samples at the protein encoded by MADCAM1 were identified in "Repeat Expansion" region between amino acids 227 and 271 (Fig.  3G). TP53 and MADCAM1 hotspot mutations could enhance migration abilities of GC cells (Fig. 3H, I and Supplementary Fig. 2C,  D). The roles of MADCAM1 and its mutants in proliferation ability of GC cells were confirmed by in vitro assays. (Supplementary Fig. 2E, F). We further observed increased lung metastasis generated by MADCAM1 P270Q or MADCAM1 D242N injected into the tail veins of nude mice (Fig. 3J). The expression of some EMT markers (Slug, Snail, E-cadherin), stem-like cell markers (Oct4, Cd44) were up-regulated in AGS transfected with vectors expressing MADCAM1 P270Q or MAD-CAM1 D242N , compared with MADCAM1 WT (Fig. 3K, L).
Transcriptional landscape and tumor molecular heterogeneity according to metastatic-potential We then performed analysis on transcriptomes of tumor and paired normal tissues from 87 GC patients, including 29 HMP patients, 32 LMP patients and 26 patients with distant metastasis at diagnosis (M1). Hierarchical clustering method could distinguish tumors from normal tissues and also HMP from LMP tumors (Fig. 4A). GSEA demonstrated that HMP group was enriched with pathways including mTORC1 and PI3K/AKT/mTOR signaling (Fig. 4B). The molecular heterogeneity was then demonstrated by the number of differentially expressed genes (DEGs) under certain cut-off p value between tumor and matched normal tissues [27]. As expected, the number of DEGs in HMP group was more than LMP group under different p value or adjusted p value, indicating higher heterogeneity in HMP group, which is consistent with above results of ITH calculated by MATH (Supplementary Table 7).
Construction of metastasis-related signature to predict prognosis In total, 131 genes with high expression in HMP and M1 tumors were included as candidates for prognostic signature by LASSO algorithm in TCGA cohort. A signature consisting of 11 mRNAs and long non-coding RNAs were derived and the integrated mRNA-lncRNA signature had better prognostic value (area under curve (AUC) = 0.744) than traditional TNM stage (AUC = 0.549) (Fig. 4C). Patients with high riskscore had significantly poor survival (Fig. 4D). We also find that the survival of patients with same TNM stage could be distinguished by the riskscore (Supplementary Fig. 3A). In addition, patients with stage III-IV and low riskcore, have better survival than patients with stage I-II and high riskcore, which also demonstrated the superior of the signature than traditional TNM stage (Fig. 4E). In FUSCC cohort, we observed more proportion of M1 (50.00% vs. 9.30%) and less LMP (20.45% vs. 53.49%) in high riskcore than low riskcore group (Supplementary Fig. 3B).

Enhanced suppressive immune phenotype in HMP patients
We analyzed the level of immune cell infiltration by 66 markers of different immune cell types according to GSVA scores  ( Supplementary Fig. 3C) [28]. The tumors of HMP group exhibited remarkably enhanced immune cell infiltration than normal tissues (p = 0.0193), but there was no significant difference in LMP group.
We further investigated the scores of 64 different cell types by xCell algorithm (Fig. 4F). In tumors of HMP group, we observed elevated endothelial cells and M2 macrophage than normal tissues of HMP group or tumor tissues of LMP group (Fig. 4G, H).

MADCAM1 mutants associated with chemotactic migration of T cells and macrophages
Considering the roles of MADCAM1 mutants in metastasis, we next examined if MADCAM1 was involved in immune infiltration and immune escape. The expression of MADCAM1 show a significant positive correlation with lymphocyte and a negative correlation with macrophages calculated by CIBERSORT, in both TCGA and FUSCC cohort ( Fig. 5A and Supplementary Fig. 4A, B). The tumors carrying mutational MADCAM1 had more macrophage infiltration than others in TCGA cohort (Supplementary Fig. 4C). We also observed that the expression of MADCAM1 significantly associated with many chemokines, chemokine receptors and immunomodulators, including CD274 that coding PD-L1 (Supplementary Table 8). IHC assays showed that M2 macrophage marker CD163 in infiltrating immune cells and MADCAM1 in the tumor cells was significantly positively stained in GC samples carrying mutated MADCAM1 (Fig. 5B). So, we investigated whether mutated MADCAM1 lead to alteration in stability of Madcam1 protein. We examined the level of Madcam1 protein in the presence of 100 μg/ml cycloheximide, a protein-synthesis inhibitor. The half-life time of Madcam1 protein for MADCAM1 WT cells was~1.5 h, whereas over 4 h in MADCAM1 P270Q and MAD-CAM1 D242N cells (Fig. 5C).
We next explored if MADCAM1 mutants affects chemotactic migration of T cells and M2 macrophages. The supernatants from MADCAM1 WT GC cells induced migration of activated T cells, and reduced migration of M2 macrophages. Meanwhile, the supernatants from MADCAM1 P270Q and MADCAM1 D242N GC cells reduced migration of T cells, and induced migration of M2 macrophages (Fig. 5D, E). MADCAM1 mutants recruited macrophages by regulating CCL2 through Akt/mTOR axis Chemokine profiles detected by Human Chemokine Antibody Arrays presented higher CCL2 and IL8 in supernatants of MADCAM1 D242N GC cells compared with that of MADCAM1 WT GC cells (Fig. 5F, G). Further, blocking CCL2 reduced migration of macrophages, especially in MADCAM1 D242N GC cells (Fig. 5H), which demonstrated that the ability of MADCAM1 D242N in recruiting macrophages was partly dependent on CCL2. In western blot assay (Fig. 3K), phosphorylation levels of Akt were obviously enhanced in MADCAM1 P270Q and MADCAM1 D242N GC cells, while the total levels were hardly changed. The expression of p-mTOR and CCL2 was suppressed after inhibiting Akt in MADCAM1 P270Q and MADCAM1 D242N GC cells (Fig. 5I, J). These results indicated that MADCAM1 mutants promotes recruitment of macrophages by regulating CCL2 through Akt/mTOR axis.
MADCAM1 mutations promoted PD-L1-mediated immune escape T-cell-mediated killing assays were performed to determine if MADCAM1 mutations affects T cell cytotoxic activity. MAD-CAM1 P270Q and MADCAM1 D242N GC cells presented significantly higher survival rate than MADCAM1 WT GC cells and control after co-culture with activated PBMC (Fig. 6A). Consistent with above correlation analysis, up-regulated CD274 (PD-L1) expression was confirmed in MADCAM1 MUT GC cells by qPCR (Fig. 6B) and Western blot assays (Fig. 3K). Further, blocking PD-1 by anti-PD-1 antibody could facilitate T-cell-mediated killing in MADCAM1 MUT GC cells (Fig. 6C). These results indicated that MADCAM1 P270Q and MADCAM1 D242N mutations could upregulated PD-L1 and suppress PD-L1-mediated T cell killing.
MADCAM1 MUT -reprogrammed-TAMs attracted more macrophage and promote migration of GC cells We next test if MADCAM1 MUT -reprogrammed-TAMs promote migration of macrophages and GC cells. Macrophage migration assays were performed by supernatants from reprogrammed-TAMs. MADCAM1 P270Q -reprogrammed and MADCAM1 D242N -reprogrammed-TAMs significantly promote migration of macrophages (Fig. 6F). The GC cells co-cultured with MADCAM1 MUT -reprogrammed-TAMs showed increased migration ability compared with MADCAM1 WT -reprogrammed-TAMs, which could be reversed by Akt inhibitor (Fig. 6G and Supplementary Fig. 5E). We next validated these results in vivo. MADCAM1 MUT -reprogrammed or MADCAM1 WT -reprogrammed-TAMs were mixed with MGC-803 cells and co-injected subcutaneously into NOD/SCID mice and MADCAM1 D242N -reprogrammed TAM group have more lung metastatic loci (Fig. 6H).

DISCUSSION
Identification of driver molecular alterations is critical for cancer research and development of new target drugs and treatment strategies. Traditional comparative analysis between primary and metastasis tumors or among metastases could miss out prometastatic drivers, since it is hard to distinguish them from protumorigenic or passenger mutations. Here, we tried a novel strategy to identify early drivers in primary gastric tumors. Integrated DNA and RNA sequencing and comparative analysis according to different metastatic-potential revealed some prometastatic drivers and immune suppressive microenvironment established in EGC. It provides evidence for early metastatic dissemination in GC, which have already been quantitatively validated in CRC by Hu et al. [29]. WES demonstrated distinct genomic pattern and higher mutation load in HMP groups. We revealed higher mutational rate of MMR and some DDR genes, which might illustrate this finding, since mutations in DDR gene could drive genomic instability and then generate more somatic mutations, including prometastatic drivers [30,31]. Higher mutation load and heterogeneity in HMP group indicated that multiple molecular alteration involved in metastasis.
Further, we identified 176 candidate prometastatic mutations and mutation of TP53, MADCAM1 and KRTAP5-5 enhanced migration of GC cells, which suggested the potential prometastatic roles of these candidate mutations. We then included 8 genes in further TES. Among them, mutation status of TP53 and MADCAM1 was independent risk factors for MFS. Furtherly, in vitro experiments indicated that TP53 and MADCAM1 mutants directly promoted migration of GC cells. The results indicated that our strategy to identify early prometastatic drivers was credible and challenged traditional opinions that there were few driver mutations involved in metastasis [3,4,32].
Some mutations identified in this study was previously found to involve in development and metastasis of cancer. TP53 mutant could gain additional functions to promote growth and metastatic-potential of tumor cells in many types of cancer [10,33,34]. Frequent PREX2 and PRG4 mutations was observed in melanoma and mutant PREX2 accelerated tumor formation [35]. As a member of the immunoglobulin family involving in lymphocyte adhesion, the roles of MADCAM1 mutants in tumor metastasis were firstly reported.
To investigate the association between metastatic-potential and gene expression on RNA level, we performed RNA sequencing. Higher heterogeneity measured by the number of DEGs was observed in HMP group, which was consistent with MATH algorithm. An integrated mRNA-lncRNA signature based on DEGs in HMP group was constructed and its prognostic value was better than traditional TNM stage. The combination of signature and stage could help guiding prognosis prediction and clinical treatment decision.
Upregulation of some EMT and stem-like cell markers not only confirmed that HMP group had HMP, but also indicated metastasis-related mutations might promote metastasis through EMT and stemness mechanisms. Mutated TP53 was previously reported to promote EMT and metastatic ability of cancer cells.
Here, we demonstrated that mutated MADCAM1 also involved in process of EMT and metastasis of GC cells. Our previous studies found that some stem-like cell markers (i.e., CD44) could enhancing EMT and metastatic ability of cancer cells [25,26,36,37].
Based on the finding that upregulation of immune cell infiltration in HMP group, we further investigated that if immune escape mechanism involved in early metastasis of GC. Tumor immune microenvironment was reported to play a critical role in cancer metastasis [32,38,39]. The infiltration of activated neutrophil, TAM, and Treg cells played roles in progression of dysplastic mucosa in early stage of gastrointestinal neoplasia [40][41][42]. We provided a comprehensive analysis of immune microenvironment in EGC for the first time and found immune suppressive microenvironment was presented in HMP group.
Does the immune suppressive microenvironment of HMP group result from mutations and finally contribute to cancer cells metastasis? Previous articles reported that some driver mutations participate in remodeling immune microenvironment, for example, TP53 mutant were reported to be involved in TAMs reprogramming to promote metastasis in colon cancer [43]. The hotspot mutational sites observed by us in TP53 and MADCAM1 indicated their potential gain-of-function (GOF) effects. We demonstrated that MADCAM1-mutated proteins act as GOF mutants and lead to alteration in stability of the Madcam1 protein. The roles of MADCAM1 mutants in tumor metastasis is a novel interesting funding. MADCAM1 was reported to be highly expressed in gastrointestinal mucosal endothelium and plays a role in leukocyte traffic into the mucosal immune compartment [44][45][46][47] and the roles of MADCAM1 mutants in tumor metastasis had never been reported before. Here, we found that MADCAM1 mutants could not only directly promote tumor metastasis, but also could trigger tumor metastasis by establishing complicated immunosuppressive microenvironment. On the one hand, MAD-CAM1 mutants reduced chemotactic migration of T cells by regulating chemokines and promotes PD-L1-mediated immune escape. On the other hand, MADCAM1 mutants induced chemotactic migration and M2-like polarization of macrophages by regulating CCL2 through Akt /mTOR axis. MADCAM1 MUTreprogrammed-TAMs attracted more macrophages and promoted migration of GC cells in vivo and in vitro. Additionally, MADCAM1 MUT -reprogrammed-TAMs could suppress tumorreactive T cell killing to help immune escape, which couldn't be reversed by anti-PD-1 antibody. It indicated that MADCAM1 MUTreprogrammed-TAMs and Akt/mTOR signaling pathway may contribute to resistance to anti-PD-1/PD-L1 therapy. Further in vivo work is warranted to verified the role of CCL2 and Akt/ mTOR axis in MADCAM1 MUT -reprogrammed immune Fig. 7 Proposed model of how early drivers (including MADCAM1) involved in tumor metastasis. Mutated MADCAM1 could drive tumor metastasis by directly empowering cancer cells to disseminate or establishing an immunosuppressive microenvironment. microenvironment in GC. Regardless of it, our results demonstrate the potential clinical value of further work exploring potential therapies that target MADCAM1 mutations in GC.
In conclusion, our results showed that the GCs with different metastatic-potential are distinguishable at the genetic level. A number of potential metastatic driver mutations was revealed in this study including known cancer-associated genes TP53. We also discovered some novel driver mutations, such as MADCAM1. The results proved our hypothesis that driver mutations in early-onset metastatic GC could drive metastasis not only by directly empowering cancer cells to disseminate but also by establishing an immunosuppressive microenvironment. Driver mutations and the remodeling of the immune microenvironment simultaneously promoted the metastasis and development of tumor. The summary diagram was shown in Fig. 7. Here we demonstrated that despite a low mutational rate in GC, GOF MADCAM1 mutations could play driver roles in promoting tumor metastasis through establishing immunosuppressive microenvironment, and MADCAM1 MUT -reprogrammed-TAMs and Akt/mTOR signaling pathway may contribute to resistance to anti-PD-1/PD-L1 therapy. Previous articles reported that TP53 mutations could reprogram TAMs to promote tumor progression and metastasis. These findings indicated that for GC patients carrying mutated MADCAM1 or TP53, combination of anti-PD-1/PD-L1 therapy and inhibition of TAMs would be a choice. The upregulation of checkpoint targets in HMP tumors also provides a theoretical basis for the early intervention of immune checkpoint inhibitors in EGC with high risk of metastasis in the future. This study increased the understanding of molecular landscape of GC patients and provided possibility for future target therapy.

DATA AVAILABILITY
The data of the current study available from the corresponding author on reasonable request.