Whole genome and transcriptome sequencing of matched primary and peritoneal metastatic gastric carcinoma

Gastric cancer is one of the most aggressive cancers and is the second leading cause of cancer death worldwide. Approximately 40% of global gastric cancer cases occur in China, and peritoneal metastasis is the prevalent form of recurrence and metastasis in advanced disease. Currently, there are limited clinical approaches for predicting and treating peritoneal metastasis, resulting in an average survival time of 6 months. Comprehensive genome analysis may help uncover the pathogenesis of peritoneal metastasis. Here we describe a comprehensive whole-genome and transcriptome sequencing analysis of one advanced gastric cancer case, including non-cancerous mucosa, primary cancer and matched peritoneal metastatic cancer. Peripheral blood was used as the normal control. We identified 27 mutated genes, of which 19 are reported in the COSMIC database (ZNF208, CRNN, ATXN3, DCTN1, RP1L1, PRB4, PRB1, MUC4, HS6ST3, MUC17, JAM2, ITGAD, IREB2, IQUB, CORO1B, CCDC121, AKAP2, ACAN and ACADL) and eight have not previously been described in gastric cancer (CCDC178, ARMC4, TUBB6, PLIN4, PKLR, PDZD2, DMBT1 and DAB1). Additionally, a fusion of GPX4 and MPND, in the 19q13.3-13.4 region, is characterized as a novel fusion gene. This study discloses novel biological markers and tumorigenic pathways that may help predict peritoneal metastasis of gastric cancer.

influence of PM on survival, efforts should be undertaken to explore the possible molecular mechanisms and preventive or therapeutic strategies.
Recently, next generation sequencing (NGS) has become a very useful tool for comprehensive cancer research, facilitating the identification of treatment targets and personalized treatments. NGS can systematically identify gene alterations in cancer and is a powerful approach for investigating carcinogenesis and identifying novel therapeutic targets as well as biomarkers [5][6][7]. Over the past few years, advances in NGS and its falling cost have led to a growing number of cancer genome studies, which are of great help in investigating the pathogenesis, driver genes, molecular classification and drug targets of human gastric cancer 8. For instance, Wang and Zang's groups separately revealed a number of potential cancer-driving genes of gastric cancer by whole-exome sequencing and identified recurrent somatic mutations in the chromatin remodeling gene ARID1A and alterations in the cell adhesion gene FAT4 9. Wang's group also performed whole-genome sequencing in 100 tumor-normal pairs of gastric cancer and identified significantly mutated driver genes (MUC6, CTNNA2, GLI3, RNF43 and others) 10. The Cancer Genome Atlas Research Network published its genomic study of 295 primary gastric adenocarcinomas and proposed a molecular classification dividing gastric cancer into four subtypes: tumors positive for Epstein-Barr virus, microsatellite unstable tumors, genomically stable tumors and tumors with chromosomal instability. Identification of these subtypes provides a roadmap for patient stratification and trials of targeted therapies 11. Nagarajan and colleagues analyzed the whole genomes of two gastric adenocarcinomas, one with chromosomal instability and the other with microsatellite instability, and revealed microsatellite instability-related mutational signatures 12. Liu and colleagues described the spectrum of genomic and transcriptomic alterations in gastric cancer and a ZAK kinase isoform 13.
In cancer genome research, high-quality samples are the critical factor. Because gastric cancer with PM was previously considered incurable by surgery, it was nearly impossible to obtain a primary tumor and a paired PM tumor sample at the same time 14. Recently, non-curative resection of PM has been introduced into the treatment of gastric cancer with a single peritoneal metastasis. Xia and colleagues reported that the overall survival of patients in the non-curative resection group was longer (14.869 months) than that in the non-resection group (7.780 months); non-curative resection of PM significantly prolonged the survival of gastric cancer patients 15. With this change in therapeutic approach, it has become possible to obtain a primary tumor sample and a paired PM sample simultaneously. Here, we describe the clinicopathological features and the whole-genome and transcriptome sequencing characterization of a set of precious samples comprising paired gastritis mucosa, primary gastric cancer, peritoneal metastasis and peripheral blood. Genome-wide screening for genomic and transcriptomic dysregulation among non-cancerous tissue, primary cancer and PM tumor should provide insights into the molecular basis of gastric cancer initiation, progression and metastasis.

Results
Clinicopathology of the case. The subject was diagnosed with gastric cancer with peritoneal metastasis before operation by computerized tomography (CT) scan of the abdominal region, and the diagnosis was confirmed by intraoperative observation (Fig. 1A,B). Systematic examination before operation found no metastases in organs other than the peritoneum. No neoadjuvant or adjuvant chemotherapy was administered before operation. Macroscopically, the tumor was located in the antrum, with a diameter of 7 centimeters, and was of Borrmann type III. Histological examination after total gastrectomy and non-curative resection of the metastatic tumor demonstrated that the tumor extensively infiltrated the gastric wall and penetrated the serosa. Tumor cells showed signet ring cell histology (Fig. 1C, middle). The PM tumor also showed signet ring cell histology (Fig. 1C, bottom). The non-cancerous mucosa showed chronic gastritis histology (Fig. 1C, top). The final diagnosis was poorly differentiated gastric cancer, diffuse type, with PM. There was substantial metastasis in the perigastric lymph nodes (18/28). The patient was classified as stage IV, a late stage of gastric cancer. The patient died one month after operation.
Whole genome sequencing (WGS). Sequencing of the matched peripheral blood, gastritis, primary cancer and PM cancer samples was performed. The sequencing yielded an average of 167.75 Gb across these samples. The reads were mapped to the human reference genome hg19 and covered about 99.08% of the reference genome, with a mean sequencing depth of 57× (range 34-80×). The details of WGS are listed in Supplementary table 1.
Taking the blood sample as the normal reference, we identified somatic alterations throughout the genome, all of which are single nucleotide variants (SNVs). The somatic alterations were found not only in primary cancer and PM cancer, but also in chronic gastritis mucosa far from the primary tumor (Supplementary table 2 were simultaneously observed in triple samples of chronic gastritis, primary cancer and PM cancer. Four somatic variations (RP1L1, PRB1, HS6ST3 and DCTN1) occurred simultaneously in both primary tumor and PM cancer. One somatic variation (ARMC4) was observed only in PM cancer (Fig. 2A). In terms of nucleotide substitution, the average proportion of transitions in gastritis, primary and metastatic cancer was 58.8%, 60.1% and 57.0%, respectively. The most prevalent changes were A → G/T → C transitions, followed by G → A/C → T transitions and A → C/T → G transversions (Fig. 2B). Comparison with the TCGA dataset showed that the prevalent changes are similar to those in other cancer types.
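The substitution classes and the transition proportion reported above can be tallied along the following lines. This is only a minimal sketch: the function names and the example SNV list are hypothetical and are not taken from the study's pipeline or data. Each SNV is grouped with its reverse-complement partner (for example A → G with T → C), and a change is counted as a transition when it stays within the purines (A, G) or within the pyrimidines (C, T).

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def substitution_class(ref, alt):
    """Label an SNV together with its reverse-complement partner, e.g. 'A>G/T>C'."""
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return "/".join(sorted([forward, reverse]))


def summarize(snvs):
    """Return per-class counts and the fraction of SNVs that are transitions."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    n_transitions = sum(is_transition(ref, alt) for ref, alt in snvs)
    return counts, n_transitions / len(snvs)


# Hypothetical (ref, alt) pairs, for illustration only.
example_snvs = [("A", "G"), ("T", "C"), ("G", "A"), ("C", "T"), ("A", "C"), ("G", "T")]
counts, ti_fraction = summarize(example_snvs)
print(counts)
print(f"transition fraction: {ti_fraction:.3f}")
```

In this toy list four of the six changes are transitions; A → C and G → T fall outside both base groups and are counted as transversions.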
Effect of nonsynonymous coding SNVs on gene expression. As shown in the heatmap in Fig. 2C, some mutated genes showed down-regulated expression and others showed up-regulated expression. To explore possible predictive biomarkers of PM, we focused on the mutations that occurred in both primary cancer and PM cancer. Among them, the somatic mutations of RP1L1 and PRB1 were associated with gene activation, with elevated mRNA expression in primary cancer, PM cancer or both. They are notable candidate molecular targets for tumor metastasis. In addition, TUBB6 is mutated in primary cancer, accompanied by high TUBB6 mRNA expression in both primary cancer and metastatic cancer relative to gastritis tissue (fold change > 6). With regard to the PM-specific nonsynonymous mutation, the ARMC4 mutation was associated with a decreased level of mRNA expression.
Analysis of fusion transcripts. Using the fusion detection module of TopHat, we searched for gene fusions in each sample. Several potential gene fusions were found, and a Circos plot was used to present the existence and locations of gene fusions along the chromosomes (Fig. 3). We excluded fusions in which the two genes were less than 100 kb apart, since such fusions may be artifacts. The fusion genes of the cancer samples and their altered mRNA expression levels compared with gastritis are listed in Table 1.
Differentially expressed genes and pathways. We defined differentially expressed genes by FDR < 0.1 and fold-change > 1.5. We compared primary cancer vs chronic gastritis, metastatic cancer vs chronic gastritis, and metastatic cancer vs primary cancer separately. An overview of the numbers of differentially expressed genes for each sample is shown in Fig. 4. To find the crucial gene cluster that plays a key role in PM of gastric cancer, we focused on the genes shared by primary cancer and metastatic cancer and listed them in Table 2. We found that 6 genes (SFRP4, NOX4, HOXA11, NKX2-5, CDH16 and LOC100505875) were specifically up-regulated in both primary cancer and PM cancer, and 16 genes (LIPF, NKX6-2, MIXL1, CWH43, SULT1E1, CXCL5, REG1A, GHRL, NKX2-2, HTR1E, HPGD, ESRRG, CYP2C19, ADH1C, PNLIPRP3 and CEACAM5) were specifically down-regulated in both primary cancer and PM cancer. Using Gene Ontology (GO) functional annotation, we analyzed the signaling pathways of the genes shared by primary cancer and PM. Compared with chronic gastritis, there was significant down-regulation of genes related to epithelium morphogenesis, secretion and muscle development in both primary cancer and PM cancer. In contrast, there was significant up-regulation of genes related to response to bacteria, response to ethanol, response to stimulus, chemotaxis and glucose metabolism in both primary cancer and PM cancer.
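The selection of shared differentially expressed genes described above can be illustrated with a short sketch. The thresholds (FDR < 0.1, fold-change > 1.5) come from the text; everything else is an assumption made for illustration: the function names, the placeholder gene names and values, and in particular the symmetric treatment of down-regulation as a fold-change below 1/1.5, which the text does not state explicitly.

```python
FDR_CUTOFF = 0.1
FOLD_CUTOFF = 1.5


def classify(fold_change, fdr):
    """Return 'up', 'down', or None for one gene in one comparison."""
    if fdr >= FDR_CUTOFF:
        return None
    if fold_change > FOLD_CUTOFF:
        return "up"
    # Assumption: down-regulation mirrors the up-regulation threshold.
    if fold_change < 1.0 / FOLD_CUTOFF:
        return "down"
    return None


def shared_genes(primary_vs_gastritis, pm_vs_gastritis):
    """Genes changed in the same direction in both comparisons."""
    shared = {"up": set(), "down": set()}
    for gene, (fold_change, fdr) in primary_vs_gastritis.items():
        if gene not in pm_vs_gastritis:
            continue
        direction = classify(fold_change, fdr)
        if direction and direction == classify(*pm_vs_gastritis[gene]):
            shared[direction].add(gene)
    return shared


# Hypothetical (fold_change, FDR) values keyed by placeholder gene names.
primary = {"GENE_A": (4.0, 0.01), "GENE_B": (0.2, 0.03), "GENE_C": (3.0, 0.5), "GENE_D": (2.0, 0.02)}
pm = {"GENE_A": (2.5, 0.04), "GENE_B": (0.1, 0.01), "GENE_C": (3.5, 0.01), "GENE_D": (0.3, 0.02)}
result = shared_genes(primary, pm)
print(result)
```

Here GENE_C is dropped because it fails the FDR cutoff in one comparison, and GENE_D is dropped because its direction of change differs between the two sites.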

Discussion
Here we report comprehensive whole-genome and transcriptome characteristics of a typical case of diffuse type gastric carcinoma with PM. Diffuse type gastric carcinoma accounts for about 40% of all gastric cancers and is characterized by extensive infiltration of poorly cohesive cells in the stomach or at metastatic sites. Diffuse type gastric carcinoma frequently develops PM, consisting of peritoneal dissemination of cancer cells, which leads to an extremely poor prognosis 16,17. Although several molecules have been reported to be involved in the PM of gastric cancer, the mechanisms underlying this poorer biological behavior have yet to be elucidated. Thus, systematic profiling of genomic and transcriptomic variants in PM is essential. Recently, we collected a precious case that includes matched peripheral blood, non-cancerous mucosa, primary cancer and PM cancer. WGS and RNA-seq were used to screen for high-risk gene variations associated with PM in gastric cancer. To our knowledge, this is the first genomic sequencing study worldwide to examine primary cancer and PM cancer simultaneously.
At the WGS level, we found a set of genes with nonsynonymous mutations in the chronic gastritis sample: ATXN3, PLIN4, PDZD2, MUC4, MUC17, DMBT1, DAB1, ZNF208, FLG2 and CRNN. This suggests that somatic gene variations accumulate during the early stage of gastric carcinogenesis. To find predictive molecular markers for PM of gastric cancer, we focused on the mutated genes that occurred in both primary cancer and PM cancer. For instance, the somatic variations of RP1L1, PRB1, HS6ST3 and DCTN1 were observed in both primary tumor and PM cancer. They are notable candidate molecular targets for understanding peritoneal metastasis of gastric cancer. RP1L1 is the retinitis pigmentosa 1-like 1 gene, and no association with cancer has been reported yet. PRB1 encodes proline-rich protein BstNI subfamily 1, and no association with cancer has been reported yet. HS6ST3 encodes heparan sulfate 6-O-sulfotransferase 3 and is implicated in proliferation, differentiation, adhesion, migration, inflammation, blood coagulation and other diverse processes. HS6ST3 was found to be highly expressed in chondrosarcomas 18. DCTN1 encodes dynactin, which binds to both microtubules and cytoplasmic dynein and is involved in diverse cellular functions. No association with cancer has been reported for DCTN1 yet. ARMC4 carries the first identified PM-specific somatic variation, and no association with cancer has been reported for ARMC4 either. In the current case, the frequency of somatic variation in primary cancer is higher than that in metastatic cancer. This may reflect the higher heterogeneity of tumor cells in the primary tumor, while the metastatic fraction may derive from a subclone of primary tumor cells. RNA-seq was employed to measure global gene expression in order to determine the impact of genetic variants on gene function.
We integrated the WGS and RNA-seq data and found that some nonsynonymous mutations were associated with elevated gene activity and up-regulated mRNA expression (RP1L1, PRB1 and TUBB6). Although no association with cancer has been reported for RP1L1 and PRB1 yet, TUBB6 has been identified as a candidate oncogene in colorectal carcinoma 19,20. Other nonsynonymous mutations were associated with low gene activity and down-regulated mRNA expression (ZNF208, CRNN, IREB2 and ACADL); these genes may be crucial tumor suppressor genes in gastric cancer and may contribute to the extensive dissemination of cancer cells in the abdominal cavity.
Furthermore, we filtered out intra-chromosomal short-distance (less than 100 kb) fusions and identified a set of fusion genes. Based on RNA-seq, some fusion genes were accompanied by altered mRNA expression levels. However, Sanger sequencing validated only the GPX4-MPND fusion gene pair. The GPX4-MPND fusion is accompanied by up-regulated mRNA expression of both genes in primary cancer. To date, there has been no report of a GPX4-MPND gene fusion in cancer. The GPX4 gene (ID: 2879, 19p13.3) encodes a member of the glutathione peroxidase family (22.5 kDa), which catalyzes the reduction of hydrogen peroxide, organic hydroperoxides and lipid peroxides by reduced glutathione. GPX4 plays a role in protecting cells against oxidative damage. The association of GPX4 with human cancer is controversial. Some authors proposed that GPX4 is decreased in cancer tissues, but others found diverse functions of GPX4 in cancer [21][22][23]. The MPND gene (ID: 84954, 19p13.3) was cloned from retinoblastoma and encodes a 48.5 kDa protein of the deubiquitinase family. To date, there are few reports on its correlation with carcinogenesis. We speculate that increased levels of GPX4 and MPND in primary cancer may facilitate tumor growth and progression. Notably, our data show that changes in mRNA expression can occur in both genes of a fusion pair, not only in the downstream gene.
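The short-distance filter applied to candidate fusions can be sketched as follows. The 100 kb cutoff is from the text; the record layout, the placeholder gene names and coordinates, and the choice to measure distance between the two reported breakpoint positions are assumptions made for illustration and are not taken from the study's pipeline.

```python
from typing import NamedTuple

MIN_DISTANCE_BP = 100_000  # fusions closer than 100 kb are treated as likely artifacts


class Fusion(NamedTuple):
    gene_a: str
    chrom_a: str
    pos_a: int
    gene_b: str
    chrom_b: str
    pos_b: int


def keep_fusion(fusion):
    """Keep inter-chromosomal fusions and intra-chromosomal ones at least 100 kb apart."""
    if fusion.chrom_a != fusion.chrom_b:
        return True
    return abs(fusion.pos_a - fusion.pos_b) >= MIN_DISTANCE_BP


# Hypothetical candidates, for illustration only.
candidates = [
    Fusion("GENE_A", "chr1", 1_000_000, "GENE_B", "chr1", 1_050_000),  # 50 kb apart
    Fusion("GENE_C", "chr2", 5_000_000, "GENE_D", "chr2", 9_000_000),  # 4 Mb apart
    Fusion("GENE_E", "chr3", 100, "GENE_F", "chr7", 200),              # different chromosomes
]
kept = [fusion for fusion in candidates if keep_fusion(fusion)]
for fusion in kept:
    print(f"{fusion.gene_a}-{fusion.gene_b}")
```

Under these assumptions only the first candidate is discarded; a pair on the same chromosome arm still passes the filter as long as the two positions are at least 100 kb apart.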
Using RNA-seq, we identified a set of genes that were differentially expressed between sites. We focused on the genes shared by primary cancer and metastatic cancer, which we consider notable candidate molecular targets for gastric cancer with PM. Among the 22 shared genes, SFRP4, NOX4, HOXA11, NKX2-5, CDH16 and LOC100505875 were up-regulated in primary cancer and PM cancer, and LIPF, NKX6-2, MIXL1, CWH43, SULT1E1, CXCL5, REG1A, GHRL, NKX2-2, HTR1E, HPGD, ESRRG, CYP2C19, ADH1C, PNLIPRP3 and CEACAM5 were down-regulated in both sites. SFRP4 encodes secreted frizzled-related protein 4, which acts as a soluble modulator of Wnt signaling. SFRP4 was found to be overexpressed in colorectal cancer 24. NOX4 encodes a member of the NOX family of enzymes that functions as the catalytic subunit of the NADPH oxidase complex. NOX4 protein is localized to non-phagocytic cells, where it acts as an oxygen sensor and catalyzes the reduction of molecular oxygen to reactive oxygen species. Hiraga and coworkers reported that NOX4 was up-regulated in pancreatic cancer and contributed to TGF-beta-induced epithelial-mesenchymal transition 25. Zhang and coworkers found that NOX4 was up-regulated in non-small cell lung cancer and promoted its growth and metastasis 26. HOXA11 encodes a homeobox transcription factor, which may regulate gene expression, morphogenesis and differentiation. Cui and colleagues found epigenetic changes of HOXA11 in gastric cancer 27. NKX2-5 encodes a homeobox transcription factor that functions in heart formation and development. Moussa and Sidhom reported that NKX2-5 was partially up-regulated in T-cell acute lymphoblastic leukemia 28. CDH16 encodes a calcium-dependent membrane-associated glycoprotein that functions as the principal mediator of homotypic cellular recognition and plays a role in the morphogenic direction of tissue development. Di Martino and coworkers found that CDH16 is one of the downstream targets of FGFR3 and is involved in regulating cell-cell and cell-matrix adhesion in urothelial cells 29.
In conclusion, this is the first report in which WGS and RNA-seq were applied to a set of matched samples from a gastric cancer case. Somatic mutations shared by primary cancer and metastatic cancer, as well as a PM-specific somatic mutation, were identified. Some mutations are associated with gene activation and up-regulated gene expression, whereas others are associated with gene inactivation and down-regulated gene expression. The former may be promising predictive markers for peritoneal metastasis of gastric cancer, while the latter may be important metastasis suppressor genes in gastric cancer. The GPX4-MPND fusion gene is a novel molecular event identified in gastric cancer. This gene fusion may activate the gene pair and facilitate cancer growth and invasion.
In short, RNA concentration was measured by Nanodrop, and RNA quality was assessed by agarose gel electrophoresis and Agilent 2100. RNA that passed QC proceeded to library preparation. Poly-A-containing mRNA molecules were purified using poly-T oligo-attached magnetic beads. Following purification, the mRNA was fragmented into small pieces using divalent cations under elevated temperature. The cleaved RNA fragments were copied into first strand cDNA using reverse transcriptase and random primers, followed by second strand cDNA synthesis using DNA polymerase I and RNase H. These cDNA fragments then went through an end repair process, the addition of a single 'A' base, and ligation of the adapters. The products were then purified and enriched by PCR (15 cycles) to create the final cDNA library. After purification, quantification and validation, the validated libraries were sequenced on the Illumina sequencing system (HiSeq2000) following the manufacturer's standard workflow.

Analysis of differentially expressed genes and fusion transcripts.
The samples from non-cancerous mucosa, primary cancer and peritoneal metastasis yielded sufficient high-quality sequencing data, as judged by analytical QC. Transcript assembly and abundance estimation were performed to obtain gene expression levels. RNA-Seq reads were mapped to the human genome using TopHat (version 2.0.9, reference hg19). Cufflinks software (version 2.1.1) was used to determine the differentially expressed genes. The transcript counts for gene expression levels were calculated, and the relative transcript abundance was expressed as fragments per kilobase of exon per million fragments mapped (FPKM). Raw data were extracted as FPKM values across all samples, and samples with zero values across more than 50% of the genes were excluded. In addition, detection of fusion transcripts resulting from chromosomal rearrangements was performed for each sample. The genomic variations (SNP and INDEL) at the RNA level were assessed and annotated against a comprehensive collection of functional annotation databases, including gene/protein structure, somatic variations (dbSNP, 1000 Genomes Project, GWAS), functional consequence of amino acid change (VISIFT), known somatic mutations (COSMIC), and functional elements (transcription binding sites, conserved elements).
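For orientation, the FPKM unit and the sample-exclusion rule described above can be written out as a small sketch. This shows only the textbook definition of FPKM and a literal reading of the "more than 50% zeros" rule; Cufflinks estimates abundances with its own statistical model, and the function names and example numbers here are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def drop_sparse_samples(expression, max_zero_fraction=0.5):
    """Drop samples whose FPKM is zero for more than max_zero_fraction of the genes.

    expression maps sample name -> {gene name: FPKM value}.
    """
    kept = {}
    for sample, values in expression.items():
        zero_fraction = sum(v == 0 for v in values.values()) / len(values)
        if zero_fraction <= max_zero_fraction:
            kept[sample] = values
    return kept


# Hypothetical numbers: 500 fragments on 2 kb of exon, 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
print(example_fpkm)

# Hypothetical expression table with placeholder sample and gene names.
table = {
    "sample_x": {"g1": 0.0, "g2": 0.0, "g3": 0.0, "g4": 3.1},  # 75% zeros, excluded
    "sample_y": {"g1": 0.0, "g2": 0.0, "g3": 1.2, "g4": 8.4},  # exactly 50% zeros, kept
}
filtered = drop_sparse_samples(table)
print(sorted(filtered))
```

A sample with exactly half of its genes at zero is retained in this sketch, since the text excludes only samples with more than 50% zero values.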
Gene Ontology and Pathway Enrichment Analysis. To better understand the function of the variant genes, we conducted a gene set enrichment analysis (GSEA) on the quality-controlled, filtered expression data, testing for enrichment in KEGG biological pathways and Gene Ontology (GO) terms. Generally, a gene set with FDR < 0.1 and fold-change > 1.5 was considered significantly enriched. We used the Gene Ontology system, which covers three domains (biological processes, molecular functions and cellular components), to annotate the genes.
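The text does not state how the FDR was computed. As one common possibility, and purely as an assumption, the sketch below applies the Benjamini-Hochberg adjustment to a list of hypothetical gene-set p-values and then applies the FDR < 0.1 cutoff; it is not a description of the software actually used in this analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.1 for q in adjusted]
print(adjusted)
print(significant)
```

With these illustrative values the first three gene sets pass the 0.1 cutoff and the fourth does not.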
RT-PCR. Selected mutated genes were further studied by quantitative RT-PCR to validate the genomic sequencing results. Genomic DNA from blood, non-cancerous mucosa, and primary and peritoneal metastatic tumors was used as the PCR template. PCR primers were designed for each somatic mutation using NCBI Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/). Quantitative RT-PCR was performed using a routine method in our laboratory. The PCR products were sent to Sangon (Shanghai, China) for Sanger capillary sequencing. All sequencing results were aligned and visualized using Chromas (http://technelysium.com.au/). All methods were carried out in accordance with the approved guidelines.