Comprehensive analysis of metastatic gastric cancer tumour cells using single-cell RNA-seq

Gastric cancer (GC) is a leading cause of cancer-induced mortality, with poor prognosis with metastasis. The mechanism of gastric carcinoma lymph node metastasis remains unknown due to traditional bulk-leveled approaches masking the roles of subpopulations. To answer questions concerning metastasis from the gastric carcinoma intratumoural perspective, we performed single-cell level analysis on three gastric cancer patients with primary cancer and paired metastatic lymph node cancer tissues using single-cell RNA-seq (scRNA-seq). The results showed distinct carcinoma profiles from each patient, and diverse microenvironmental subsets were shared across different patients. Clustering data showed significant intratumoural heterogeneity. The results also revealed a subgroup of cells bridging the metastatic group and primary group, implying the transition state of cancer during the metastatic process. In the present study, we obtained a more comprehensive picture of gastric cancer lymph node metastasis, and we discovered some GC lymph node metastasis marker genes (ERBB2, CLDN11 and CDK12), as well as potential gastric cancer evolution-driving genes (FOS and JUN), which provide a basis for the treatment of GC.


Solid tumour decomposition and single cell isolation simulation. Biopsy or metastatic tumour
were dissected and transferred to a 2 ml tube (Axygen, China), each containing 1 ml prewarmed M199 media (Thermo Fisher Scientific, USA), 2 mg/ml collagenase P (Roche, USA) and 10 U/µl DNase I (Roche, USA) as described by Tirosh et al. 17 . Tissues were digested for 60 min at 37 °C and then pipetted up and down every ten times every 10 min. The tissue suspensions were then filtered with a 70 µm nylon mesh (Thermo Fisher Scientific, USA) and centrifuged at 450g for 5 min. Pellets were resuspended for live cell staining using CFSE incubation for 5 min.
Single-cell whole-transcriptome library preparation and sequencing. Single cells from each tissue were manually picked under fluorescence microscopy (X71, Olympus, Japan) using a mouth pipette. Each of the harvested single cells was transferred into 2 µL of cell lysis buffer (CLB) in 0.2 mL PCR tubes. Libraries of isolated single cells were then prepared as per the Smart-seq2 protocol 18 with modifications on reverse transcription and amplification cycles.
Oligo-dT primed RT (reverse transcription) was performed by Smartscribe (Takara, Japan) reverse transcriptase and locked TSO oligonucleotide (Exiqon, Danmark) upon single cells. Full-length cDNA amplification was conducted by PCR amplification for 22 cycles with Hifi HotStart ReadyMix reagent (KAPA Biosystems, USA) and purified by 0.6 × AMPure beads (BD, USA). Barcoded libraries were fragmented and segmented with a Library Prep kit (Nextera XT, Illumina, USA). Pooled libraries with unique N5-N7 barcodes were sequenced with a HiSeq 2500 sequencer (Illumina, USA) and a 50 SE read flow cell. www.nature.com/scientificreports/ ScRNA-seq data analysis. Sequencing adapters and low-quality reads were first trimmed and removed using Trimmomatic 19 . Reads with a Phred score below 20 and trimmed sequence lengths less than 18 bp were discarded. The remaining high-quality reads were mapped to the human genome using the HiSat2 tool 20 by using the human genome UCSC hg19 as a reference (ftp://genom e-ftp.cse.ucsc.edu/golde nPath /hg19/chrom osome s/) with a total of 22,335 genes. FeatureCounts software 21 was used for the expression calculation of each gene, and raw count values of genes in each sample were obtained. A gene that was considered to be expressed in a sample had one more count in the sample. Read counts were normalized to TPM (transcript per million) values and then log2 transformed by using the "newSCESet" function of "scater" (https ://githu b.com/davis mcc/ scate r) package by R (https ://www.r-proje ct.org/). Analyses, including principal component analysis (PCA), Pearson correlation, Student's t-test and hierarchical clustering analysis (HCA), were performed using functions in R as follows: 'prcomp' , 'cor' , 't.test' and 'cluster' in the 'stats' package and Heatmap in the 'ComplexHeatmap' package. The "ggplot" was used for the visualization of PCA.
Differentially expressed genes (DEGs) were calculated with fold-change and p-value between "treatment" and control groups. We set the fold change by a twofold cut-off, and FDR-adjusted p < 0.05 was regarded as the criterion for DEGs. This was carried out by using the "stat" package. Gene Ontology (GO) analysis results were obtained based on the Metascape (http://metas cape.org) 22 .
Single-cell trajectory analysis. We used TSCAN, diffusion map, and monocle2 to perform pseudotime trajectory analysis for the evolution of gastric cancer cells. Cells were chosen based on Seurat cluster identification results.

Results
Sequencing data processing and QC. After filtration with a per-gene average read > 1 across all samples, 94 out of 171 samples passed quality control (Table 1). A total of 7601 genes passed filtration and were adopted in further analysis. Each cell was sequenced with 20,000 ~ 200,000 uniquely mapped reads, which is sufficient to detect distinct subpopulation expression profiles 17,[23][24][25] . Correlations between individual tumour cells from different samples showed a broad range of Pearson coefficients (r = − 0.1 ~ 0.98), implying prominent transcriptomic heterogeneity. However, despite the heterogeneity across the cells, most of the samples were clustered according to their tissue of origin ( Figure S1).
Clustering of the primary tumour and metastatic tumour. T-SNE was plotted to present the distribution of the single cells from the primary tumour and metastatic tumour in lower dimensions. Primary and metastatic tumour subgroups were partly merged (Fig. 2). Unsupervised T-SNE showed the separation of the Intratumoural heterogeneity analysis. In terms of the intratumoural heterogeneity, the correlation analysis of single cells revealed heterogeneity within tumours across three patients (Fig. 3). Using bulk stemness, immune, stromal, and tumour scoring assessments, we found significant tumour and stromal scoring differences between primary tumour and metastatic tumour single cells, indicating compositional and functional changes in tumours. Population-wide comparison between TT and LN single cells revealed that NOTCH2, NOTCH2NL, KIF5B, and ERBB4 are highly expressed in primary cancer, while CDK12, ERBB2, and CLDN11 are overexpressed in metastatic cancer. The decomposition of four main principal components (PCs) in the datasets is shown in Fig. 4.

ScRNA-seq analysis and trajectory analysis of cell clusters. Seurat marker analysis revealed four
main clusters in the overall single cells. Twelve significant principal components were extracted to identify four main clusters in the tumour tissues. The heatmap indicates markers highly expressed in each cluster. Functional annotations of each cluster are shown based on Seurat-calculated markers (Fig. 5).
The pseudotime trajectory of GC clusters revealed a distinct pattern of postulated evolution state from Clus-ter0 > 2 > 1. The major genes (TOP1000) driving evolution were mainly involved in SRP-dependent cotranslational protein targeting to the membrane, response to the metal ion, and ribosome assembly. The kernel genes in evolution regulation include SERPINB13, NFKBIA, B2M, and RPL24. Transcription factors including FOS, FOSB, JUN, JUNB, and ZNF256 drive the regulatory networks (Fig. 6).
Stem cell markers were applied to validate the multiple GC origin hypotheses. Based on canonical markers, we identified hepatocytes and macrophage cells using TSNE. Using TSCAN, we postulated a gastric-derived cell evolutionary trajectory. SLICER, TSCAN, and diffusion map pseudotime tools show the evolutionary trajectory of four tumour cell clusters ( Figure S2).

Discussion
Tumour heterogeneity in gastric tumorigenesis and progression has recently attracted researchers' attention. Based on some significant findings and theories of heterogeneity, target and immune therapies are in progress. However, there remains largely unknown heterogeneity in gastric cancer. By using single-cell isolation aided scRNA-seq, we could clearly identify the signatures of the primary tumour and metastatic tumour and reveal the role of heterogeneity, which causes metastasis in gastric cancer. Gastric carcinoma is common in lymph node metastasis, but the mechanism remains unknown. In our study, we revealed a subgroup of cells bridging the metastatic group and primary group, implying the transition state of cancer during the metastatic process www.nature.com/scientificreports/ by analysis of three patients' transcriptomic data of single cells from primary gastric tumours and lymph node metastasis tumours. Cancer heterogeneity has been shown to be a great challenge in cancer diagnosis and treatment 26 . Recently, scRNA-seq has been able to analyse abnormal cell-to-cell interactions, chemotherapy resistance, and immunosuppressive microenvironments from primary tissues or CTCs.  29 . In terms of the intratumoural cancer cell component. Our clustering data showed significant intratumoural heterogeneity.
Tissue-specific markers were calculated, and a heatmap was plotted using the top 50 highly expressed features based on six redefined clusters using cell markers. Tissue-specific markers were calculated, and a heatmap was plotted using the top 100 highly expressed features based on previously defined clusters. NOTCH2, NOTCH2NL, KIF5B, and ERBB4 are highly expressed in primary cancer, while ERBB2, CLDN11 and CDK12 are overexpressed in metastatic cancer. Previous studies suggested that the expression of Notch signalling pathway-associated proteins, such as Notch2, was significantly elevated in gastric cancer tissues compared to normal tissues 30 . In addition, evidence showed that higher KIF5B and ERBB4 promoted cancer cell proliferation 31,32 . Evidence suggests that CDK12, ERBB2, and CLDN11 play an associated metastatic role in cancer [33][34][35] . Several studies have shown that CLDN11 is related to tumour migration and metastasis 35 . Although the aetiologies of gastric cancer are partly clear and validated by experiments, the proposed "seed"-and-"soil" hypothesis has yet to be well explained 36 . Our findings implied that lymph node metastasis-prone subclones are more likely to share CLDN11, which is a member of the tight junction protein family that functions as a component of cell adhesion 37 . This explained cancer cell colonization in the lymph node after migrating from primary tissues. Future studies need to evaluate both transcriptomic and genetic alterations and even geographical information across different regions from primary metastatic tumours only to identify survival and evolution pressure in the cancer-related microenvironment, which promotes the identification of potential key drivers of gastric cancer. www.nature.com/scientificreports/ TFs are proteins with special structures and functions that regulate gene expression. We noticed that several TF-regulating genes, including FOS and JUN, appeared to be particularly important during tumour evolution. JUN and FOS were determined to be critical genes related to GC 38 . ScRNA-seq reveals an expression pattern with high FOS and JUN at leukaemia evolution, which resolves following therapy but reoccurs following relapse and death 39 .
As this is a preliminary study, the limitation of our analysis is the small number of patient cases enrolled in this study. Nonetheless, we obtained a more comprehensive picture of gastric cancer lymph node metastasis at single-celled resolution, giving a new perspective on the biomarkers (ERBB2, CLDN11 and CDK12) involved in metastasis, pathways involved and driver genes (FOS and JUN) during the metastasis process, providing a basis for the treatment of GC.  www.nature.com/scientificreports/