Introduction

Globally, gastric cancer (GC) is the fifth most common cancer and one of the leading causes of cancer mortality. In 2012, 952,000 cases were diagnosed, resulting in an estimated 723,000 deaths1. Gastric cancer carries a poor prognosis, and the overall 5-year survival is less than 30% despite surgical treatment2. Lymphatic metastasis is associated with poor outcomes, and in more advanced stages, involvement of more distant lymph nodes can be revealed by histopathological findings3,4. More accurate diagnosis and precision therapy are the priorities of current clinical and fundamental research in gastric cancer. Based on the distinct molecular subtypes defined by The Cancer Genome Atlas (TCGA) Network, gastric cancer is now regarded as having specific genomic abnormalities that can be matched to targeted therapies5.

Biologists and clinicians face many challenges, including gastric neoplasm metastasis. Genomic analysis has revealed a series of actionable mutations and pathways in gastric cancer, such as PI3K/AKT/mTOR, CLD18, and HER2/EGFR, which are likely to drive the progression of primary gastric cancer to metastasis6,7. Recently, transcriptomic data have revealed that the RhoA pathway is involved in the invasion and migration of the ‘diffuse’ growth phenotype in gastric cancer8. The metastatic cascade has been described as a series of steps, including the dissemination of circulating cells, adhesion to blood vessel endothelial cells and proliferation9. However, the mechanism of lymph node metastasis in gastric carcinoma remains unknown, partly because data from metastasis studies were generated with bulk approaches, which are likely to mask the roles of subpopulations. Therefore, heterogeneity arising from diverse tumour cell subsets and complex microenvironments remains a great challenge in diagnosis and treatment.

Single-cell RNA-seq (scRNA-seq) resolves tissues into individual cells, making it possible to distinguish neoplastic from nontumourous cells and to profile expression patterns from which subclones can be inferred10. ScRNA-seq can be used to analyse metastatic cancer cells whose bulk-level expression profiles are affected by the local tissue into which they have metastasized11,12,13. In an scRNA-seq study of GC, Li et al. found that some specific marker genes, including SLC11A2, KLK7 and SULT2B1, were related to the development of early GC cells14. Single-cell gene expression studies revealed widespread changes in cell numbers, transcriptional status, and intercellular interactions in the GC tumour microenvironment15. ScRNA-seq can also be employed to analyse different grades of GC and is a potentially good predictor of GC prognosis16. These studies have focused on the tumour heterogeneity of GC, but research on GC metastasis is lacking. To address questions about metastasis from the intratumoural perspective, we performed single-cell analysis of primary cancer tissue and paired metastatic lymph node cancer tissue from three GC patients using scRNA-seq.

Materials and methods

Experimental design

The experiment consisted of a comparative scRNA-seq analysis of primary tumour tissue (TT) and paired lymph node (LN) metastasis tumour tissue from three gastric cancer patients. The clinical characteristics of each patient are shown in Table 1. Tumour tissues were obtained during operations at Changhai Hospital, affiliated to Second Military Medical University. The study was approved by the Ethical Committee of Changhai Hospital (CHEC2016-157). Written informed consent was obtained from each of the patients and their guardians, and all procedures were conducted in accordance with the Helsinki Ethical Principles for Medical Research. All libraries were prepared with the Smart-seq2 scRNA-seq protocol and sequenced on a HiSeq2500 instrument in 50 bp single-end mode (Fig. 1).

Table 1 Clinical characteristics of each patient used in the scRNA-seq study and the cell number of each sample after quality control.
Figure 1

(a) Overview of the study design and sampling protocol. (b) Analysis pipeline in the current single-cell RNA-seq study.

Solid tumour dissociation and single-cell isolation

Biopsy or metastatic tumour tissues were dissected and transferred to 2 ml tubes (Axygen, China), each containing 1 ml prewarmed M199 media (Thermo Fisher Scientific, USA), 2 mg/ml collagenase P (Roche, USA) and 10 U/µl DNase I (Roche, USA), as described by Tirosh et al.17. Tissues were digested for 60 min at 37 °C and pipetted up and down ten times every 10 min. The tissue suspensions were then filtered through a 70 µm nylon mesh (Thermo Fisher Scientific, USA) and centrifuged at 450 × g for 5 min. Pellets were resuspended and incubated with CFSE for 5 min for live cell staining.

Single-cell whole-transcriptome library preparation and sequencing

Single cells from each tissue were manually picked under fluorescence microscopy (X71, Olympus, Japan) using a mouth pipette. Each of the harvested single cells was transferred into 2 µL of cell lysis buffer (CLB) in 0.2 mL PCR tubes. Libraries of isolated single cells were then prepared as per the Smart-seq2 protocol18 with modifications on reverse transcription and amplification cycles.

Oligo-dT primed reverse transcription (RT) was performed on single cells with Smartscribe reverse transcriptase (Takara, Japan) and a locked TSO oligonucleotide (Exiqon, Denmark). Full-length cDNA was amplified by PCR for 22 cycles with Hifi HotStart ReadyMix reagent (KAPA Biosystems, USA) and purified with 0.6 × AMPure beads (BD, USA). Barcoded libraries were prepared by fragmentation and tagging with a Library Prep kit (Nextera XT, Illumina, USA). Pooled libraries with unique N5–N7 barcodes were sequenced on a HiSeq 2500 sequencer (Illumina, USA) with a 50 SE read flow cell.

ScRNA-seq data analysis

Sequencing adapters and low-quality reads were first trimmed and removed using Trimmomatic19. Reads with a Phred score below 20 and reads with a trimmed length of less than 18 bp were discarded. The remaining high-quality reads were mapped with the HiSat2 tool20 to the UCSC hg19 human reference genome (ftp://genome-ftp.cse.ucsc.edu/goldenPath/hg19/chromosomes/), which comprised a total of 22,335 genes. FeatureCounts software21 was used to calculate the expression of each gene, yielding raw count values for the genes in each sample. A gene was considered to be expressed in a sample if it had one or more counts in that sample. Read counts were normalized to TPM (transcripts per million) values and then log2 transformed using the “newSCESet” function of the “scater” package (https://github.com/davismcc/scater) in R (https://www.r-project.org/).
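The normalization step described above can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the internals of the “scater” package; the gene names, gene lengths and the pseudocount of 1 added before the log2 transformation are assumptions made for illustration only.

```python
import math


def counts_to_log2_tpm(counts, gene_lengths_bp, pseudocount=1.0):
    """Convert raw read counts of one cell into log2(TPM + pseudocount).

    counts and gene_lengths_bp are dicts keyed by gene name.
    The pseudocount is an assumption; the paper does not state one.
    """
    # Reads per kilobase: divide each count by the gene length in kb.
    rpk = {g: counts[g] / (gene_lengths_bp[g] / 1000.0) for g in counts}
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        raise ValueError("cell has no counted reads")
    # Scale so that the values of one cell sum to one million.
    tpm = {g: value / total_rpk * 1e6 for g, value in rpk.items()}
    return {g: math.log2(value + pseudocount) for g, value in tpm.items()}


def expressed_genes(counts, min_count=1):
    """Genes with one or more counts, following the definition in the text."""
    return sorted(g for g, c in counts.items() if c >= min_count)


# Hypothetical toy cell with three genes.
counts = {"GENE_A": 10, "GENE_B": 0, "GENE_C": 30}
lengths = {"GENE_A": 1000, "GENE_B": 2000, "GENE_C": 3000}
log_tpm = counts_to_log2_tpm(counts, lengths)
detected = expressed_genes(counts)
```

In this toy cell, GENE_A and GENE_C receive the same log2 TPM value because their counts are proportional to their lengths, which is the length correction that distinguishes TPM from raw counts.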

Analyses, including principal component analysis (PCA), Pearson correlation, Student’s t-test and hierarchical clustering analysis (HCA), were performed in R using the functions ‘prcomp’, ‘cor’, ‘t.test’ and ‘cluster’ in the ‘stats’ package and ‘Heatmap’ in the ‘ComplexHeatmap’ package. PCA results were visualized with ‘ggplot’.
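As an illustration of the cell-to-cell Pearson correlation mentioned above, the following sketch computes a correlation matrix from log-transformed expression profiles. It is a plain Python stand-in for the R ‘cor’ function, not the code used in the study, and the cell names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(cells):
    """Pairwise correlations for a dict of cell name -> expression vector."""
    names = list(cells)
    return {a: {b: pearson(cells[a], cells[b]) for b in names} for a in names}


# Hypothetical profiles over the same four genes.
cells = {
    "TT_cell_1": [1.0, 2.0, 3.0, 4.0],
    "TT_cell_2": [2.0, 4.0, 6.0, 8.0],
    "LN_cell_1": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(cells)
```

A matrix of this kind is what a heatmap or hierarchical clustering step would take as input when cells are grouped by transcriptomic similarity.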

Differentially expressed genes (DEGs) were identified from the fold change and p-value between the “treatment” and control groups. A twofold change cut-off was applied, and an FDR-adjusted p < 0.05 was regarded as the criterion for DEGs. This analysis was carried out using the “stats” package. Gene Ontology (GO) analysis results were obtained from Metascape (http://metascape.org)22.
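The DEG criterion above can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the gene names and statistics. The per-gene p-values are taken as given (for example from a t-test), and the twofold cut-off is applied in either direction on the log2 scale.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(stats, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Return genes passing both the fold-change and the FDR cut-off.

    stats maps gene name -> {"log2fc": float, "p": float}.
    """
    names = list(stats)
    fdr = benjamini_hochberg([stats[g]["p"] for g in names])
    log2_cutoff = math.log2(fc_cutoff)
    return [
        g for g, q in zip(names, fdr)
        if abs(stats[g]["log2fc"]) >= log2_cutoff and q < fdr_cutoff
    ]


# Hypothetical per-gene statistics.
stats = {
    "GENE_A": {"log2fc": 1.5, "p": 0.01},
    "GENE_B": {"log2fc": 0.4, "p": 0.005},
    "GENE_C": {"log2fc": -2.0, "p": 0.03},
    "GENE_D": {"log2fc": 3.0, "p": 0.2},
}
degs = call_degs(stats)
```

Here GENE_B is rejected by the fold-change cut-off despite a small p-value, and GENE_D is rejected by the FDR cut-off despite a large fold change, which is why both filters are applied together.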

Single-cell trajectory analysis

We used TSCAN, diffusion map, and monocle2 to perform pseudotime trajectory analysis for the evolution of gastric cancer cells. Cells were chosen based on Seurat cluster identification results.

Immunofluorescence

GC tumour tissues were embedded in paraffin and sectioned by Servicebio (Shanghai, China). The antibodies, including Anti-ERBB2 and Anti-Oligodendrocyte Specific Protein (CLDN11), were purchased from Abcam (Cambridge, UK).

Results

Sequencing data processing and QC

After filtering for a per-gene average read count > 1 across all samples, 94 of 171 samples passed quality control (Table 1). A total of 7601 genes passed filtration and were used in further analysis. Each cell was sequenced to 20,000–200,000 uniquely mapped reads, which is sufficient to detect distinct subpopulation expression profiles17,23,24,25. Correlations between individual tumour cells from different samples showed a broad range of Pearson coefficients (r = −0.1 to 0.98), implying prominent transcriptomic heterogeneity. However, despite the heterogeneity across the cells, most of the samples clustered according to their tissue of origin (Figure S1).
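The gene-level filter described above can be sketched as below. This is an illustrative Python version with hypothetical gene names and counts; the criterion that decided which cells passed quality control is not specified in the text and is therefore not shown.

```python
def filter_genes_by_mean_count(count_matrix, min_mean=1.0):
    """Keep genes whose average read count across cells exceeds min_mean.

    count_matrix maps gene name -> list of raw counts, one per cell.
    """
    kept = {}
    for gene, counts in count_matrix.items():
        if counts and sum(counts) / len(counts) > min_mean:
            kept[gene] = counts
    return kept


# Hypothetical counts for three genes across four cells.
count_matrix = {
    "GENE_A": [0, 5, 3, 8],   # mean 4.0, kept
    "GENE_B": [0, 1, 0, 1],   # mean 0.5, removed
    "GENE_C": [1, 1, 1, 1],   # mean 1.0, removed (not strictly above 1)
}
filtered = filter_genes_by_mean_count(count_matrix)
```

The strict inequality follows the "> 1" wording of the text, so a gene averaging exactly one read per cell is excluded.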

Clustering of the primary tumour and metastatic tumour

A t-SNE plot was used to present the distribution of the single cells from the primary tumour and metastatic tumour in lower dimensions. Primary and metastatic tumour subgroups partly overlapped (Fig. 2), although unsupervised t-SNE still separated the primary tumour and metastatic tumour cell groups. Within the tumour tissues, removal of nontumourous cells revealed distinct patient-specific cancer heterogeneity. ScRNA-seq revealed specific carcinoma subpopulations and their characteristics in each patient. However, diverse microenvironmental populations were shared by the different patients, and nonmalignant cells did not cluster into any specific subgroups.

Figure 2

(a) t-SNE plot presenting the distribution of the single cells from three patients in all primary tumour tissues. (b) Unsupervised t-SNE showing the separation of carcinoma cell groups. (c) In gastric cancer tumour tissues, removal of noncarcinoma cells reveals intrinsic patient-specific tumour cell heterogeneity. (d) The metastatic tumour cells appear as more randomly dispersed dots.

Intratumoural heterogeneity analysis

In terms of the intratumoural heterogeneity, the correlation analysis of single cells revealed heterogeneity within tumours across three patients (Fig. 3). Using bulk stemness, immune, stromal, and tumour scoring assessments, we found significant tumour and stromal scoring differences between primary tumour and metastatic tumour single cells, indicating compositional and functional changes in tumours. Population-wide comparison between TT and LN single cells revealed that NOTCH2, NOTCH2NL, KIF5B, and ERBB4 are highly expressed in primary cancer, while CDK12, ERBB2, and CLDN11 are overexpressed in metastatic cancer. The decomposition of four main principal components (PCs) in the datasets is shown in Fig. 4.

Figure 3

Intratumoural heterogeneity analysis. Correlation analysis of single cells revealed heterogeneity within tumours across three patients. Robust bulk stemness, immune, stromal, and tumour scoring assessment between tumour and paratumour single cells.

Figure 4

Tissue-specific markers. (a) Population-wide comparison between TT and LN single cells. Tissue-specific markers were calculated, and a heatmap was plotted using the top 100 highly expressed features based on previously defined clusters. NOTCH2, NOTCH2NL, KIF5B, and ERBB4 are highly expressed in primary cancer, while CDK12, ERBB2, and CLDN11 are overexpressed in metastatic cancer. Functional annotation revealed that microtubule movement and Notch-based signalling were activated in the primary cells, indicating their metastatic propensity. (b) Decomposition of six main principal components (PCs) in the datasets. (c) IF of ERBB4 and CLDN11 in TT and LN.

ScRNA-seq analysis and trajectory analysis of cell clusters

Seurat marker analysis revealed four main clusters in the overall single cells. Twelve significant principal components were extracted to identify four main clusters in the tumour tissues. The heatmap indicates markers highly expressed in each cluster. Functional annotations of each cluster are shown based on Seurat-calculated markers (Fig. 5).

Figure 5

Seurat marker analysis revealed four main clusters in the overall single cells. Twelve significant principal components were extracted to identify four main clusters in the tumour tissues. (a) Four main clusters in the tumour tissues using t-SNE. (b) Heatmap indicating markers highly expressed in each cluster. (c–f) Functional annotations of each cluster are shown based on Seurat-calculated markers.

The pseudotime trajectory of GC clusters revealed a distinct pattern of postulated evolution state from Cluster 0 > 2 > 1. The major genes (top 1000) driving evolution were mainly involved in SRP-dependent cotranslational protein targeting to the membrane, response to metal ions, and ribosome assembly. The kernel genes in evolution regulation include SERPINB13, NFKBIA, B2M, and RPL24. Transcription factors including FOS, FOSB, JUN, JUNB, and ZNF256 drive the regulatory networks (Fig. 6).

Figure 6

Gastric-derived cell evolutionary trajectory. (a) The pseudotime trajectory of GC clusters revealed a distinct pattern of postulated evolution state from Cluster 0 > 2 > 1. (b) The evolutionary trajectory of TT and LN cells. (c,d) Evolution trajectory-based functional annotation (Top1000 gene). (e) Regulatory co-network of kernel genes in evolution regulation and a transcription factor-driven regulatory network.

Stem cell markers were applied to validate the hypothesis of multiple GC origins. Based on canonical markers, we identified hepatocytes and macrophages using t-SNE. Using TSCAN, we postulated a gastric-derived cell evolutionary trajectory. The SLICER, TSCAN, and diffusion map pseudotime tools show the evolutionary trajectory of four tumour cell clusters (Figure S2).

Discussion

Tumour heterogeneity in gastric tumorigenesis and progression has recently attracted researchers’ attention. Based on some significant findings and theories of heterogeneity, targeted and immune therapies are in progress. However, heterogeneity in gastric cancer remains largely unknown. By using scRNA-seq aided by single-cell isolation, we could identify the signatures of the primary tumour and metastatic tumour and examine the role of heterogeneity in gastric cancer metastasis. Lymph node metastasis is common in gastric carcinoma, but the mechanism remains unknown. In our study, analysis of single-cell transcriptomic data from the primary gastric tumours and lymph node metastasis tumours of three patients revealed a subgroup of cells bridging the metastatic group and the primary group, implying a transition state of cancer during the metastatic process.

Cancer heterogeneity has been shown to be a great challenge in cancer diagnosis and treatment26. Recently, scRNA-seq has been used to analyse abnormal cell-to-cell interactions, chemotherapy resistance, and immunosuppressive microenvironments in primary tissues or circulating tumour cells (CTCs). For example, Chun et al. separated tumour cells and immune cells from primary breast cancer cells27. Patel et al. revealed unanticipated heterogeneity in primary glioblastoma, showing diverse regulatory signalling and therapy programmes28. Kim et al.’s scRNA-seq results combined the intratumoural SNV KRASG12D with the expression heterogeneity of lung adenocarcinoma cells, deciphering subpopulations in anticancer drug responses29. In terms of the intratumoural cancer cell component, our clustering data showed significant intratumoural heterogeneity.

Tissue-specific markers were calculated, and a heatmap was plotted using the top 50 highly expressed features based on six clusters redefined using cell markers. Tissue-specific markers were also calculated, and a heatmap plotted, using the top 100 highly expressed features based on previously defined clusters. NOTCH2, NOTCH2NL, KIF5B, and ERBB4 are highly expressed in primary cancer, while ERBB2, CLDN11 and CDK12 are overexpressed in metastatic cancer. Previous studies suggested that the expression of Notch signalling pathway-associated proteins, such as Notch2, was significantly elevated in gastric cancer tissues compared to normal tissues30. In addition, evidence showed that higher KIF5B and ERBB4 expression promoted cancer cell proliferation31,32. Evidence also suggests that CDK12, ERBB2, and CLDN11 play a metastasis-associated role in cancer33,34,35. Several studies have shown that CLDN11 is related to tumour migration and metastasis35. Although the aetiologies of gastric cancer are partly clear and validated by experiments, the proposed “seed”-and-“soil” hypothesis has yet to be well explained36. Our findings imply that lymph node metastasis-prone subclones are more likely to share expression of CLDN11, a member of the tight junction protein family that functions as a component of cell adhesion37. This may explain cancer cell colonization in the lymph node after migration from primary tissues. Future studies need to evaluate both transcriptomic and genetic alterations, and even geographical information across different regions of primary and metastatic tumours, to identify survival and evolution pressure in the cancer-related microenvironment, which would promote the identification of potential key drivers of gastric cancer.

Transcription factors (TFs) are proteins with special structures and functions that regulate gene expression. We noticed that several TF genes, including FOS and JUN, appeared to be particularly important during tumour evolution. JUN and FOS have been identified as critical genes related to GC38. ScRNA-seq has also revealed an expression pattern with high FOS and JUN during leukaemia evolution, which resolves following therapy but recurs at relapse and death39.

As this is a preliminary study, the main limitation of our analysis is the small number of patients enrolled. Nonetheless, we obtained a more comprehensive picture of gastric cancer lymph node metastasis at single-cell resolution, giving a new perspective on the biomarkers (ERBB2, CLDN11 and CDK12), pathways and driver genes (FOS and JUN) involved in the metastasis process and providing a basis for the treatment of GC.