scRNA-seq of gastric tumor shows complex intercellular interaction with an alternative T cell exhaustion trajectory

The tumor microenvironment (TME) in gastric cancer (GC) has been shown to be important for tumor control but the specific characteristics for GC are not fully appreciated. We generated an atlas of 166,533 cells from 10 GC patients with matched paratumor tissues and blood. Our results show tumor-associated stromal cells (TASCs) have upregulated activity of Wnt signaling and angiogenesis, and are negatively correlated with survival. Tumor-associated macrophages and LAMP3+ DCs are involved in mediating T cell activity and form intercellular interaction hubs with TASCs. Clonotype and trajectory analysis demonstrates that Tc17 (IL-17+CD8+ T cells) originate from tissue-resident memory T cells and can subsequently differentiate into exhausted T cells, suggesting an alternative pathway for T cell exhaustion. Our results indicate that IL17+ cells may promote tumor progression through IL17, IL22, and IL26 signaling, highlighting the possibility of targeting IL17+ cells and associated signaling pathways as a therapeutic strategy to treat GC.

that doublets were carefully removed.

Sun et al. have performed a detailed analysis of cells infiltrating the gastric cancer (GC)
tumor microenvironment, creating an atlas of >166,000 cells from 10 GC patients. Such cells were studied by scRNAseq, coupled with TCR sequencing. Among other findings, they report that tumor-associated macrophages and dendritic cells expressing LAMP3 were able to mediate T cell activity. Furthermore, they describe the role and function of IL-17 producing CD8+ T cells, a possible novel therapeutic target. The paper is interesting and technically well done; the statistical analysis and the presentation of data are adequate.

Comments:
1) Lines 603-605: "Cells expressing contradictory markers of known different cell types were removed as potential doublets". It would be better to indicate the markers used to identify the doublets and the number of excluded cells.
2) Lines 625-629: "… SCTransform .. was used to remove a batch effect… ". The figures 4a and 4c report that some clusters are tissue-dependent, like CD8-C1, CD4-C1, CD4-C2. Naïve CD8 (CD8-C1) and naïve CD4 (CD4-C1) are close in the UMAP space, and it has to be excluded that was due to some source of technical variability. This variability can also cause the finding of other clusters. Have you tried to regress out other source of variations, like the mitochondrial gene percentages (that likely could be different among peripheral cells and TIL)? Additionally, some supplementary UMAP plot colored by the number of gene (and mitochondrial gene) expressed by each cluster could help in understanding the results of clustering process.
3) Figure 4: The clusters CD4_C2_LTB and CD4_C3_SLC2A3 designated as memory likely have to express also CREB or S1004A. If possible, highlining these transcripts would be indicative. Then, for the CD4_C4_CD69 cluster we recommend reporting the expression of CXCR6 and ANXA1, considering that cells were indicated as TRM (see: Szabo, Peter A., et al. "Single-cell transcriptomics of human T cells reveals tissue and activation signatures in health and disease." Nature Communications 10.1 (2019): 1-16). (Fig.  6c)". Maybe using the percentage of overlapping TCR is not the optimal metrics to measure the shared clonotype among blood and solid tissue (or patient/cluster). The "Morisita similarity index", that also considers the size of the population, could be useful for this purpose. 5) Line 398: "we thus hypothesized two possible trajectories for T cell state transition (Fig. 6i)". This could be an interesting finding, but some additional plot showing the patient/sample effect would be indicative, and would greatly clarify if this observation is affected by the patient-effect rather than by the biological process.

A brief summary of the revision
We are very grateful to Dr. Andrea Cossarizza, Dr. Domenico Lo Tartaro and the two anonymous reviewers for their constructive criticism and comments on our study. In the revised manuscript, we made substantial improvements in the computational analysis and provided further experimental validations to address the reviewers' concerns.
Firstly, to provide information from additional sources to support our conclusion, we added whole-exome sequencing (WES) and bulk RNA-seq for the same samples and Last but not least, we improved a number of computational analyses according to the suggestions from the reviewers. Besides, we added a systematic evaluation of batch effects in Supplementary Note 1 as well as an evaluation of the robustness and necessity of T-cell clusters in Supplementary Note 2.

Reviewer #1 (Remarks to the Author):
The authors generated a valuable scRNA seq dataset of 166,533 cells from 10 gastric cancer (GC) patients with matched adjacent normal tissues and peripheral blood, and copresented paired TCR and BCR (T/B cell receptor) Fig. 3d). Fib_1 also expressed others CAF markers, such as MMP3 and MMP11, and inflammation-associated fibroblast markers (IL11, IL24) that promote carcinogenesis 24 ( Fig. 3c and Supplementary Fig. 3g). Genes in the Wnt signaling pathway such as, WNT2 and WNT5A, were upregulated in tumor fibroblasts while SFRP1, an inhibitor of Wnt signaling was downregulated (Fig. 3i). These genes also showed similar expression patterns in TCGA-STAD dataset (Fig. 3j). Note that these three genes were expressed almost exclusively by fibroblasts ( Supplementary Fig. 3e).
Besides, Fib_1 cells exhibited upregulation of the TWIST1-PRRX1-TNC positive feedback pathway, which is known to promote the activation and expansion of CAFs in the TME 25 . Meanwhile, bone morphogenetic protein 1 (BMP1) and ANGPT2, which respectively facilitate tumor growth and angiogenesis, were expressed at significantly higher levels in SMC_1. IHC results validated that the protein expression of FAP, BMP1, WNT5A were upregulated in tumor (Fig. 3h). Hereafter, we defined cells in the three tumor- In the future, we will continue to verify our hypothesis on the function of TASCs in tumor progression using patient-derived xenografts (PDXs) and genetically engineered mouse models.

Reply:
We apologize for not being clear in this part of the analysis. Both cancer cells and stromal cells can secret factors that stimulate angiogenesis, and hence, facilitate the development of cancer. The enrichment of angiogenesis in TASCs indicates that stromal cells may help establish an appropriate microenvironment for angiogenesis in GC by releasing various kind of factors. By calculating the angiogenesis score in epithelial and stromal cell clusters, we found that signature of angiogenesis is more pronounced in stromal cells than epithelial cell in general in GC (Fig. R2). Moreover, we found TASCs, including Endo_1, Fib_1 and SMC_1, had higher angiogenesis score than stromal cells enriched in paratumor tissues. Epithelial-mesenchymal transition (EMT) is an evolutionarily conserved developmental program that enables carcinoma cells to replace their epithelial features with mesenchymal features. Therefore, as you mentioned, it is hard to interpret the upregulated EMT pathway in TASCs. We have deleted this part in the revised manuscript.
We apologize for the misleading statements. investigate the role of these TFs in TASCs in more detail. Our work aims to illustrate the complex biological ecosystem of GC by using scRNA-seq data. We recognized our limitations and discussed this in the discussion section of the article in the revised manuscript (Page 26, lines 535-545). interactions. The TME has not been well studied in gastric cancer and this study discovered that IL17+ cells may promote tumor progression and nominated it as a therapeutic target to treat gastric cancer. The manuscript suffers from major limitations. The paper is in general descriptive and the number of samples studied is small. Importantly, the analysis is overall superficial, with limited insights adding to current knowledge.

Reply:
Thank you for pointing out our limitations. We have made substantial changes in the revised manuscript to address the reviewer's concerns. Specifically, we added bulk RNAseq and whole-exome sequencing (WES) of the same tissues to facilitate the analysis of epithelial cells and the identification of tumor cell. We also analyzed the TCGA-STAD and Genotype-Tissue Expression (GTEx) datasets to ensure our results can be observed in large cohorts. Moreover, we made effort to add more analysis and provide biological interpretations of the computational results. Furthermore, we performed experimental validations using new clinical samples to support our findings. Also, no information is provided about the stages and sites of the tumors, or the MSI status.
The cohort size is very small given the high degree of heterogeneity (genomic, epigenomics, transcriptomic, histopathological, etc.) However, the analysis results in this 11 study were not correlated with any of these clinical and histopathological variables mentioned above.

Reply:
We apologize for the limited information shown in Fig. S1 and for the misleading names  Mean expres si on in group

Reply:
To make a more convincing definition of tumor cells, we combined multiple indicators from different aspects. Firstly, we compared the CNVs identified using WES data with the inferred CNVs from scRNA-seq (Fig. R4a). The results were consistent at the chromosome-level (Fig. R4b, Supplementary Fig. 2a and Supplementary Fig. 16 R4f). IM_enterocyte and IM_goblet were considered as precancerous clusters as they showed neither high inferCNV scores nor mutation enrichment of tumor specific mutations.
The tissue enrichment level of each cluster of cells also agreed with this definition (Fig.   R4g).  (g) Tissue preference of each epithelial cluster estimated by Ro/e score.

15
(Please see the previous page for the legend) 16

The analysis performed on tumor cells is very superficial. The authors simply defined the tumor cells clusters but without performed any further profiling on tumor cells. The
insights derived from this part is very limited. Also, a clear limit is that, although ten tumors were sequenced, the epithelial and tumor cells were mainly derived from 4 tumors: GC08, GC10, GC07, GC02.

Reply:
We thank the reviewer for point out the limitations of our analysis. We completely rewrote this part with an emphasis on the biological functions and clinical relevance of the findings.
Given the limited number of single-cell samples, we combined bulk sequencing data and

Reply:
Thank you for pointing out the potential confounding effect. We found that highly expressed genes were hardly cluster-specific (i.e., a gene is expressed by a single cluster in an exclusive fashion). To overcome this limitation, we applied MuSiC (Wang, X. et al., 2019), an algorithm to implement bulk tissue cell type deconvolution with scRNA-seq data. This algorithm estimated proportions of different cell types in bulk RNA-seq data by a weighted non-negative least squares regression. After cell type deconvolution of bulk RNA-seq data from TCGA-STAD using MuSiC, we performed survival analysis with patients stratified according to the proportions of a certain cell type of interest. Our results showed that high proportions of Fib_1 and SMC_1 cells were significantly associated with worse patient survival. As for Macrophage_APOE and Endo_1, though both of them resulted in hazard ratios greater than 1, there was no statistical significance (Fig. R5). proportion of the respective cell type.

5.
The TFs analysis in this study was very descriptive, the results were kind of isolated, and the significance of such analysis is not evident.

Reply:
We added experiment result showing the biological consequences of overexpressing the TFs we identified as important regulators of tumor associated macrophages. The experiment demonstrated that APOE and APOC1 were upregulated in three different conditions after overexpressing NR1H3 or TFEC in THP-1-derived macrophages (Fig. R6).
As for stromal cells and tumor cells, we added discussion on the predicted TFs about the potential biological processes they involved in (angiogenesis and intestinal metaplasia, respectively). In the case of T cells, the TFs analysis was accompanied by pseudotime and trajectory analysis, providing insight into the potential regulatory TFs important for driving the cell differentiation trajectories. Besides, a list of predicted TF regulons that were enriched for specific clusters provided clues for further investigations.

The author stated that the DC_LAMP3 is less-described cluster, which is incorrect.
This cell population has been well described by Dr. Zemin Zhang's group (Zhang et al. Cell 2019;Cheng et al. Cell 2021) and other groups. Also, the differentiation origin of LAMP3+ DC from cDC2 has been well described by Cheng et al. (Cell, 2021, PMID: 33545035), and therefore is not novel.

Reply:
We agree with the reviewer and apologize for the misleading statements. We have now rephrased our statement in the revised manuscript (Page 13, lines 253-256). (Fig. 4a-

Reply:
Thank you for the suggestion. We down-sampled the T cells to 75%, 50%, 25% and 10% P h a g e-C tr l P h a g e-N R 1 H 3 P h a g e-T F E C P h a g e-C tr l of the cell population, separately. PCA was reproduced for these four subsets separately and then UMAP and four clustering approaches (Leiden, Louvain, k-means, and hierarchical clustering) were applied using the first 30 PCs (Fig. R7). The results showed that the clusters of the subsampled cells were close to the initial clustering result. .
To investigate the necessity of distinguishing some rare clusters, we looked for specifically expressed genes with clear biological significance. CD4_C1 and CD8_C1 were both naïve T cells expressing LEF1, TCF7, CCR7, and SELL (Fig. R7e), while they belonged to CD4 + T cells and CD8+ T cells respectively, which was an important difference.
CD8_C6 and CD8_C7 were quite rare clusters, while the former showed distinctive expression of ZNF683 (Tissue-resident T-cell transcription regulator protein) which indicated it was tissue-resident T cells. Besides, GNLY (Granulysin), an antimicrobial peptide, was also highly expressed by CD8_C6 but not by CD8_C7. As for CD8_C7, CD160 and KLRC1 were expressed at a much higher level. In the case of Treg_C1 and Treg_C3, the former expressed much higher levels of LEF1, CCR7, and SELL, indicating a naïve state, while the latter expressed higher levels of exhausted markers like LAYN.
The list of cluster-specific genes can also be found in Fig. 5b. In conclusion, the rare clusters had distinctively expressed genes with evidence of biological significance. Thus, it is necessary to distinguish these clusters instead of mixing them together.   Fig. 4d is not convincing as only a several of such cells are shown in the images.

Reply:
We apologize for only showing the IHC of a few cells in the previous version of the manuscript and we replaced it with better IHC results (Fig. R8a). As for the gene expression, we noticed that some cytokines were expressed in a low but extremely specific fashion.
Low expressions resulted in more dropout events, thus it is unsuitable to display such cytokines together with other highly expressed genes in the same dotplot. Therefore, we showed the expression of IL17A, IL17F, IL22, IL26, and IL23R in a separated dotplot with a smaller maximum fraction value (Fig. R8b).

Cell-cell communication analysis is potential interesting, but this part is purely descriptive without any functional evaluation. At least, correlation analysis should be performed to check whether the frequencies of those interacting cells are significantly corelated. The authors pooled cells from all samples and patients for Cell-cell communication analysis without take into consideration that these "interacting" cells (shown on the heatmap) may not even present in the same samples. For the methodology,
it is unclear how the data are filtered, and whether those rare cell subsets with only few cells were excluded and the cutoff applied.

Reply:
We apologize that we are not able to link the VDJ-gene usage to the pathogenesis of gastric cancer, given the limited number of patients. The original purpose of the VDJ-gene usage analysis was to indicate that T cells with different functions were likely to have different biases of VDJ-gene usages, instead of random usages. The underlying mechanism of this phenomenon is unclear and deserves further study.

Tc17 cells have to be functionally defined before further analyzing their developmental trajectories. What are the marker genes of these cells and what are their roles in anticancer immune response? T cell exhaustion trajectories have been well described in many
other studies, and therefore it is unclear how results from current study can add to existing knowledge.

Reply:
We reorganized the paragraphs in the revised manuscript. Reported studies of Tc17 and our hypotheses for the functional role of Tc17 in gastric cancer were illustrated before the analysis of developmental trajectories. Marker genes of Tc17 were highly similar to that of Th17, but Tc17 also expressed CD8 + T-cells-specific genes (Fig. R10a). Our results indicated that Tc17 cells may promote tumor progression through IL17, IL22, and IL26 signaling, however understanding the exact role of Tc17 in the TME requires further experiments. Clonotype and trajectory analysis suggested that Tc17 cells originated from tissue-resident memory T cells and could subsequently differentiate into exhausted T cells, providing an alternative pathway for T cell exhaustion distinct from the classic T cell exhaustion trajectories. The underlying mechanisms and functional significance between the two exhaustion trajectories warrant further investigation. 26 Figure R10. Marker genes of Tc17 were highly similar to that of Th17, but Tc17 also expressed CD8 + Tcells-specific genes.

There is no description of doublets removal in the Methods, it is therefore not sure that
doublets were carefully removed.

Reply:
The 12 main cell types in Fig. 1b were isolated and re-clustered separately. Subclusters expressing contradictory markers of known different cell types were removed as potential doublets. As suggested by reviewer 3, we listed the conventional markers of each cell type we used to identify the doublets in Table R2, accompanied by the number of excluded cells.

Reply:
We added a systematic evaluation of batch effects in Supplementary Note 1. Firstly, to evaluate the patient-specific features that might affect clustering, we investigated distributions of cell percentages contributed by different patients in each cluster (Fig. R11).
Most of the clusters had diverse contributors roughly proportional to sample sizes (GC10 had the largest sample size), except for abnormal epithelial cells. Tumor cells were highly heterogeneous thus tumor cells from different patients formed separated clusters; Cells with transcriptional-level intestinal metaplasia (IM) were only detected in several patients, mainly GC08 and GC09, which was also confirmed by bulk RNA-seq (Fig. S2f). Cells in epithelial clusters expressed highly specific marker genes (Fig. R4e, Table R1) thus the clustering was not dominated by unwanted batch effects. Besides, clusters consisting largely of cells from blood (e.g., Naïve T cells and Mono_CD14) had fewer contributors as the blood samples only came from GC06, GC07, GC08, GC09, and GC10. As for erythrocytes, very few of them passed through the ACK lysis buffer treatment.
We noticed that GC02 contributed 59% of the cells in CD8_C8_IL17A. We then plot the UMAP without GC02 (Fig. R11h) and found the shape of the cluster did not dramatically change. Besides, we counted the number of cells and the number of clonotypes contributed by each patient in the cell subset of trajectory analysis (Fig. R11i, R11j) (This subset consisted of cells that belonged to clonotypes detected in at least two clusters among CD8_C5_TOB1, CD8_C8_IL17A, and CD8_C9_HAVCR2). These demonstrated that result was not dominated by GC02 or other patient-specific features.
To evaluate the batch effect caused by cell quality, we first investigated the distributions of percentages of mitochondrial gene counts (abbreviated as "percent_mito" hereafter). The percent_mito raised when the cellular membrane was leaky or disrupted and cytoplasmic RNAs were released, thus it was considered as an indicator for cell quality.
The percent_mito of samples from different patients were similar (Fig. R12h). We found significantly higher percent_mito in epithelial cells. This phenomenon was found in most of the samples hence it was not accidental, though the reason had not been figured out. A hypothesis was that epithelial cells from stomachs were more vulnerable to the experimental procedures than other cell types. The distributions of percent_mito varied in epithelial subclusters, but this didn't seriously affect the clustering since each cluster expressed highly specific marker genes (Fig. R4e, Table R1). As for plasma cells (named Bcell_C7_SDC1 in B cells clustering), they had lower percent_mito due to the large number of total UMI counts, which was caused by a remarkable high expression of immunoglobulin-related genes. NKT_CD69 and Fib_3 showed higher percent_mito in NK cells and stromal cells, which might be affected by the cell-quality effect. Nevertheless, they would not affect the main results as our analyses did not focus on such cells.
The number of detected genes in a cell is another indicator of cell quality. A high 29 percent_mito combined with a low number of detected genes often indicates bad quality.
However, biological differences can also influence the number of expressed genes. For example, the number of expressed genes is a robust indicator of developmental potential (Gulati et al., 2020). Besides, we found cycling cells had higher numbers of genes (Fig.   R12i, R12l), which might be caused by the expression of cell-cycle genes. In addition, tumor cells seemed to express more genes than other cells (Fig. R12p).
In conclusion, the clustering was not obviously affected by patient-specific features.
Though Fib_3 and NKT_CD69 seemed to be associated with low cell quality as they both had high percent_mito and low numbers of genes, we did not focus on them individually.
Other clusters with unusual percent_mito or numbers of genes (e.g., tumor cells and cycling T cells) expressed highly specific marker genes. Therefore, we believed our main results were not significantly affected by batch effects.

Reply:
Our understanding is that this problem is similar to that in comment 4. As described in the reply to comment 4, to control the potential cofounding factors from other cell types, we applied MuSiC to bulk RNA-seq data from TCGA-STAD to make the survival analysis by stratifying the patients with inferred cell type proportions.

Reply:
We apologize for the inconvenience. These figures are now replaced with new figures as we have rewritten this part completely.
16. Line 167: the author stated that Fib_1 expressed IL11 and IL24 while the expression levels of these two genes were very low and not easily visible in Fig. 2C.

Reply:
We noticed that it was not suitable to plot low expression genes and high expression genes together in the same dot plot. Thus, we plotted them separately with a smaller maximum fraction value (Fig. R13), and added this figure in Supplementary Fig. 3g.
Stand ardi zed exp ressio n leve ls 17. Figure 3b, the color key for standardized expression levels was not generated properly.

Reply:
We apologize for this mistake. We have updated this figure in Fig. 4b. 18. Line 206: no references added for "other studies".

Reply:
We apologize for this mistake. We have added the references in the revised manuscript (Page 12, lines 226-233).
19. Figure 3c, the pathway name on the right bottom was not displayed properly.

Reply:
We apologize for this mistake. We have updated it in Fig. 4c.

Reply:
We apologize for this confusion. They were technical duplicates. The libraries were made using the same tissue. We added a detailed statement of the scRNA library preparation in the Methods section (Page 29, lines 599-601), and updated the Supplementary Fig. 1c.

Sun et al. have performed a detailed analysis of cells infiltrating the gastric cancer (GC)
tumor microenvironment, creating an atlas of >166,000 cells from 10 GC patients. Such cells were studied by scRNAseq, coupled with TCR sequencing. Among other findings, they report that tumor-associated macrophages and dendritic cells expressing LAMP3 were able to mediate T cell activity. Furthermore, they describe the role and function of IL-17 producing CD8+ T cells, a possible novel therapeutic target.
The paper is interesting and technically well done; the statistical analysis and the presentation of data are adequate.

Reply:
We really appreciate the reviewers' positive comments.

Comments:
1) Lines 603-605: "Cells expressing contradictory markers of known different cell types were removed as potential doublets". It would be better to indicate the markers used to identify the doublets and the number of excluded cells.

Reply:
Thank you for your suggestions. We have listed conventional markers of each cell type used to identify the doublets and the number of excluded cells in Supplementary Data File 2.xlsx. The contents are also present in Table R3 as follow. In the case of T cells, we performed more stringent filtering: T cells with greater than 2 TRB or 2 TRA sequences were regarded as doublets. This resulted in a large number of excluded T cells.

Reply:
Thank you for your suggestions. We added a systematic evaluation of batch effects in Supplementary Note 1 accompanied by Fig. S13 and Fig. S14. Besides, an evaluation of the robustness and necessity of T-cell clusters was added in Supplementary Note 2 accompanied by Fig. S15. In the case of T cells, the mitochondrial gene percentages and the number of genes of T cells are shown in Fig. R15 as follow. The distribution of mitochondrial gene percentages was balanced. Cycling T cells showed a high number of detected genes, which was likely caused by the expression of cell-cycle genes like MKI67.
For the example of Naïve CD8 (CD8-C1) and naïve CD4 (CD4-C1), they expressed the specific marker, CD8 or CD4, in a mutually exclusive fashion, as we know that CD8 + T cells and CD4 + T cells are biologically different cell types.

Reply:
Thank you for your suggestions. We found CREB1 and S100A4 were widely expressed by T cells (Fig. R16a). After meticulously reading this article (Szabo et al., 2019), we found ITGA1 was also a marker gene for TRM in addition to CXCR6 and ANXA1. We have added CREB1, S100A4, CXCR6, ANXA1 and ITGA1 into   purpose.

Reply:
Thank you for your suggestion. We have calculated the "Morisita similarity index" and found the new result (Fig. R17a) is consistent in trend with our previous result (Fig. R17b), which does not affect our conclusion. We have changed the criteria of assessing the clonal migration between blood and tissues in the Methods section (Page 38-39, lines 810-817) and replaced the figure (Fig. 6c) with the new one.

Reply:
Thank you for your suggestions. The patient-effect of the clustering was also evaluated in newly added Supplementary Note 1. In brief, we found GC02 contributed more than 50% of Tc17 cells. Therefore, we plotted the UMAP without GC02 (Fig. R18a) and found the shape of the cluster did not dramatically change. Besides, we counted the number of clonotypes and the number of cells contributed by each patient in the cell subset of Tc17 trajectory analysis ( Fig. R18b and R18c) (This subset was used in Figure 6 for the velocity 40 analysis, which consisted of cells that belonged to clonotypes detected in at least two clusters among CD8_C5_TOB1, CD8_C8_IL17A, and CD8_C9_HAVCR2). In this subset, GC02 did not account for a large proportion. These demonstrated that the results were not dominated by GC02 or other patients alone.