Non-cell-autonomous cancer progression from chromosomal instability

Chromosomal instability (CIN) is a driver of cancer metastasis1–4, yet the extent to which this effect depends on the immune system remains unknown. Using ContactTracing—a newly developed, validated and benchmarked tool to infer the nature and conditional dependence of cell–cell interactions from single-cell transcriptomic data—we show that CIN-induced chronic activation of the cGAS–STING pathway promotes downstream signal re-wiring in cancer cells, leading to a pro-metastatic tumour microenvironment. This re-wiring is manifested by type I interferon tachyphylaxis selectively downstream of STING and a corresponding increase in cancer cell-derived endoplasmic reticulum (ER) stress response. Reversal of CIN, depletion of cancer cell STING or inhibition of ER stress response signalling abrogates CIN-dependent effects on the tumour microenvironment and suppresses metastasis in immune competent, but not severely immune compromised, settings. Treatment with STING inhibitors reduces CIN-driven metastasis in melanoma, breast and colorectal cancers in a manner dependent on tumour cell-intrinsic STING. Finally, we show that CIN and pervasive cGAS activation in micronuclei are associated with ER stress signalling, immune suppression and metastasis in human triple-negative breast cancer, highlighting a viable strategy to identify and therapeutically intervene in tumours spurred by CIN-induced inflammation.


Contents
Page Number

Table of Contents
Supplementary Note 1 | Single cell RNA-seq data pre-processing.
Pre-processing, cell selection, and filtering of individual scRNA-seq samples.CellRanger (v3.1.0)was utilized to construct a count matrix from raw reads, including sample demultiplexing, alignment to CellRanger's mm10-3.0.0 reference, barcode processing, and the generation of a raw digital expression matrix by collapsing groups of reads with the same unique molecular identifier (UMI), cell barcode and gene annotation.The count matrix was then loaded into python using scanpy 1 (read_10x_h5) for subsequent preprocessing and downstream analysis.Redundant gene instances (n = 35) were collapsed by summing their cumulative molecule counts.
We then followed several filtering steps within each sample to ensure high data quality.Viable cells were distinguished from droplets consisting of ambient mRNA transcripts arising in solution due to premature lysis or cell death based on library size.As illustrated in Supplementary Fig. 4a, cells were ranked by library size (total molecule counts) in descending order.We then computed the first and second derivative of the normalized sum of this array (based on average of a 10-cell rolling window) and identified the inflection point, or first instance in which the second derivative is zero.All cells with a library size less than 0.9X the inflection point were discarded.

< 𝜇 $
!"#$% &'(")*(" % − 2.5 ×  $ !"#$% &'(")*(" %, or cells with low complexity libraries (in which detected molecules align to a small subset of genes determined by at least 0.4X standard deviations from a linear fit) (Supplementary Fig. 4c) were discarded.Within individual samples, scrublet 2 was used to predict cell doublet scores based on simulated multiplets from the dataset using a prior expected double rate of 0.06 (based on targeted cell loading concentration).The filtered count matrix was normalized for library size per cell, whereby the expression level of each gene was divided by the cell's total library size and then scaled by a constant of 10,000.Finally, the normalized count matrix was logtransformed according to: log() =  +, ( + 0.1) −  +, (0.1), to generate the log-transformed count matrix.
Merged sample library generation.Following single sample pre-processing and filtering, all biological samples (n = 13 murine tumors) were merged to yield a total of 41,376 high-quality cells with a median library size of 2,439 transcripts per cell.Cell size, viability and complexity metrics were of high quality and largely consistent across samples (Supplementary Fig. 4d).Genes detected in fewer than 10 cells (n = 14,413) were excluded; highly variable genes (HVGs) were then computed per sample using the CellRanger flavor of scanpy 1 .Cell doublet scores computed in individual samples were subsequently assessed at the cluster and single cell level for the merged library.Towards the former, Principal Component Analysis (PCA) was applied to the log-transformed count matrix.
Phenograph clustering 3 was computed on the top 50 principal components (k-nearest neighbors = 20) and the average doublet score was characterized per cluster.Three Phenograph clusters distinguished by high average double score and individual cells with doublet scores greater than a threshold (point where the first derivative of the doublet score distribution, smoothed using the Savitzky-Golay filter from SciPy 4 , plateaus to zero) were removed.Altogether, this resulted in the removal of 474 putative doublet cells from the merged cell atlas.
Evaluating sample-specific batch effects.Library batch effects were ruled out by quantifying local sample mixing in non-tumor CD45+ cells before and after applying a fast mutual nearest neighbor (MNN) batch-correction 5 .Cell states were highly reproducible within each condition, showing similar levels of sample mixing within the non-tumor, CD45+ fraction before and after applying a fast mutual nearest neighbor (MNN) batch-correction (Supplementary Fig. 4e).Some sample-specific biological variance is expected across tumor cell states 6,7 .Sample mixing within each condition (CIN high , CIN low , CIN high +Sting1 KD ) was computed for a given cell, within a local neighborhood (30-nearest neighbors) according to the Shannon entropy: where N is the number of biological samples and  -,/ is the fraction of the nearest neighbors of cell  that belong to sample  in the condition.

Supplementary
Supplementary Figure 2. Single cell library quality and pre-processing.
Supplementary Note 2 | Single cell RNA-seq macro cell type annotations.
CellAssign 9 was used to assign individual cells to all major cell types expected in murine TNBC models, without batch correction, based on a set of cell type-specific marker genes applied to the normalized count matrix.This cell type-marker gene list (annotated in Supplementary Table 6) was assembled using genes reported in the mouse cell atlas (MCA: http://bis.zju.edu.cn/MCA/), for all expected cell types in the TME, initially filtering for highly expressed and cell type specific genes using the Savitzky-Golay-smoothed first derivative filtering method described above (Merged sample library generation) for the pct_ratio (pct_1/pct_2) and avg_logFC.
We then took the top 20 highest avg_logFC ranked genes and removed redundant genes widely expressed across multiple clusters and supplemented canonical markers curated from the literature, including keratins (K8, K18 and K19) to distinguish tumor-derived mammary epithelial cells 10 .CellAssign probabilistically assigns each cell, independent of clustering, to each of the 14 cell types in the gene marker file.Cells with probability assignments less than 95% were labeled "unassigned".We then evaluated the quality of these single cell assignments at the cluster level.Louvain clustering 1,3 computed on the top 50 principal components yielded 23 clusters and distinguished one small cluster (n = 1,174 cells) with a high fraction of unassigned cells (63%) and characteristically low average library size (< 1,000 molecules/cell); the unassigned cells (n = 745) from this cluster were removed as low-quality cells with an unclear phenotype.Otherwise, randomly dispersed unassigned cells were assigned to the mode cell assignment (excluding unassigned cells) of their Louvain cluster.More refined cell subtype annotations (Fig. 2a) were made on the full data matrix partitioned by cell type, to better characterize within-cell type biological variance (see Supplementary Note 3, Within-cell type analyses).The following phenotypically similar cell subsets were combined prior to within-cell type analyses: (1) macrophages and monocytes, ( 2 Additionally, a contaminating subset of osteoclasts (n = 150 cells) were removed from downstream analyses.Final cell type assignments (Fig. 2a) were generated through one-to-one mapping of cell subtypes to the macro cell type for the global atlas.This yielded a filtered and annotated cell atlas comprised of 39,234 cells expressing 16,604 genes across 11 macro cell types and 28 cell subtypes (Fig. 2a) used for all downstream analyses.

Supplementary Note 3 | Single cell RNA-seq within-cell type analyses.
Phenotypes and state transitions were further characterized within each major cell type using Louvain clustering 1,3 (described here) or Markov state modeling of single cell trajectories using CellRank (see Supplementary Note 4, Single cell trajectories using CellRank) 11 .
Within each major cell type, genes not expressed by any cells in the subset were removed and HVGs were computed per sample and merged across samples using the CellRanger flavor of scanpy 1 , except for B cells that had HVGs computed across all samples together due to their small number (n < 10 cells) in some individual samples.Principal Component Analysis 12 was applied to the log-transformed count matrix, where the number of principal components were selected based on the knee point of the cumulative explained variance 13 .Diffusion components (DCs) were then computed on these top principal components, to better characterize major axes of variation within each cell type 14 .As implemented in Palantir 15 , a cell-cell Euclidean distance matrix was computed based on the principal components of the log-transformed count matrix.An adaptive Gaussian kernel was then applied to convert distances into affinities, so that similarities between two cells decreases exponentially with distance.Finally, the affinity matrix was row-normalized to construct a Markov transition matrix, whose eigenvectors are termed diffusion components.The optimal number of diffusion components required to explain variance within each cell type was determined using the eigengap method 15 .MAGIC imputation 16 was applied to selected diffusion components to further denoise and recover missing gene values using default parameters (t = 3, k = 15, implemented by Palantir), yielding a cell type-specific imputed count matrix for all genes.Louvain clustering was performed on principal components at twenty different cluster resolutions (r=0.05,0.1, …, 1.0).The Adjusted Rand Index (ARI) was calculated for cluster assignments made across all pairs of r, creating a 20x20 matrix of ARI values that was used to select a value of r for which categorical assignments were stable.This was visualized as a two-dimensional heatmap (illustrated for macrophages in Supplementary Fig. 4f).Final cell subtype annotations were made at the level of Louvain clusters with the optimal cluster resolution r.The only exception to this approach was in the characterization of T cell subsets, which were alternatively annotated using the ProjecTILs 17 reference method (see below).The MAST 18 statistical framework was used to characterize differentially expressed genes (DEG) per cluster, compared to all other cells in the subset.MAST was applied to the log-transformed data and the Bonferroni correction was applied to correct for multiple hypothesis testing ( #$/ ).Gene set enrichment analysis (GSEA) was then performed for all DEG ranked and scored by: using cell type-specific pathway annotations (listed in Supplementary Table 7) to define associated biological processes.Specific details and exceptions to this generalized framework are outlined for all major cell types below.
Macrophages.Within the macrophage subset, 803 genes were not expressed in any cells and removed.This yielded a macrophage subset composed of 9,800 cells expressing 15,801 unique genes, of which 3,870 were highly variable across the cell type.The top 20 principal components and the top 2 diffusion components were used for downstream analyses.Louvain clustering was performed on selected principal components with a resolution of r=0.15, yielding three distinct macrophage subtypes, including two M1-like states and one M2-like state based on differentially expressed genes and pathways (Extended Data Fig. 2d).
Neutrophils/GR-MDSCs. Within the granulocytic subset, 2,428 genes were not expressed in any cells and removed.This yielded a PMN/GR-MDSC subset composed of 12,593 cells expressing 14,176 unique genes, of which 3,219 were highly variable.The top 16 principal components and 3 diffusion components were used for downstream analyses.Louvain clustering was performed on selected principal components at a resolution of r=0.35, yielding four distinct subtypes, including one ISG-Neutrophil and three granulocytic (Gr) MDSC subsets based on differentially expressed genes and pathways (Fig. 2a).When evaluating conditionally-dependent differential abundance of granulocytic subpopulations, the phenotypically similar MDSC subsets were merged (Fig. 2b).

Dendritic cells. Dendritic cells were composed of two transcriptionally distinct subpopulations of conventional dendritic cells (cDCs)
and plasmacytoid dendritic cells (pDCs); which were treated separately following the generalized framework outlined above.Within the cDC subset, 2,829 genes were not expressed in any cells and removed, yielding a cDC subset of 762 cells expressing 13,775 unique genes, of which 3,938 were highly variable.Louvain clustering was performed on the top 13 principal components at a resolution of r=0.15, yielding three distinct cDC subclusters.These were merged with the pDC subset (313 cells) and HVGs were re-computed on the complete dendritic cell subset.Altogether, this yielded 1,075 dendritic cells expressing 14,227 unique genes, of which 3,652 were highly variable.The three cDC subtypes corresponded to the well characterized cDC1, cDC2 and activated (DC3, mregDC) dendritic cell states based on differential expression of canonical genes and pathways (Extended Data Fig. 2h-i).
T-cells.Rather than annotating T-cell phenotypic states based on Louvain clusters, which was challenging given their continuous nature and complex phenotypes, we projected the normalized count matrix of the T cell subset onto a comprehensive T cell reference atlas using the ProjecTILs 17 workflow (Supplementary Fig. 5).Pre-exhausted (CD8_Tpex) and terminally exhausted T cells (CD8_Tex) predicted by ProjecTILs were combined and annotated as "CD8+ Dysfunctional T cells", and CD4+ follicular helper (Tfh) and the CD4+ Th1-like cells (Th1) predicted by ProjecTIL were combined and annotated as "CD4+ Helper T cells."Trajectory inference was performed within the CD8+ T-cell subset (see Supplementary Note 4, Single cell trajectories using CellRank).This consisted of 4,797 CD8+ T-cells expressing 14,517 unique genes, of which 3,431 were highly variable.The top 14 principal components and 3 diffusion components computed within the CD8+ T-cell subset were used for downstream analyses.

Figure 4 .
Single cell pre-processing.a-c, Quality control plots for single cell RNA sequence data.Cells were filtered based on (a) cumulative number of transcript counts, (b) fraction of mitochondrial mRNA detected per cell and (c) cell complexity (Preprocessing, cell selection and filtering of individual scRNA-seq samples); shown here for one representative library.Excluded cells are labeled in red.Color on scatter plots represents cell density.d, Kernel Density Estimate (KDE) plots showing distribution across cells of total number of transcripts (top) and mitochondrial fraction (bottom) for each sample.e, Violin plots showing entropy of sample distribution within each condition, computed in a local neighborhood (nearest neighbor k = 30) around each cell in CD45+ (immune) and CD45-(tumor and stroma) cell subsets before and after applying fastMNN batch correction 5 .High entropy indicates most similar cells come from a well-mixed set of samples within the condition, whereas low entropy indicates most similar cells come from the same sample.f, Adjusted rand index (ARI) was used to assess robustness of Louvain cluster assignments 3,8 to cluster resolution r by evaluating co-occurrence of cells in clusters assigned across all pairwise values of r.For each pairwise comparison, we compute the ARI across all cells; visualized on a heatmap.Here we show optimization of Louvain cluster resolution for the macrophage subset (r = 0.15) because this was the first instance of r for which cell subtype categorical assignments were stable.
) NK/T-cells with T-cells and NK-cells, and (3) Neutrophils and Gr-MDSCs.A filtering step was applied when evaluating clusters within each macro cell type to ensure high data quality.Specifically, three within-cell type Louvain clusters (comprising a total of 773 cells) exhibited features of apoptotic cells with low library size were removed and an additional 146 cells were clearly misassigned (clustering distinctly away from initial cell type annotation with other macro-cell types) and reassigned according to mode nearest neighbor annotations.

Figure 5 .
Annotating T cells.a, UMAP projection of the complete ProjecTILs T cell reference atlas 17 colored by cell type with mapped T cell data from this study (n = 7,949 cells) overlaid as individual points and contours.b, Heatmap of canonical markers grouped by T cell subtype; genes were clustered by average Euclidean distance.Imputed expression was standardized by znormalization for each gene across all T cells.B-cells.Within the B cell subset, 3,868 genes were not expressed in any cells and removed.This yielded 2,337 B-cells expressing 12,736 unique genes, of which 3,251 were highly variable.Louvain clustering was performed on the top 11 principal components at a resolution of r = 0.25, which revealed one major IFN-responsive subpopulation and one bona fide plasma cell subpopulation, as well as three additional B cell states, based on differentially expressed genes and pathways (Extended Data Fig.3f).Tumor cells.Within the tumor subset, 801 genes were not expressed in any cells and removed.This yielded 3,596 tumor cells expressing 15,803 unique genes, of which 5,794 were highly variable across the subset.Louvain clustering was performed on the top 16 principal components within the tumor subset at a resolution of r=0.25, yielding five distinct tumor cell states -two mesenchymal stem-like states, one basal-like state, and two luminal-like subtypes, based on differentially expressed genes and pathways (Extended Data Fig.7b-e).Diffusion components were likewise computed on the top 16 principal components.The top 3 diffusion components were selected for downstream analysis (see Supplementary Note 4, Single cell trajectories using CellRank).