Ovarian cancer mutational processes drive site-specific immune evasion

High-grade serous ovarian cancer (HGSOC) is an archetypal cancer of genomic instability1–4 patterned by distinct mutational processes5,6, tumour heterogeneity7–9 and intraperitoneal spread7,8,10. Immunotherapies have had limited efficacy in HGSOC11–13, highlighting an unmet need to assess how mutational processes and the anatomical sites of tumour foci determine the immunological states of the tumour microenvironment. Here we carried out an integrative analysis of whole-genome sequencing, single-cell RNA sequencing, digital histopathology and multiplexed immunofluorescence of 160 tumour sites from 42 treatment-naive patients with HGSOC. Homologous recombination-deficient HRD-Dup (BRCA1 mutant-like) and HRD-Del (BRCA2 mutant-like) tumours harboured inflammatory signalling and ongoing immunoediting, reflected in loss of HLA diversity and tumour infiltration with highly differentiated dysfunctional CD8+ T cells. By contrast, foldback-inversion-bearing tumours exhibited elevated immunosuppressive TGFβ signalling and immune exclusion, with predominantly naive/stem-like and memory T cells. Phenotypic state associations were specific to anatomical sites, highlighting compositional, topological and functional differences between adnexal tumours and distal peritoneal foci. Our findings implicate anatomical sites and mutational processes as determinants of evolutionary phenotypic divergence and immune resistance mechanisms in HGSOC. Our study provides a multi-omic cellular phenotype data substrate from which to develop and interpret future personalized immunotherapeutic approaches and early detection research.

High-grade serous ovarian cancer (HGSOC) is an archetypal cancer of genomic instability 1-4 patterned by distinct mutational processes 5,6 , tumour heterogeneity [7][8][9] and intraperitoneal spread 7,8,10 . Immunotherapies have had limited efficacy in HGSOC [11][12][13] , highlighting an unmet need to assess how mutational processes and the anatomical sites of tumour foci determine the immunological states of the tumour microenvironment. Here we carried out an integrative analysis of whole-genome sequencing, single-cell RNA sequencing, digital histopathology and multiplexed immunofluorescence of 160 tumour sites from 42 treatment-naive patients with HGSOC. Homologous recombination-deficient HRD-Dup (BRCA1 mutant-like) and HRD-Del (BRCA2 mutant-like) tumours harboured inflammatory signalling and ongoing immunoediting, reflected in loss of HLA diversity and tumour infiltration with highly differentiated dysfunctional CD8 + T cells. By contrast, foldback-inversionbearing tumours exhibited elevated immunosuppressive TGFβ signalling and immune exclusion, with predominantly naive/stem-like and memory T cells. Phenotypic state associations were specific to anatomical sites, highlighting compositional, topological and functional differences between adnexal tumours and distal peritoneal foci. Our findings implicate anatomical sites and mutational processes as determinants of evolutionary phenotypic divergence and immune resistance mechanisms in HGSOC. Our study provides a multi-omic cellular phenotype data substrate from which to develop and interpret future p er so na lized i mm unotherapeutic approaches and early detection research.
The principal defining features of high-grade serous ovarian cancer (HGSOC) are profound structural variations in the form of copy number alterations and genomic rearrangements, which accrue on a genetic background of nearly ubiquitous TP53 mutation 14 . Somatic and germline alterations in homologous recombination (HR) repair pathway genes such as BRCA1 and BRCA2 lead to HR deficiency (HRD) in approximately half of HGSOC cases 15 . Beyond gene alterations, patients stratify by endogenous mutational processes 3,16 as inferred from structural variation patterns in whole-genome sequencing (WGS), including HRD subtypes (BRCA1-associated tandem duplications, HRD-Dup; BRCA2-associated interstitial deletions, HRD-Del), CCNE1 amplification-associated foldback inversion (FBI)-bearing tumours and CDK12-associated tandem duplicator (TD)-bearing tumours 5,6 .
HGSOC presents a distinctive clinical challenge resulting from the widespread intraperitoneal disease at diagnosis. Long latency allows for broad periods of clonal diversification and tumour-immune interactions to unfold in the heterogeneous microenvironments of the peritoneal cavity 7,10,17 . This raises key questions about how underlying mutational processes and local tissue sites influence clonal selection, tumour microenvironments (TMEs) and immune recognition. We carried out a prospective study, capturing mutational processes from WGS, cell phenotypes from single-cell RNA sequencing (scRNA-seq) and spatial topology from in situ multiplexed cellular imaging in multi-site cases of HGSOC. Our findings identify distinct immunostimulatory and immunosuppressive mechanisms that co-segregate with sites of disease and mutational processes, thereby defining new determinants of immune recognition and escape in HGSOC.
Multi-site tissue biopsies (n = 160) were collected from newly diagnosed, treatment-naive patients (n = 42) undergoing laparoscopy or primary debulking surgeries over a 24-month period (Fig. 1a). Collection took place at anatomical sites including adnexa (that is, potential primary lesions), omentum, peritoneum, bowel, ascites and other intraperitoneal sites (Extended Data Fig. 1a). The clinical characteristics of all patients are summarized in Extended Data Fig. 1b Table 2), haematoxylin and eosin (H&E) staining and multiplexed immunofluorescence (mpIF) on fixed tissue sections, clinical tumour-normal sequencing of 468 cancer genes by MSK-IMPACT and whole-genome tumour-normal sequencing (Extended Data Fig. 1a,b and Methods). WGS copy number profiles were highly concordant with an external 'meta-cohort' derived from several HGSOC WGS studies 5 (Extended Data Fig. 2a).
Mutational signature inference from WGS data yielded 16 HRD-Dup, 6 HRD-Del and 14 FBI tumours (Extended Data Figs. 1b and 2b,c and Supplementary Tables 1 and 3), with model features consistent with previous reports 5 (Extended Data Fig. 2d,e), stable across multiple computational methods 18,19 and in agreement with BRCA1 and BRCA2 mutations and clinical HRD testing (Extended Data Fig. 2b). Furthermore, tumours with high-level amplifications in CCNE1 (Extended Data Fig. 2f) and MYC exhibited expected distributions of gene amplifications within signature assignments and cis-acting gene expression correlates (Extended Data Fig. 2g,h).

Site-specific TMEs
We constructed a cell type map from the scRNA-seq data, quantifying nine broad cellular lineages: epithelial cells, lymphoid cells (T and natural killer (NK) cells, B cells, plasma cells), myeloid cells (monocytes/macrophages, dendritic cells (DCs), mast cells) and stromal cells (fibroblasts, endothelial cells) (Fig. 1b, (1,257) Dendritic cell (4,814) Endothelial cell (18,531) B cell (18,673) Plasma cell (20,   f, Site-specific enrichment of cell type composition in scRNA-seq, H&E and mpIF data fitted using a GLM. GLMs for H&E and mpIF data were separated by tumour (T) and stroma (S) regions. The colour gradient indicates the log 2 -transformed odds ratio (red, enrichment; blue, depletion), and sizes indicate the Bonferroni-corrected -log 10 (P value). g, Cell type composition based on scRNA-seq data for CD45 − and CD45 + samples. Upper panels, absolute and relative cell type numbers; lower panels, box plot distributions of sample ranks with respect to tumour site. h, Cell type composition based on H&E with lymphocyte ranks in tumour and stroma. Panels are analogous to those in g. i, Cell type composition based on mpIF with CD8 + T cell ranks in tumour and stroma. Panels are analogous to those in g. For c and g-i, violin plots and box plots are shown as the median, top and bottom quartiles; whiskers correspond to 1.5× interquartile range (IQR). *P < 0.05, **P < 0.01.

Article
cell-specific somatic copy number alterations driving gene dosage effects. Immune cell composition varied across anatomical sites within patients (Fig. 1e,f). Whereas CD45 − fractions (ranging from fibroblast-rich to cancer cell-rich samples) were largely conserved between anatomical sites (Fig. 1g), CD45 + fractions (ranging from myeloid-rich to lymphoid-rich samples) were substantially different (Fig. 1f Fig. 3f).
Inter-site compositional variation within patients stimulated deeper analyses to assess immune cell phenotypic states. We identified 10 major T and NK cell clusters with 41 minor subclusters (Fig. 2a Fig. 5c). T and NK cell clusters followed a gradient across uniform manifold approximation and projection (UMAP) space (Fig. 2b), highlighting site-specific phenotypic differences that were quantified by fitting a generalized linear model (GLM) of cluster composition (Fig. 2c). In particular, naive/stem-like and central memory CD4 + T cells (cluster 1) were depleted in adnexal samples and enriched in ascites (Extended Data Fig. 7a,b and Supplementary Table 5). Conversely, dysfunctional CD4 + and CD8 + T cells (clusters 3-5 and 15-17) were depleted in ascites but enriched in adnexal and other tumour sites (Supplementary Table 5), in line with dysfunction driven by chronic antigen exposure in solid tumours. Clusters for regulatory T cells (7)(8)(9)(10) and regulatory NK cells (27)(28)(29)(30)(31)(32)(33) were also enriched in adnexal samples (Fig. 2c), potentially indicative of increased immunomodulatory feedback at these sites.
Comparisons of solid tumour sites within patients showed naive/ stem-like and central memory T cell enrichment in non-adnexal sites (22 of 31 patients) and dysfunctional T cell enrichment in adnexal sites (19 of 31 patients) (Extended Data Fig. 7a, vector plot). Shannon entropy analysis indicated higher within-site variation of T cell phenotypes in adnexal samples, suggesting the coexistence of differentiated states within the primary site relative to non-adnexal samples (Fig. 2d).  Intriguingly, analysis of site-to-site Bray-Curtis dissimilarity showed high compositional differences between solid tumours and ascites both within and between patients ( Fig. 2d and Extended Data Fig. 7c). Differentiation trajectories projected CD8 + T cells along two axes of (1) terminally dysfunctional and (2) interferon-stimulated gene (ISG) T cell states (Fig. 2e), defined by loss of naive T cell markers and acquisition of dysfunctional and cytotoxic traits (Fig. 2f). The trajectories were associated with loss of transcription factors expressed in naive and central memory T cells (TCF1 and LEF1) and acquisition of type I interferon (IFN) (ISG15), cytotoxic function (GZMK) and T cell dysfunction (TOX, CXCL13 and PDCD1) (Extended Data Fig. 7d). Notably, expression trajectories also differed across sites, with ascites exhibiting high cytotoxic module scores in contrast to the high dysfunctional T cell scores in adnexa and omentum (Extended Data Fig. 7e). Phenotypic state composition in myeloid and DC compartments also varied as a function of site (Extended Data Fig. 6a-c). DCs clustered     Table 5). Similarly to T cells, major compositional differences were noted between solid tumour foci and ascites both within and between patients (Extended Data Fig. 6d). Thus, phenotypic immune state differentiation for both lymphoid and myeloid cells was strongly linked to tumour site, underlying both within-and between-patient variation in TMEs and providing clear evidence that ascites immunophenotypic composition is unrepresentative of solid tumours.

Tumour cell phenotypic diversification
We next defined how mutational processes in cancer cells influenced cancer cell-intrinsic signalling and immune phenotypes. We identified ten epithelialclusters from CD45 − cells (  Table 5). All three immune signalling pathways in Cancer.cell.3 were substantially increased in the adnexal lesions of HRD-Dup cases compared    indicates the log 2 -transformed odds ratio (red, enrichment; blue, depletion), and sizes indicate the Bonferroni-corrected -log 10 (P value). c, Distributions of CD8 + T cell state module scores and JAK-STAT signalling pathway activity scores with respect to mutational signature (patient counts shown). d, Scaled module scores within the subset of CD8 + T cells with respect to pseudotime and mutational signature. e, Correlation of JAK-STAT signalling scores in CD8 + T cells in CD45 + samples with those in cancer cells in matched CD45 − samples. f, Left, intra-sample diversity of T and NK cell clusters in adnexal and non-adnexal samples estimated by Shannon entropy, with samples grouped by mutational signature (patient and sample counts shown). Right, intra-and inter-patient dissimilarity in T and NK cell cluster composition, with samples grouped by mutational signature, estimated using the Bray-Curtis distance. Pairwise dissimilarity is shown for all pairs of sites (patient and sample pair counts shown) excluding ascites (top) and for adnexal versus non-adnexal pairs of sites (bottom). g, Spatial density of CD8 + T cell phenotypes in adnexal and non-adnexal mpIF samples as a function of distance to the tumour-stroma interface, with samples grouped by mutational signature (Methods). In c and f, box plots and violin plots show the median, top and bottom quartiles; whiskers correspond to 1.5× IQR. Colours in f and g are analogous to those in c-e. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; brackets indicate two-sided Wilcoxon pairwise comparisons in c and f. with FBI cases ( Fig. 3e; P = 4.1 × 10 −3 , 5.2 × 10 −3 and 5.2 × 10 −3 ). This was not seen in non-adnexal lesions, implying that cell-intrinsic immune signalling in HRD-Dup cases originates in primary tumours. By contrast, TGFβ signalling was more prominent in non-adnexal sites of FBI cases ( Fig. 3e; P < 1 × 10 −4 ), linking FBI-specific activation of TGFβ signalling to the metastatic process. We note that within-patient differences in pathway activity were not linked to copy number clone identity (Extended Data Fig. 9a); for example, we see differences in JAK-STAT pathway activity independently of the copy-number profile in the same patient (Extended Data Fig. 9b,c).
Notably, cancer cell clusters differed by expression of major histocompatibility complex (MHC)-encoding genes ( Fig. 3f and Extended Data Fig. 8e-g). MHC class I genes (HLA-A, HLA-B, HLA-C and B2M) and MHC class II genes (HLA-DRA and HLA-DRB1) were highly expressed in Cancer.cell.3, with upregulation in HRD relative to FBI adnexal tumours (Fig. 3f), indicative of increased antigen presentation accompanied by upregulated expression of CD274 (PD-L1) (Extended Data Fig. 8h; P = 2.8 × 10 −3 ). While at the sample level Shannon entropy showed similar levels of cell-intrinsic diversification across the mutational signatures, FBI tumours exhibited statistically higher Bray-Curtis dissimilarity in adnexa versus non-adnexa sample pairs ( Fig. 3g and Extended Data Fig. 8i), potentially indicating that these cancer cells have a greater capacity for phenotypic diversity when migrating to distal sites.
In addition, stark compositional differences of naive and dysfunctional T cells were observed as a function of mutational signature  Table 5). This was similarly reflected in higher JAK-STAT signalling in HRD-Dup tumours ( Fig. 4c and Extended Data Fig. 10a) and along differentiation trajectories of T cell phenotypes (Fig. 4d). T cell-intrinsic and cancer cell-intrinsic JAK-STAT signalling was correlated in matched samples across the mutational subtypes (Fig. 4e). Higher phenotypic T cell state diversity was found in HRD-Dup tumours, accompanied by remarkably consistent intra-patient Bray-Curtis indices, suggesting that diversification processes were recurrent across patients (Fig. 4f).
Using mpIF, we tested whether heightened immune signalling in HRD tumours could be attributed to reciprocal interactions between cancer cells and immune cells in the tumour and stromal compartments in the TME ( Fig. 4g and Extended Data Fig. 10d). Activated CD8 + PD-1 + TOX − T cells were more prevalent in non-adnexal as compared with adnexal samples, with differences across compartments more pronounced in HRD subtypes than in FBI cases (Fig. 4g). Similarly, terminally dysfunctional CD8 + PD-1 + TOX + T cells were enriched within the peritumoural stroma in HRD-Dup cases and in the tumour of HRD-Del cases. By contrast, CD8 + PD-1 + TOX − and CD8 + PD-1 + TOX + T cells were less abundant in FBI cases and were evenly distributed within the tumour and stroma, implying reduced T cell-antigen interactions (Fig. 4g

Article
in HRD-Del cases and enrichment of M1 macrophages in FBI cases (Extended Data Fig. 11a,e). Phenotypic diversification of myeloid cells was elevated in HRD-Dup adnexal samples with high entropy (Extended Data Fig. 11b, left); however, inter-patient Bray-Curtis dissimilarity was statistically higher in FBI tumours, suggesting greater patient specificity in FBI relative to HRD-Dup cases (Extended Data Fig. 11b, right). Diversification was characterized by M2.CXCL10 macrophage enrichment in HRD-Dup and depletion in FBI (Supplementary Table 5), with FBI tumours also exhibiting fewer PD-L1 (CD274)-positive macrophages (Extended Data Fig. 11a,d). In line with CXCL10 being a target of JAK-STAT signalling, macrophages in HRD-Dup, but not FBI, samples presented higher JAK-STAT pathway activity (Extended Data Fig. 11c). Spatially, we observed elevated localization of CD68 + macrophages in the periphery of both HRD-Dup and FBI adnexal samples that extended into the tumour for HRD-Dup, but not FBI, samples (Extended Data Fig. 11g).
Concordance of JAK-STAT pathway activation among all cell subtypes implies a common upstream effector. We therefore examined type I IFN pathway regulators in DCs, which commonly serve as a key activator of JAK-STAT signalling. We observed a strong positive correlation between the IFN regulator module score in DCs and JAK-STAT pathway activation in cancer cells, T cells and macrophages (Extended Data Fig. 11h). Thus, increased type I IFN activation in DCs in HRD-Dup tumours may serve as an activator of JAK-STAT signalling, with downstream upregulation of human leukocyte antigen (HLA) molecules and PD-L1 in cancer cells and macrophages.

Mutational processes drive immunoediting
We next investigated whether increased immune signalling in HRD subtypes led to mechanisms mediating immune escape. We profiled loss of HLA presentation machinery 24 inferred through loss of heterozygosity (LOH) of chromosome arm 6p-harbouring HLA class I and class II genes-at the single-cell level using the SIGNALS algorithm 5 . Predictions were restricted to cancer cells (Fig. 5a), with per-cell B-allele fractions (BAFs) classed as balanced, imbalanced or LOH (Extended Data Fig. 12a,b) and orthogonal genomic validation from site-matched WGS and MSK-IMPACT datasets (Extended Data Fig. 12c-f). We observed marked inter-patient heterogeneity (Fig. 5b,c and Extended Data Fig. 12a,b), with clonal 6p LOH in 4 of 41 patients (10%) and subclonal 6p LOH in 7 of 41 patients (17%; Fig. 5c, left). Intriguingly, site-specific losses were found in 4 of 41 patients (Fig. 5c, right). Clonal 6p LOH was primarily observed in HRD-Dup cases, whereas subclonal distributions were more frequent in patients with FBI tumours (Fig. 5c). Higher prevalence of 6p LOH in HRD-Dup was validated in an independent cohort (n = 1,298 patients) with available MSK-IMPACT sequencing (31% in BRCA1-mutant cases, 19% in BRCA2-mutant cases and 24% in CCNE1-amplified cases Fig. 5d). Notably, clonal 6p LOH was present in adnexal lesions in 5 of 47 samples (Fig. 5e), in line with 'early' immune evolutionary selection in the primary site. Patient 022 with the HRD-Dup subtype and patient 065 with the FBI subtype further showed patient-specific evolutionary timing of 6p LOH (Fig. 5f). Functional consequences of 6p LOH in HRD-Dup were also observed, including upregulation of JAK-STAT signalling (Extended Data Fig. 12g,h), which was most pronounced in bowel samples (Extended Data Fig. 12i), and increased presence of dysfunctional CD4 + and CD8 + T cells (Fig. 5g). Together, association of LOH of HLA alleles with heightened JAK-STAT signalling and T cell dysfunction points to 'early' immune-mediated evolutionary selection of 6p loss in HRD-Dup tumours, in contrast to evolutionarily 'late' clonal expansion of 6p LOH in FBI tumours.

Spatial topology of the microenvironment
The single-cell analyses above link immunophenotypic variation to mutational signatures and tumour site. We sought to validate these findings with tumour-immune cell interactions and spatial topologies from in situ mpIF profiling of principal immune cell types (T cells and macrophages) and their functional markers (PD-1, TOX, PD-L1) (Extended Data Fig. 13a). We enumerated the proximal interactions of naive/memory (CD8 + PD-1 − TOX − ), activated/predysfunctional (CD8 + PD-1 + TOX − ) and dysfunctional (CD8 + PD-1 + TOX + ) T cells with PD-L1-expressing cancer cells (pan-cytokeratin (panCK) + PD-L1 + ).  Interactions between proximal PD-L1-expressing cancer cells and activated/predysfunctional T cells were particularly high in bowel samples, and dysfunctional T cell interactions were high in both bowel and adnexal samples (Extended Data Fig. 13b,c). Omentum samples, by contrast, exhibited relatively few proximal interactions of either T cell or macrophage phenotypes with PD-L1-expressing cancer cells (Extended Data Fig. 13d,e). Mutational signatures also impacted cellular interactions, as predicted by higher receptor-ligand co-expression of PD-L1 (CD274) in myeloid clusters and PD-1 (PDCD1) in T and NK cell clusters in HRD subtypes derived from scRNA-seq data (Extended Data Fig. 13f). In line with these findings, the spatial organization of cellular neighbourhoods also varied by mutational subtype, as reflected in nearest-neighbour distances between T cells and panCK + PD-L1 + cancer cells (interactions of exemplar samples are shown in Fig. 6a). Antigen-experienced CD8 + PD-1 + T cells within a 30-μm radius of PD-L1 + cancer cells were common in HRD-Dup cases but rare or absent in FBI tumours (Extended Data Fig. 14a). When combining site and signature, the shortest median distances were observed in HRD-Dup adnexa and bowel samples, particularly in the activated/predysfunctional and dysfunctional T cell compartments ( Fig. 6b and Extended Data Fig. 14b), supporting PD-L1 as a negative feedback mechanism in response to activated T cells in HRD tumours. Similar interactions were noted between T cells and CD68 + PD-L1 + macrophages, which were particularly prevalent in HRD-Del cases (Extended Data Fig. 14c-f), but largely absent in FBI tumours. Overall, mpIF analysis further highlighted site-and mutational signature-dependent TMEs consistent with scRNA-seq-based observations.

Discussion
Our results synthesize anatomical sites and mutational processes as determinants of HGSOC TMEs and their phenotypic states. We speculate that, while the relative paucity of immune cells in adnexal sites is driven by the immune privilege of the ovaries and fallopian tubes, the predominance of dysfunctional T cells at these sites reflects immunoreactivity early in cancer evolution with subsequent immune escape in metastatic sites. In addition, contrasting cell-to-cell topological features of omental and bowel samples indicates that specific metastatic sites may harbour tissue-specific immunosurveillance constraints. Moreover, high intra-and inter-TME heterogeneity highlights that mechanisms of immune resistance are not universal in a given patient, requiring any therapeutic approach to account for evolution of the immune response in individual tumours.
Moreover, how mutational processes engender distinct immune evasion mechanisms raises additional questions in view of preclinical studies suggesting that immunogenicity in HRD tumours may lead to improved responses to immune checkpoint blockade (ICB) [25][26][27][28] . Clinical evidence for this is lacking as no association between HRD status, tumour mutational burden and response to ICB alone or in combination with chemotherapy has been observed [11][12][13] . Our findings highlight that mechanisms of immune resistance are distinct among the mutational subtypes, including activated type I IFN signalling in T cells, cancer cells and myeloid cells that is particularly enriched in HRD-Dup tumours. These data argue for multifaceted strategies for immunotherapeutic reprogramming that consider the underlying mutational process, in particular for FBI tumours, which are more resistant to chemotherapy 5 and here are found to be immunologically inert.
Altogether, our study provides an extensive multi-modal resource, mapping the cellular constituents of HGSOC TMEs and linking them to mutational processes and spatial context. Our findings illustrate that even personalized approaches may be ineffective against widespread and heterogenous disease within patients, highlighting the urgent need for early detection before dissemination into the peritoneal cavity.
The data presented here can be leveraged broadly to contextualize mechanistic insights into immunotherapeutic response across cancers of genomic instability.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-05496-1.

Experimental methods
Sample collection. All enrolled patients were consented to an institutional biospecimen banking protocol and MSK-IMPACT 29 testing, and all analyses were performed according to a biospecimen research protocol. All protocols were approved by the institutional review board (IRB) of Memorial Sloan Kettering Cancer Center. Patients were consented following the IRB-approved standard operating procedures for informed consent. Written informed consent was obtained from all patients before conducting any study-related procedures. The study was conducted in accordance with the Declaration of Helsinki and the Good Clinical Practice guidelines.
We collected fresh tumour tissues from 42 patients with HGSOC at the time of upfront diagnostic laparoscopic or debulking surgery. Ascites and tumour tissue from primary and multiple metastatic sites, including bilateral adnexa, omentum, pelvic peritoneum, bilateral upper quadrants and bowel, were procured in a predetermined, systematic fashion (median of four primary and metastatic tissues per patient) and were placed in cold RPMI for immediate processing. Blood samples were collected before surgery for the isolation of peripheral blood mononucleated cells (PBMCs) for normal WGS. The isolated cells were frozen and stored at −80 °C. In addition, tissue was snap frozen for bulk DNA extraction and tumour WGS. Tissue was also subjected to formalin fixation and paraffin embedding (FFPE) for histological, immunohistochemical and multiplex immunophenotypic characterization.
Sample processing. We profiled patient samples using five different experimental assays: 1. CD45 + and CD45 − flow-sorted cells were collected from fresh tissue samples and processed for scRNA-seq in 156 sites from 41 patients (~6,000 cells per site; Supplementary  Fig. 1b). 5. Snap-frozen tissues were processed to obtain matched tumour-normal WGS data for a single representative site in n = 40 patients with scRNA-seq, H&E and mpIF data, to derive mutational processes from genome-wide single-nucleotide and structural variants.
scRNA-seq. Tissue dissociation. Tumour tissue was immediately processed for tissue dissociation. Fresh tissue was cut into 1-mm pieces and dissociated at 37 °C using the Human Tumor Dissociation kit (Miltenyi Biotec) on a gentleMACS Octo Dissociator. After dissociation, single-cell suspensions were filtered and washed with ammonium-chloridepotassium (ACK) lysing buffer. Cells were stained with trypan blue, and cell counts and viability were assessed using the Countess II Automated Cell Counter (ThermoFisher) (for a detailed protocol, see ref. 30  Sequencing. DNA quantity was measured using the Quant-iT PicoGreen dsDNA assay (ThermoFisher, P11496), and DNA quality was assessed with TapeStation D1000 ScreenTape (Agilent, 5067-5582). After Pico-Green quantification and quality control with an Agilent BioAnalyzer, 500 ng of genomic DNA was sheared using an LE220-plus focused ultrasonicator (Covaris, 500569) and sequencing libraries were prepared using the KAPA Hyper Prep kit (Kapa Biosystems, KK8504) with modifications. In brief, libraries were subjected to a 0.5× size selection using AMPure XP beads (Beckman Coulter, A63882) after postligation clean-up. Libraries were not amplified by PCR and were pooled in an equal volume and quantified on the basis of their initial sequencing performance. Samples were run on a NovaSeq 6000 in a 150-bp/150-bp paired-end run, using the NovaSeq 6000 SBS v1 kit and an S1, S2 or S4 flow cell (Illumina).
Preparation, review and scanning of histopathology slides. Archived FFPE tissues were used for histological review, including the assessment of spatial topology and tumour-infiltrating lymphocytes (TILs), as well as for immunohistochemical characterization and mpIF analysis for mapping of the TME, in the Advanced Immunomorphology Platforms Laboratory. Slides were originally reviewed by gynaecological pathologists for diagnosis and FIGO (International Federation of Gynecology and Obstetrics) stage assignment. Representative H&E-stained slides from each site of interest were digitally scanned to produce virtual slides. Two senior gynaecological pathologists then reviewed these images for the presence and location of serous tubal intraepithelial carcinoma (STIC), SET architecture (solid, pseudo-endometrioid and transitional cell-like patterns), micropapillary architecture 32 , presence of a fimbrial ball, architectural patterns of metastatic disease 33 , mitotic counts (per ten high-power fields, HPFs) and tumour cell content (viable percentage). Regions with TILs were also assessed with a quantitative TIL score (low, <42 TILs per HPF in a hotspot; high, 42 or more TILs per HPF in a hotspot) 32 . Histopathology slides were scanned into whole-slide images using a Leica Aperio AT2 scanner (Leica Biosystems) at ×20 magnification. The most representative tissue block was selected for slide scanning.
Article mpIF. Overview. We carried out multiparameter quantification of epithelial and immune cell subsets and activation markers using the AkoyaBio Vectra automated imaging system at the MSKCC Parker Institute for Cancer Immunotherapy. We stained whole slides of FFPE tissue for markers of ovarian cancer cells (panCK + CK8-CK18) and of specific leukocyte subsets, including macrophages (CD68) and cytotoxic T cells (CD8), known immune inhibitory proteins (PD-L1) and markers of the activation/exhaustion status of CD8 + T cells (PD-1, TOX). FOVs were chosen to include either the entire tissue with minimal field overlap if the tissue was small or a distribution of fields with 50% stroma/tumour at the edge plus some central areas of tumour-dense fields. Quality control was performed on marker intensities so that they fell in the range of 5-30 arbitrary units and helped guide spectral unmixing. Lower values might be close to background, while higher values prompted us to check for channel spillage. Tissue staining. Primary antibody staining conditions were optimized using standard immunohistochemical staining on the Leica Bond RX automated research stainer with DAB detection (Leica Bond Polymer Refine Detection, DS9800). Using 4-μm FFPE tissue sections and serial antibody titrations, the optimal antibody concentration was determined followed by transition to a seven-colour multiplex assay with equivalency. Optimal primary antibody stripping conditions between rounds in the seven-colour assay were performed following a cycle of tyramide deposition followed by heat-induced stripping (see below) and subsequent chromogenic development (Leica Bond Polymer Regine Detection, DS9800) with visual inspection for chromogenic product with a light microscope by a senior pathologist. Multiplex assay antibodies and conditions are described in Supplementary  Table 6. Tissue sections were baked for 3 h at 62 °C in vertical slide orientation with subsequent deparaffinization performed on the Leica Bond RX followed by 30 min of antigen retrieval with Leica Bond ER2 and six sequential cycles of staining with each round including a 30-min combined block and primary antibody incubation (Akoya Antibody Diluent/Block, ARD1001).
For panCK and CK8-CK18, detection was performed using a secondary horseradish peroxidase (HRP)-conjugated polymer (Akoya Opal Polymer HRP Ms + Rb, ARH1001; 10-min incubation). Detection of all other primary antibodies was performed using a goat anti-mouse Poly HRP secondary antibody or goat anti-rabbit Poly HRP secondary antibody (Invitrogen, B40961 and B40962; 10-min incubation). The HRP-conjugated secondary antibody polymer was detected by fluorescent tyramide signal amplification using Opal dyes 520, 540, 570, 620, 650 and 690 (Akoya, FP1487001KT, FP1494001KT, FP1488001KT, FP1495001KT, FP1496001KT, FP1497001KT). The covalent tyramide reaction was followed by heat-induced stripping of the primary antibody-secondary antibody complex using PerkinElmer AR9 buffer (AR900250ML) and Leica Bond ER2 (90% ER2 and 10% AR9) at 100 °C for 20 min before the next cycle (one cycle of stripping for CD68, PD-1, PD-L1, CD8 and panCK/CK8/CK18 and two cycles of stripping for TOX). After six sequential rounds of staining, sections were stained with Hoechst (Invitrogen, 33342) to visualize nuclei and mounted with ProLong Gold antifade reagent mounting medium (Invitrogen, P36930). Imaging and spectral unmixing. Seven-colour multiplex-stained slides were imaged using Vectra Multispectral Imaging System version 3 (PerkinElmer). Scanning was performed at ×20 magnification (×200 final magnification). Filter cubes used for multispectral imaging were DAPI, FITC, Cy3, Texas Red and Cy5. A spectral library containing the emitted spectral peaks of the fluorophores in this study was created using Vectra image analysis software (PerkinElmer). Using multispectral images from slides singly stained for each marker, the spectral library was used to separate each multispectral cube into individual components (spectral unmixing), allowing for identification of the seven marker channels of interest, using InForm 2.4 image analysis software.
Computational methods scRNA-seq. Overview. The pipeline was built using the 10x Genomics Martian language and computational pipeline framework. CellRanger software (version 3.1.0) was used to perform read alignment, barcode filtering and unique molecular identifier (UMI) quantification using the 10x GRCh38 transcriptome (version 3.0.0) for FASTQ inputs. Quality control. CellRanger-filtered matrices were loaded into individual Seurat objects using the Seurat R package (version 3.0.1) 34,35 . The resulting gene-by-cell matrix was normalized and scaled for each sample. Cells retained for analysis had a minimum of 500 expressed genes and 1,000 UMI counts and had less than 25% mitochondrial gene expression. Cell cycle phase was assigned using the Seurat CellCycle-Scoring function. Scrublet (version 0.2.1) was used to calculate and filter cells with a doublet score greater than 0.25. Sample matrices were merged by patient and subsequently renormalized and scaled using default Seurat functions. Major cell type identification. Major cell type assignments were computed for each patient with CellAssign (version 0.99.2) 36 using a set of curated marker genes. Marker genes were compiled for nine major cell types related to HGSOC (Supplementary Table 4). These major cell types were defined as T cells, B cells, plasma cells, myeloid cells, DCs, mast cells, endothelial cells, fibroblasts and ovarian cancer cells. Before running CellAssign, cells with zero expression for all marker genes were removed from the count matrix. Cell-specific size factors were computed using scran (version 3.11). Default CellAssign parameters were used with a design matrix of patient batch labels. CellAssign returned a probability distribution over the major cell types, and individual cells were labelled by the resulting most probable cell type. Dimensionality reduction. Principal-component analysis (PCA) was performed on the filtered feature-by-barcode matrix. UMAP embeddings including cohort-level and patient-level embeddings for all major cell types were based on the first 50 principal components. UMAP embeddings of major cell type supersets (see below) were based on the 50 batch-corrected harmony components. Diffusion map embeddings and pseudotime estimates were computed using the R package destiny (v3.0.1) for the subset of CD8 + T cells 37 . (6) ovarian cancer cells. For each superset, the R package harmony (version 0.1) was used for batch correction to account for patient-specific effects 38 . Clustering and cell subtype identification. Graph-based clustering was performed for each superset using the Louvain algorithm implemented in Seurat (version 3.0.1) at three different resolutions (0.1, 0.2 and 0.3). Differential expression between identified clusters was computed using a two-sided Wilcoxon rank-sum test as implemented in Seurat FindMarkers. Final results were filtered on log(fold change) > 0.25 and Benjamini-Hochberg-adjusted P < 0.05. Clusters were annotated on the basis of marker genes identified in differential gene expression analysis. Patient-specific clusters not represented across the full cohort were identified using relative entropy. Relative entropy per cluster was defined as the maximum entropy per cluster divided by the empirical entropy of patient compositions. Clusters with a relative entropy of <0.8 were considered patient-specific clusters and disregarded for downstream analyses.

Batch correction and integration. Major cell types identified across samples were split into six supersets: (1) T cells; (2) B cells and plasma cells; (3) myeloid cells, DCs and mast cells; (4) fibroblasts; (5) endothelial cells; and
For T cell clusters, T cells and NK cells were clustered in two steps. Initial coarse-grained clustering resulted in ten different T and NK cell clusters, including four CD4 + T cell clusters, three CD8 + T cell clusters, two NK cell clusters and one cycling T/NK cell cluster (Extended Data Fig. 5a). Subclustering identified a total of 41 distinct fine-grained clusters, broadly defining major T cell and NK cell subtypes (Fig. 2a and  Extended Data Fig. 5b). These included populations of CD4 + naive and central memory cells (expressing IL7R and TCF7), CD4 + effector memory cells (IL7R, CCL5 and KLRB1), early and late dysfunctional CD4 + T cells (expressing dysfunctional T cell markers CXCL13, TOX2 and PDCD1), regulatory T cells (FOXP3 and IL2RA) and type 17 helper T cells (KLRB1, RORA and RORC). In the CD8 + compartment, we also identified populations of naive/central memory (expressing KLF2, KLF3 and TCF7), activated/ cytotoxic (GZMH, GZMK and HLA-DR) and early and late dysfunctional (CXCL13, TOX2, LAG3, HAVCR2, TIGIT and PDCD1) T cells. Notably, the early dysfunctional cluster, in addition to exhaustion-associated genes, was characterized by expression of CXCR6 and ITGAE, commonly used to define tissue-resident memory T cells. In the innate compartment, we similarly identified several clusters, including a γδ T cell cluster and several NK cell clusters. Finally, in all compartments, we identified populations of cells marked by expression of type I IFN response genes such as ISG15 and IFIT3, herein named CD4-ISG, CD8-ISG and NK-ISG, with strong upregulation of the JAK-STAT signalling pathway as the dominant feature of these cells (Fig. 2b). The remaining clusters consisted of cycling T and NK cells expressing S phase markers such as MKI67 and G2M markers such as TOP2A (Supplementary Table 4).
For myeloid cell clusters, cDCs of the myeloid lineage were separated into cDC1s, cDC2s and mDCs, marked by expression of CLEC9A, S100B and BIRC3, respectively (Extended Data Figs. 5d and 6a). pDCs were marked by expression of PTGDS. Macrophage clusters were described with respect to their classical (M1-like) or alternative (M2-like) polarization. Six different clusters encompassing both classical and alternatively activated macrophages were identified, as well as a cluster of cycling macrophages (Cycling.M) and a cluster of actively phagocytic macrophages (Clearing.M). The M1-like and M2-like clusters were labelled according to the top genes defining the clusters (M1.S100A8, M2.CXCL10, M2.SELENOP, M2.MARCO, M2.COL1A1, M2.MMP9) (Extended Data Figs. 5d and 6b). Among these, the M1.S100A8 cluster was the only unambiguous M1-type macrophage cluster, marked by expression of pro-inflammatory calcium-binding protein genes S100A8 and S100A9 22 . The M2.CXCL10 cluster was characterized by expression of both M1 (for example, CXCL10) and M2 (for example, PDL1 and C1QC) markers. CXCL10 is an established downstream target of type I and type II IFN signalling and was found to be expressed along with other CXC-motif chemokines (CXCL9 and CXCL11). The remaining M2 clusters were all marked by high expression of complement component C1QC, which is known to promote M2 polarization 23 . InferCNV copy number clonal decomposition. InferCNV (version 1.3.5) 39,40 was used to identify large-scale copy number alterations in ovarian cancer cells classified by CellAssign. To do this, 3,200 non-cancer cells were randomly sampled from the cohort and used as the set of reference 'normal' cells. After subtracting out reference expression in non-cancer cells, chromosome-level smoothing and denoising with InferCNV, we derived a processed expression matrix that represented copy number signals. Cancer cell subclusters were identified by ward.D2 hierarchical clustering and the 'random_trees' partition method using P < 0.05. Gene signature scores. Cell state scores were calculated for the exhausted phenotype within the set of T cells using a manually curated list of genes as input to the Seurat AddModuleScore method 40 . The curated list of genes was derived from a review of single-cell analyses of CD8 + T cell states in human cancers 41 (Supplementary Table 4). Patient specificity. Patient specificity scores were computed by using a shared nearest-neighbour graph. For a given cell, patient specificity was defined as the observed fraction of nearest neighbours divided by the expected fraction of nearest neighbours in the patient subgraph.
Here the expected fraction of neighbours from the same patient was defined as the global fraction of cells for each patient. Scores were log 2 transformed. Hence, a positive patient specificity score indicates an over-representation of cells derived from the same patient among its nearest neighbours, a negative score indicates an under-representation of cells from the same patient and a score of 0 reflects a perfectly mixed neighbourhood of patient labels.
Intra-and inter-patient variation in cluster composition. To calculate intra-sample diversity of cluster composition, we used the Shannon entropy H: where p c is the proportional abundance of cluster c and C is the total number of clusters.
To estimate the similarity or dissimilarity between samples, we used the Bray-Curtis dissimilarity index D for samples i and j, defined as . We only considered the triangular distance matrix D such that i < j. The pairwise distance matrix was estimated by randomly subsampling the dataset with a minimum number of cells per sample and averaging over the subsampled datasets after 100 iterations. We then evaluated intra-and inter-patient dissimilarity on the basis of the distributions of the off-diagonal elements in the averaged distance matrix (for example, all pairs of adnexal samples or all pairs of HRD-Dup samples).
These definitions were used to estimate the intra-sample diversity, intra-patient dissimilarity and inter-patient dissimilarity of cluster composition of cell states within each major cell type superset (cancer cells, Fig. 3g; T and NK cells, Figs. 2d and 4f; myeloid cells, Extended Data Figs. 6d and 11b). Rarefaction of samples was applied in estimation of the Bray-Curtis dissimilarity matrix on the basis of the number of cells for each subset (n = 400 cells per sample).
Finally, we also used non-metric multidimensional scaling (NMDS) to visualize the pairwise distances of cell type abundances in low-dimensional space. We used the pairwise dissimilarity matrix D to calculate the rank order of the Bray-Curtis distance and project differences in cluster composition in two dimensions using NMDS (cancer cells, Extended Data Fig. 8i; T and NK cells, Extended Data Figs. 7c and 10c; myeloid cells, Extended Data Figs. 7h and 11f). GLMs of cluster composition. To estimate the effect of mutational signatures and tumour site specificity on the composition of cell clusters, we considered a GLM where we included interactions between signature, site and cluster identity for each major cell type defined in the scRNA-seq, H&E and mpIF data. The data matrix included the counts of every cluster c, sampled from site s in a patient with mutational signature subtype m. Using a binomial linear model, one can analyse counts of repeated observations of cell types or cell states as binary choices: where N c is the cell count for cluster c in a sample, N is the total number of cells in the sample and the probability to detect the cluster can be described by the logit function βX log = .
To account for the effect of mutational signature and anatomical tumour site on the cluster abundance observed in scRNA-seq data, we formulated a GLM of the observed cell counts N c for a cell type or cell state described by the logit function, which is distributed as where β 0 is a shared constant baseline per cluster that must be inferred; β c , β m and β s are individual fixed-effect terms to be inferred; β cm and β cs are cluster-signature and cluster-site interaction effects to be Article inferred; x c , x m and x s are elements of the model design matrix X; and σ ε represents measurement noise. We note that for each cluster c we had multiple measurement replicates of N c across signatures and sites. This formulation was used to fit a GLM of major cell types (Fig. 1f). We also used this formulation to separately fit GLMs of cluster composition for each superset of coarse-grained immune cell types (T and NK cells, Extended Data Fig. 7b; myeloid cells, Extended Data Figs. 7g and 11e) and GLMs of cluster composition for fine-grained immune cell states (T and NK cells, Fig. 2c; DCs, Extended Data Fig. 11a; macrophages, Extended Data Fig. 11a).
To model the abundance of major cell types in the scRNA-seq data from CD45 + and CD45 − samples, the GLM included a covariate for CD45 +/− flow sorting with additional fixed-effect sorting coefficients β f and additional cluster sorting interactions β cf to be inferred, plus an additional element x f in the model design matrix (Fig. 1f). Similarly, GLMs for H&E and mpIF data accounted for differences in cell type abundance observed in the tumour and stroma regions, incorporating a covariate for the tumour or stroma region counts with additional fixed-effect region coefficients β r and additional cluster-region coefficients β cr to be inferred, plus an additional element x r in the model design matrix (Fig. 1f).
To quantify interactions between mutational signature and anatomical tumour site, we also fitted GLMs with an additional interaction term: where β csm terms were cluster-specific signature-site interaction effects to be inferred. This formulation was used to fit GLMs of cluster composition of cell states within each major cell type superset, both for fine-grained clusters (cancer cells, Fig. 3d; T and NK cells, Fig. 4b; DCs, Extended Data Fig. 11a; macrophages, Extended Data Fig. 11a) and coarse-grained clusters (T and NK cells, Extended Data Fig. 10e; myeloid cells, Extended Data Fig. 11e).

PD-1 and PD-L1/PD-L2 co-expression analysis.
To determine potentially interacting cell type subclusters for the receptor-ligand pair PD-1-PD-L1/PD-L2, we first computed the fraction of sender cells (cancer cell or myeloid cell clusters) expressing the PD-L1 and PD-L2 ligands (CD274 or PDCD1LG2 read counts >0 in >10% of cells) and the fraction of receiver cells (T cell clusters) expressing the PD-1 receptor (PDCD1 read counts >0 in >10% of cells) for every patient. Co-expression networks were constructed as follows: for a given group of patients of the same mutational subtype (Extended Data Fig. 13f), an edge was drawn between sender cell clusters and receiver cell clusters if the ligands (CD274 or PDCD1LG2) and receptor (PDCD1) were co-expressed in the sender and receiver subclusters for at least 50% of patients in that group.  42 . Both SNVs and indels were then annotated for variant effects and gene-coding status using SnpEff4 (version 5.0e). We identified a set of high-confidence SNVs by taking the intersection of the high-probability calls predicted from mutationSeq (with probability ≥0.9) and the somatic SNVs predicted by Strelka. The high-confidence set of SNVs was further filtered by removing positions that fell within either of the following regions: (1) the UCSC Genome Browser blacklists (Duke and DAC) and (2) regions defined in the 'CRG Alignability 36mer track' with more than two nucleotide mismatches, requiring a 36-nucleotide fragment to be unique in the genome even after allowing for two differing nucleotides. Postprocessing on this set of high-confidence SNVs and somatic indels from Strelka involved removing known variants (both SNVs and indels), which were obtained from the 1000 Genomes Project (release 20130502) and dbSNP (version dbsnp 142.human 9606). The set of high-confidence somatic SNVs and indels passing the above filters were then used in feature computation for mutational signature analysis and in neoantigen prediction. Rearrangements. Rearrangement breakpoints were predicted using lumpy (version 0.2.12) 43 executed by SpeedSeq (version 0.1.08) 44 and destruct (version 0.4.18) derived from nFuse 45 , available at https:// github.com/amcpherson/destruct. In brief, destruct extracted discordant and non-mapping reads from BAM files and realigned the reads using a seed-and-extend strategy. Split alignment across a putative breakpoint was attempted for reads that did not fully align to a single locus. Discordant alignments were clustered according to the likelihood that they were produced from the same breakpoint. Multiply mapped reads were assigned to a single mapping location using previously described methods 46 . Finally, heuristic filters removed predicted breakpoints with poor discordant read coverage of sequence flanking predicted breakpoints. We applied stringent three-step filtering criteria to identify high-confidence breakpoint calls for downstream analysis as follows: Step 1: Breakpoints that were predicted by both algorithms, lumpy and destruct, were taken forward.
Step 2: We removed (1) breakpoints from regions with poor mappability, (2) events with a break distance of ≤30 bp and (3) breakpoints annotated as a deletion with a breakpoint size of <1,000 bp. Furthermore, only high-confidence breakpoints that had at least five supporting reads in the tumour sample and no read support in the matched normal sample were used in the analysis. The breakpoints were further filtered by removing positions that fell in either of the following regions: (1) UCSC Genome Browser blacklists (Duke and DAC) and (2) regions defined in the 'CRG Alignability 36mer track' with more than two nucleotide mismatches, requiring a 36-nucleotide fragment to be unique in the genome even after allowing for two differing nucleotides.
Step 3: Predictions with a small break distance and a low number of supporting reads in tumour samples were excluded. Copy number. Genome-wide allele-specific copy number was called in matched tumour-normal WGS samples using ReMixT 47 and TitanCNA 48 with default parameters. A parameter grid search for multiple purity and ploidy solutions was carried out, and the top solution was selected after manual assessment of the copy number segmentations. All tumour samples were run with ploidy = 2 and ploidy = 4 initializations.

Myriad HRD test.
We used a commercial assay (Myriad Genetics 'my-Choice CDx') to test for genome-wide LOH, the number of chromosomal breakpoints in large-scale state transitions and telomeric allelic imbalance. If the resulting HRD score was greater than 42, the sample was deemed to be HRD.
Targeted sequencing (MSK-IMPACT). Genomic DNA isolated from FFPE tumour tissue and matched normal blood was subjected to hybridization capture and sequenced with deep coverage (700×) 49 . Variant calling for the MSK-IMPACT gene panel and copy number analysis were performed using the MSK-IMPACT clinical pipeline (https://github. com/mskcc/Innovation-IMPACT-Pipeline).
Mutational signatures. We analysed mutational signatures by integrating SNVs and structural variations detected by bulk WGS in a unified probabilistic approach called multimodal correlated topic models (MMCTM) 6 . MMCTM analysis enables robust determination of mutational signatures and their correlation structure and delineation of subgroupings based on point mutation signatures 50 and structural variations.
We estimated signature probabilities for bulk WGS samples in the MSK SPECTRUM cohort (n = 40) using MMCTM, on the basis of SNV and structural variation signatures inferred from HGSOC (n = 170) and triple-negative breast cancer (n = 139) bulk whole genomes (total n = 309) (Extended Data Fig. 2b). By clustering the meta-cohort of 309 HGSOC and triple-negative breast cancer samples using UMAP and HDBSCAN 51 , we used the meta-cohort as a training dataset to fit a k-nearest-neighbour (kNN) classifier and applied the kNN classifier to the SPECTRUM samples (n = 40), assigning them into one of four strata defined solely by SNV and structural variation signature probabilities. A nearest-neighbour graph was built using a Euclidean distance metric, and classification into strata was computed by a majority vote of the k nearest neighbours of the unknown test sample (k = 30), requiring m votes for an assignment (m = 25). The four strata included those with samples enriched for (1) BRCA1-associated HRD point mutation signatures accompanied by tandem duplications (HRD-Dup), (2) BRCA2-associated HRD point mutation signatures accompanied by interstitial deletions (HRD-Del), (3) foldback inversions mediated by breakage-fusion bridge cycles (FBI) and (4) a group of ambiguous samples near the classifier decision boundaries ('Undetermined') (Extended Data Fig. 2c).
To validate the MMCTM mutational signatures, we used two independent computational methods (Extended Data Fig. 2b). We applied HRDetect 18 to validate HRD status on the basis of SNV signatures previously associated with HRD (SBS3, SBS8), short microhomology-mediated indels (ID8) and rearrangement signatures (RS3, RS5). Samples with an HRDetect score of >0.1 were defined as HRD. We also applied CHORD 19 to validate HRD status and stratification of HRD-Dup from HRD-Del cases. CHORD incorporates SNVs, indels and structural variations and relies on duplications (1-100 kb) to distinguish BRCA1-like from BRCA2-like HRD.
WGS-derived HRD signatures were in agreement for seven of seven cases with BRCA1 or BRCA2 loss (Extended Data Fig. 2b). WGS-based and standard-of-care HRD status were concordant in five of six cases. The discordant case (024) was deemed HRD by all three independent methods for WGS signature inference (MMCTM, HRDetect and CHORD).
Focal amplifications and deletions. We used WGS copy number inferred by ReMixT 47 to classify copy number changes as focal amplifications and deletions in the MSK SPECTRUM cohort. For focal amplifications, we calculated the percentile of each gene with respect to the cumulative distribution of total copy number changes across the genome. On the basis of the mean copy number across each gene, we classified high-level amplifications as those in the top 2% of bins with a log 2 -transformed change over ploidy greater than 1. For homozygous deletions, we considered gene copy number in overlapping segments and we classified segments that were 10 kb or greater in size with a mean copy number less than 0.5 as homozygous deletions.
Similarly, we used IMPACT copy number inferred by FACETS 52 to delineate focal amplifications and homozygous deletions in the MSK IMPACT HGSOC cohort. Focal amplifications and deletions were identified on the basis of the median copy number log ratio per segment, only considering segments shorter than 10 Mb with ten or fewer genes to suppress arm-level events. Segments with a total copy number greater than 8 were considered as high-level amplifications. Homozygous deletions were called for segments with a total copy number of 0.

HLA LOH.
To detect allele-specific copy number LOH of the HLA locus in single cells profiled by scRNA-seq, we inferred allele-specific alterations on chromosome arm 6p, which harbours HLA class I and II genes, using SIGNALS 5 . We first called germline heterozygous single-nucleotide polymorphisms (SNPs) in the scRNA-seq tumour data using cellSNP 53 . As input, we used the set of heterozygous SNPs identified in the corresponding normal WGS dataset for each sample. The liftover script provided in cellSNP was used to lift over SNP coordinates from the GRCh37 (hg19) reference genome to the GRCh38 reference genome. Following genotyping, we aggregated SNP counts across all cells and defined the B allele as the allele with the lowest allele frequency for each SNP. As SNP counts are very sparse in scRNA-seq data, we then aggregated cell-level counts of the B allele across chromosome arms to compute the BAF for each arm in each cell. We then generated a cell-by-chromosome arm BAF matrix and incorporated this into the Seurat gene expression objects. To assign allelic imbalance states (balanced, imbalanced, LOH) to chromosome arms in each cell, we used the mean BAF of each arm per cell as follows: balanced, BAF ≥ 0.35; imbalanced, 0.15 ≤ BAF < 0.35; LOH, BAF < 0.15. Documentation and code are available at https://shahcompbio.github.io/signals/.
To validate our observations of allele-specific alterations on chromosome arm 6p in relation to the HLA locus, we detected gene-level HLA class I LOH from tumour and matched normal WGS data, as well as from tumour and matched normal MSK-IMPACT data, using LOHHLA 24 .
To validate HLA LOH status by WGS, we used 40 tumour-normal pairs from 40 patients. Tumour purity and ploidy were estimated using ReMiXT 47 and used for subsequent HLA LOH analysis. To validate HLA LOH status by MSK-IMPACT, we selected 1,298 tumour-normal pairs from 1,298 patients in the MSK-IMPACT cohort with HGSOC histology based on an HGSOC or HGSFT OncoTree classification 54 . This cohort did not include MSK-IMPACT samples from patients who were part of the MSK SPECTRUM cohort.
Patient HLA references were built from tumour and normal reads using Polysolver (v4) 55 , for both WGS and MSK-IMPACT data. Tumour purity and ploidy from the WGS datasets were estimated using ReMixT 47 and used for subsequent HLA LOH analysis. Similarly, tumour purity and ploidy for the MSK-IMPACT datasets were estimated using FACETS 52 . HLA LOH was called for an allele in the tumour sample using LOHHLA. LOH was observed for each HLA gene if the estimated copy number was <0.2 and the statistical significance of the allelic imbalance was P < 0.01, testing for pairwise differences in log(R) values between the two HLA homologues (paired t test).

Digital histopathology.
We built a training dataset of cellular annotations for scanned H&E images. Expert delineation and quantification of cell and tissue types present in the H&E slides was carried out on MSK Slide Viewer, a computational pathology interface for review and annotation of histopathology images. Nuclear segmentation was carried out using StarDist, a method for nuclear detection based on the U-Net neural network architecture 56,57 . Membrane segmentation was approximated using a cell expansion of 3 μm of the nuclear boundary. The training dataset encompasses a set of 61 slides from a representative set of patients and sites. To classify regions of tumour, stroma, vasculature and necrosis, we trained an artificial neural network (ANN)-based pixel classifier using QuPath (v0.2.3) 56 , which operates on higher-order pixel features over multiple channels and scales within an image. In addition, lymphocytes and 'other' cells were annotated in 19 of these slides by a researcher using MSK Slide Viewer. After importing these annotations into QuPath, along with cellular segmentations and feature vectors generated from StarDist, we trained an ANN-based cellular classifier that operates over cellular measurements to identify lymphocytes. We then applied these models for inference across 100 whole-slide H&E images from 35 patients. Segmentation yielded a total of 24,628,462 cells across samples, and we used the model outputs to compute statistics on lymphocyte densities and other spatially derived measurements.
mpIF. We carried out nuclear segmentation based on DAPI intensity using the watershed algorithm in QuPath (v0.2.3) 56 , setting a minimum DAPI threshold of 1 arbitrary unit with an expected nucleus area ranging between 5 μm 2 and 100 μm 2 . Membrane segmentation was approximated using a cell expansion of 3 μm of the nuclear boundary. Starting from 1,349 quality-filtered FOVs across 100 tissue samples from 35 patients, segmentation yielded a total of 10,892,612 cells. To Article annotate regions of tumour and stroma, we trained a pixel classifier with examples of panCK + (tumour) and panCK − (stroma) regions. Following nuclear segmentation, we extracted the pixel intensities per cell for functional markers expressed in the cytoplasm (panCK, CD68, CD8, PD-1, PD-L1) and in the nucleus (TOX) to define cell types and cell states. All channels were manually thresholded in at least one FOV per slide, and marker positivity was determined by setting these thresholds on the mean pixel intensity. Segmented objects that were double or triple positive for multiple cell type markers (panCK, CD68, CD8) were counted as separate cells, yielding a total of 12,359,463 single cells. Marker assignments were used to define cell states of epithelial cells (panCK + PD-L1 − , panCK + PD-L1 + ), macrophages (CD68 + PD-L1 − , CD68 + PD-L1 + ) and CD8 + T cells (CD8 + PD-1 − TOX − , CD8 + PD-1 + TOX − , CD8 + PD-1 + TOX + ).
Analysis of spatial topology comprised estimation of spatial densities and intercellular nearest-neighbour distances. Spatial density estimates as a function of distance to the tumour-stroma boundary were obtained by aggregating cell counts within 10 μm distance bands from the boundary in each FOV, grouped across FOVs and normalized by the total number of cells for a given phenotype of interest. Error bars were calculated as the standard error of the probability p of observing a given phenotype as , where N was the total number of cells in the distance band. Intercellular distances between nearest neighbours were calculated using the distance matrix r ij for cells i and j, where the value of the (i, j) element in the matrix was the radial distance from cell i to cell j. After computing per-cell nearest neighbours, the summary statistics over nearest-neighbour distances were estimated for each phenotype. Proximity counts for phenotypes within a fixed radius R were also determined on the basis of per-cell nearest neighbours.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

March 2021
Corresponding author(s): Sohrab P. Shah, Dmitriy Zamarin Last updated by author(s): Dec 1, 2022 Reporting Summary Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Data Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A description of any restrictions on data availability -For clinical datasets or third party data, please ensure that the statement adheres to our policy The MSK SPECTRUM study is registered on dbGaP under accession number phs002857.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi? study_id=phs002857.v1.p1).
A Synapse page for the MSK SPECTRUM study is available to provide access to multi-modal datasets from one central location. This page can be accessed under accession number syn25569736 (https://www.synapse.org/msk_spectrum).

Validation
Primary antibody staining conditions were optimized using standard immunohistochemical staining on the Leica Bond RX automated research stainer with DAB detection (Leica Bond Polymer Refine Detection DS9800). Using 4 μm formalin-fixed, paraffin-embedded tissue sections and serial antibody titrations, the optimal antibody concentration was determined followed by transition to a sevencolor multiplex assay with equivalency. Optimal primary antibody stripping conditions between rounds in the seven-color assay were performed following 1 cycle of tyramide deposition followed by heat-induced stripping (see Methods) and subsequent chromogenic development (Leica Bond Polymer Regine Detection DS9800) with visual inspection for chromogenic product with a light microscope by a senior pathologist.

Human research participants
Policy information about studies involving human research participants

Population characteristics
The study cohort (MSK SPECTRUM) includes 42 women with newly diagnosed, treatment-naive high-grade serous ovarian cancer (HGSOC). Patients between the ages of 39 and 81 at diagnosis (median age: 61 years). 6 out of 42 cases had BRCA1 mutations (14%) and 1 out of 42 cases had a BRCA2 mutation (2%).

Recruitment
All enrolled patients were consented to an institutional biospecimen banking protocol and a protocol to perform targetted panel sequencing (MSK-IMPACT). All analyses were performed per a biospecimen research protocol. All protocols were approved by the Institutional Review Board (IRB) of Memorial Sloan Kettering Cancer Center (MSKCC). Patients were consented following the IRB-approved standard operating procedures for informed consent. Written informed consent was obtained from all patients before conducting any study-related procedures. This study was conducted in accordance with the Declaration of Helsinki and the Good Clinical Practice guidelines (GCP).

Ethics oversight
Institutional Review Board (IRB) at Memorial Sloan Kettering Cancer Center (MSKCC).
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Clinical data Policy information about clinical studies
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.
Clinical trial registration Not applicable.

Study protocol
The clinical study was conducted under the purview of MSKCC institutional tissue banking protocol 06-107 titled "Storage and research use of human biospecimens". Data generation and data analysis were carried out under MSKCC protocol 15-200 titled "Chemotherapy, somatic mutations, neoantigens, and the immune environment in ovarian cancer". Protocol 15-200 operates by using specimens banked under protocol 06-107. Due to MSKCC standard operating policies, internal protocol documents cannot be shared publicly but are available upon request to the IRB.

Data collection
Patients were enrolled on the study at MSKCC in New York, USA. Sample collection took place between January 2019 and March 2021.

Outcomes
No patient outcomes were reported.