Pilot study of bempegaldesleukin in combination with nivolumab in patients with metastatic sarcoma

PD-1 blockade (nivolumab) efficacy remains modest for metastatic sarcoma. In this paper, we present an open-label, non-randomized, non-comparative pilot study of bempegaldesleukin, a CD122-preferential interleukin-2 pathway agonist, with nivolumab in refractory sarcoma at Memorial Sloan Kettering/MD Anderson Cancer Centers (NCT03282344). We report on the primary outcome of objective response rate (ORR) and secondary endpoints of toxicity, clinical benefit, progression-free survival, overall survival, and durations of response/treatment. In 84 patients in 9 histotype cohorts, all patients experienced ≥1 adverse event and treatment-related adverse event; 1 death was possibly treatment-related. ORR was highest in angiosarcoma (3/8) and undifferentiated pleomorphic sarcoma (2/10), meeting predefined endpoints. Results of our exploratory investigation of predictive biomarkers show: CD8 + T cell infiltrates and PD-1 expression correlate with improved ORR; upregulation of immune-related pathways correlate with improved efficacy; Hedgehog pathway expression correlate with resistance. Exploration of this combination in selected sarcomas, and of Hedgehog signaling as a predictive biomarker, warrants further study in larger cohorts.

S arcomas are diverse malignancies comprising more than 100 subtypes of bone and soft tissue origin 1 . Each year, up to 16,000 patients are diagnosed in the United States 2 . In the metastatic setting, there are no good treatment options: frontline chemotherapy is associated with low objective response rates (ORR, averaging~18%) 3,4 and second-line treatments have lower efficacy still 5 . In other cancers, the use of monoclonal antibodies that block immune checkpoints such as programmed death 1 (PD-1; PDCD1) are effective alternative therapies to mobilize the immune system 6 , but in sarcoma their activity has been modest [7][8][9][10] . Studies that could elucidate this low response by characterizing the tumor microenvironment in sarcoma are limited.
High tumor mutational burden (TMB), tumor PD-L1 (CD274) expression, and increased tumor-infiltrating lymphocytes (TIL) are biomarkers of response to checkpoint inhibitors in multiple cancer types that have been found to be low across most sarcoma subtypes 11,12 . Prior studies in sarcoma have associated the presence of TlLs with improved clinical outcome 11,[13][14][15][16][17] . These markers correspond to an immune-sensitive phenotype labeled "immune-hot." In soft tissue sarcomas (STS) classified by gene expression analysis as immune-hot, with high expression of CD8 + T cells, tertiary lymphoid structures, and B cell markers, patients had improved response to PD-1 blockade 12 .
Novel approaches are needed to improve the effectiveness of PD-1 blockade while identifying predictive biomarkers of treatment response and resistance. We recently showed that combining talimogene laherparepvec via intratumoral injection with pembrolizumab in metastatic sarcoma led to immune conversion and correlated with improved clinical efficacy 10 . Bempegaldesleukin, a CD122-preferential interleukin-2 (IL-2) pathway agonist, is associated with increased proliferation and activation of TILs among patients with solid tumors, providing the rationale to explore this agent in sarcoma 18,19 .
We hypothesized that bempegaldesleukin would improve the efficacy of nivolumab PD-1 blockade. In this open-label, noncomparative pilot clinical trial (NCT03282344), we perform an integrated genetic, transcriptomic, and immunopathologic analysis to uncover the immune landscape in sarcoma at baseline and after treatment to differentiate which patients are most likely to respond. Here, we show that CD8 + T cell infiltrates and PD-1 expression correlate with improved ORR; that upregulation of immune-related pathways correlates with improved efficacy; and that Hedgehog pathway expression correlates with resistance.

Results
Patient cohort. A total of 84 patients with selected locally advanced or metastatic high-grade sarcoma were enrolled from October 6, 2017 to January 28, 2020 at Memorial Sloan Kettering Cancer Center (MSK) and MD Anderson Cancer Center (MDA). All patients received bempegaldesleukin 0.006 mg/kg and nivolumab 360 mg/kg as an intravenous (IV) infusion every 3 weeks, and treatment was continued until progressive disease (PD) or toxicity.
At the time of the data lock, 77 patients had discontinued study treatment and 7 patients remained on this combination; the most common reason for discontinuation was disease progression (Fig. 1A) while adverse events led to discontinuation in 3 (4%) patients. Thirty-four (40%) patients were alive with a median of 5 months of follow-up at the data lock date (February 24, 2020). Among the 50 deaths, 46 resulted from sarcoma and 4 were from unknown causes as patients were lost to follow-up.
Efficacy analysis: response rates and survival. Seventy-seven patients were included in the efficacy analysis as 6 had discontinued study treatment prior to disease reevaluation and 1 had not reached their first response assessment at the time of data lock. Outcomes varied by histological subtype ( Table 2, Fig. 1B). Median time to response by Response Evaluation Criteria in Solid Tumors (RECIST) was 3.7 months and median duration of response was 9.3 months. Median duration of treatment was 4.8 months (95% CI 3.2-6.1 months). Clinical benefit rate at 6 months ranged from 0% in osteosarcoma to 63% in angiosarcoma, while 67 (87%) evaluable patients experienced progressive disease (PD). Median progression-free survival (PFS) ranged from 1.8 months in LMS to 7.3 months in angiosarcoma (Fig. 1C, Table 2, Supplementary Fig. 1A). Median overall survival (OS) ranged from 5.9 months in the SBRCT tumor to not reached in both cohorts of ASPS and angiosarcoma cohorts ( Supplementary Fig. 1B).
Immune cell populations evolve after treatment and define better responders. Tumors from 61 patients were stained by immunohistochemistry (IHC) for various immune markers ( Supplementary Fig. 1C, Supplementary Data 1) using matched specimens with both baseline and on-treatment samples. PD-1 expression at baseline was associated with ORR (linear model p = 0.031, covariate p = 4.7 × 10 −4 ) while controlling for the effect of sarcoma histology ( Fig. 2A). There was no association for baseline CD8, PD-L1, CD68, or FOXP3 markers (Fig. 2B, Supplementary Fig. 2A-C, Supplementary Data 2). However, at the on-treatment time point (cycle 2), both CD8 and PD-1 were associated with ORR, suggesting a potential change in tumor immune microenvironment in patients who attained a partial response (PR; linear model p = 0.01 and 0.002, covariate p = 4.6 × 10 −4 and 7.3 × 10 −5 , respectively; Fig. 2A, B).
Immune cell populations were quantified via Microenvironment Cell Populations (MCP)counter 20 using RNA expression from 48 patients (41 baseline and 38 on-treatment biopsies) ( Supplementary Fig. 1C). The reliability of this approach was confirmed by comparing this quantification to IHC of CD8 + T cells and CD68 + macrophages and monocytes; marker expression by IHC significantly correlated with MCPcountercalculated immune cell abundance (Spearman's p < 10 −7 , rho = 0.74 and 0.75, respectively; Supplementary Fig. 2H, I).
Independent clustering across immune cell populations quantified by MCPcounter on baseline samples classified tumors into three distinct groups with variation in response rates and PFS. Immune-hot tumors (Group A) were characterized by elevated CD8 + T cells; intermediate tumors (Group B) by high neutrophils and intermediate levels of other immune cells; and immune-cold tumors (Group C) by low levels of all immune cells. Patients with immune-hot tumors at baseline were significantly enriched for partial responders (Fisher's test p value = 0.009) and had significantly longer PFS (logrank p = 0.025; Fig. 2D, E). Interestingly, 4/6 patients with partial response had immune-hot tumors at baseline. All on-treatment samples could be classified as immune-hot or immune-cold (Fig. 2F). Patients with immunehot tumors in on-treatment samples also had higher response rates (PR or stable disease [SD]: 7/9; Fisher's test p = 0.02) and improved PFS (Fig. 2G, logrank p = 0.017). The cell populations in baseline and on-treatment samples varied. For instance, T cells were consistently high at baseline and on-treatment in immunehot sample while macrophages were more prevalent in immunehot samples on-treatment versus at baseline.
Expression of immune pathways and hedgehog signaling pathway differentiate patient response. RNA-seq analysis identified 225 genes differentially expressed between patients with partial response (responders) versus non-responders across time points, of which 67 genes were upregulated (Fig. 3A, Supplementary Data 3). PD-1 was nominally differentially expressed with higher expression in patients with partial response (nominal p = 0.002, q value = 0.08). Interestingly, tumors of patients with SD had low PD-1 expression at baseline, similar to patients with PD, but showed increased expression on treatment, suggesting tumor conversion due to therapy (p = 0.014; Supplementary  Fig. 3A).
Gene set enrichment analysis (GSEA) was performed to assess shifts in expression across genes in a pathway 21 . Several immunerelated pathways were upregulated in patients with PR, including IL6/JAK/STAT3 signaling and interferon alpha response (Benjamini-Hochberg [BH] adjusted p = 0.0096 for both); epithelial mesenchymal transition and Hedgehog signaling pathways were downregulated (BH adjusted p = 0.0021 for both) (Fig. 3B). Using hierarchical clustering of single-sample GSEA scores (ssGSEA), we found some clustering of partial responders at baseline but more defined clustering of patients following treatment, with one group in which 4 of 5 patients achieved PR and another with 7 of 9 patients receiving clinical benefit (Fig. 3C). Similar pathways were identified using an alternative differential expression model with additional covariates (Supplementary Fig. 3B). Furthermore, when we specifically compared patients with progressive disease to those with clinical benefit (partial response or stable disease), we identified similar pathways including interferon alpha  Fig. 3C). Cross-validation GLM-NET of pathways and immune cell populations showed that Hedgehog signaling and CD8 + T cells, at baseline and on treatment, best predicted PFS; lower levels of Hedgehog signaling and higher levels of CD8 + T cells led to longer PFS (Fig. 3D,E). Neither measure was strongly associated with cohort ( Supplementary Fig. 3D-G). Not all Hedgehog genes move uniformly across samples but we can see a general shift to higher expression among patients with progressive disease. This trend is clearer on-treatment ( Supplementary Fig. 3H). Furthermore, while glioma-associated oncogene homolog 1 (GLI1), a transcription factor involved in Hedgehog signaling, was nominally differentially expressed in our overall model using all samples ( Fig. 3A; nominal p = 0.004, q = 0.11), its higher expression in PD and lower expression in PR was more striking in on-treatment samples (two-sided t-test p = 0.011 [PR vs Others], 0.039 [PD vs Others]; Supplementary Fig. 3I).
Genomic features of sarcomas. To determine if specific genomic alterations influenced either the immune content or expression of various pathways, whole exome sequencing was performed on samples derived from 67 patients, including 61 at baseline, 57 ontreatment, and 2 additional on-treatment biopsy specimens ( Supplementary Fig. 1C). Consistent with previous sequencing studies in sarcoma 22 , the most common putative drivers were TP53 mutations (40%), amplification of CDK4 and MDM2 loci (15%), RB1 mutations (13%), amplifications in NCOR1 or MAP2K4 (12%) and homozygous deletions of CDKN2A (13%), among others ( Supplementary Fig. 4A). Recurrent alterations identified were often unique to specific sarcoma subtypes; CDK4/MDM2 amplifications were identified in all but one dedifferentiated liposarcoma, ASPSCR1-TFE3 fusions in all patients with alveolar soft part sarcoma, and fusions involving EWSR1 in all patients with SBRCT. No significant differences in ORR or PFS were correlated with genomic alterations.
Sequencing data were available at both baseline and ontreatment (cycle 2) for 51 patients. A median of 78% (IQR 68-87%) of mutations were shared across paired samples, while the majority of discordance could be explained by differences in tumor purity ( Supplementary Fig. 4B). Of 35 patients with ≥1 driver mutation, 20 shared all driver mutations across all their samples ( Supplementary Fig. 4C). Differences in the clonality of drivers between samples was evident in 13 of the 20 patients, likely representing tumor heterogeneity.
The 2 additional on-treatment samples were from: a palliative surgical resection of a treatment-responding fibrosarcoma (other cohort) in the setting of a joint infection, which shared 62% of mutations with the baseline sample, including 2 putative drivers in GNAS and NF2; and a progressing chondrosarcoma that shared 36% of mutations from baseline, including a putative oncogenic IDH2 mutation and a distinct oncogenic TP53 alteration (missense mutation at p.G334V rather than a splicing mutation at p.T125), suggestive of convergent evolution (Supplementary Fig. 4D).
Neoantigen load is correlated with response to checkpoint inhibitors in malignancies such as melanoma and non small cell lung cancer (NSCLC) 24,25,27 . In this study, there was no significant correlation between the number of expressed neoantigens at baseline (predicted using NetMHC and with expression confirmed by RNA-seq) compared to ORR (two-sided t-test, nominal p = 0.298; Supplementary Fig. 6A). Interestingly, to the contrary, on-treatment tumors with PR or SD expressed fewer neoantigens (two-sided t-test, BH p = 0.01; Supplementary  Fig. 6B), which may reflect increased aberrant expression in progressing patients, as the mutations are largely concordant across time points. As expected, TMB is correlated strongly with neoantigen load, before and after considering expression (r 2 = 0.9 and 0.7, respectively; Supplementary Fig. 6C, D).
While most HLA genotypes did not correlate with response, patients with the HLA-A-11-01 allele had worse PFS (nominal p = 0.013; Supplementary Fig. 6E). The significance of this finding is unknown and may warrant further study. Of note, a recent report in melanoma and NSCLC highlighted the role that HLA-I genotype plays in outcomes from checkpoint inhibition 28 . Although previous work has suggested that loss of heterozygosity (LOH) at HLA loci may contribute to response to immunotherapy 28,29 , we did not find that correlation. T-cell receptors (TCRs) on antigen-presenting cells have also been suggested to influence response to immunotherapy 30 . Intratumoral TCR clonality and diversity at baseline or on-treatment did not correlate with clinical outcome in our study (Supplementary Fig. 6F-M).

Discussion
There remains an unmet clinical need to optimize immunotherapeutic strategies for patients with metastatic sarcoma. We performed this clinical trial to evaluate the efficacy of bempegaldesleukin and nivolumab in predefined histological cohorts and obtained serial biopsy specimens to help provide insight into predictive biomarkers of response and resistance. We identified variations in ORR based on the cohort; there were no responses in osteosarcoma, conventional chondrosarcomas, and dedifferentiated liposarcomas, while 3 of 8 (38%) patients with non-cutaneous angiosarcoma and 2 of 10 (20%) patients with UPS obtained durable responses, thereby meeting predefined study endpoints and suggesting that additional study may be warranted. The clinical efficacy in non-cutaneous angiosarcoma is surprising and interesting; efficacy seen in UPS remains consistent with previously published studies [7][8][9] . The trial design emphasized the importance of histology-specific cohorts because of the heterogeneity of this disease; this heterogeneity also highlights the need for larger cohorts to better define the clinical  Cancer-associated fibroblast

On-Treatment Group
On efficacy of this therapeutic combination and validate the status of Hedgehog signaling status as a predictive biomarker. We sought to identify genomic and immune correlates of clinical benefit from this combination by conducting immunohistochemistry, whole exome sequencing, and RNA sequencing analyses. In our study cohorts, we did not identify any putative genetic drivers that correlated with response or resistance. Further, genomic markers such as TMB, neoantigen load, and loss of heterozygosity did not correlate with clinical benefit. Elevated TMB and resultant dominant UV signature mutations have been found to be potential predictive biomarkers of benefit in cutaneous, head, and neck angiosarcomas 31 . Despite the low TMB identified in our analyzed cohort, 3 of 8 (38%) patients with noncutaneous angiosarcoma derived prolonged clinical benefit with this drug combination, suggesting that conventional genomic markers alone may not suffice to identify suitable patients.
The presence of increased numbers of CD8 + T cells and higher PD-1 expression (by IHC and gene expression) at baseline and on-treatment were associated with improved ORR and prolonged PFS, consistent with previous reports in other solid tumors 32,33 . At baseline, our clustering analysis identified three groups of tumors (Group A: "immune-hot"; Group B: intermediate with higher neutrophils; Group C: "immune-cold"). Myeloid cells, including neutrophils, have been shown to suppress T cell responses, providing a potential explanation for the predominance of response in Group A versus Groups B and C 34 .
On-treatment analysis identified only 2 groups, A and C. Convergence into 2 groups presumably highlights the changes induced in the immune microenvironment during treatment. Notably, Petitprez et al. recently classified sarcomas into five phenotypes that correlated with ORR and OS 12 . While the clustering by Petitprez et al. was conducted in primary resected specimens, our cohort only includes treated metastatic specimens. Further, the variability of sarcoma subtypes and overall lower numbers within each subtype may also explain the differences between our results.
In sarcoma, bempegaldesleukin did not appear to improve efficacy to checkpoint blockade; other mechanisms may be driving resistance to immunotherapy. Our GSEA identified the upregulation of mesenchymal transition and Hedgehog signaling pathway expression as associated with lack of benefit. Hedgehog signaling has been implicated in the development of resistance to systemic therapies such as cytotoxic chemotherapy, targeted agents, and radiation therapy [35][36][37] . The Hedgehog signaling pathway can also regulate the immune response and aberrant Hedgehog signaling has been shown to drive tumor growth through a number of pathways including enhanced immunosuppressive mechanisms [38][39][40] . We have shown that the presence of CD8 + T cells and the reduced expression of the Hedgehog signaling pathway led to the best clinical outcome. Interestingly, in basal cell carcinoma, inhibition of Hedgehog signaling with vismodegib and sonidegib has led to reduction in tumor burden NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30874-8 ARTICLE and durable clinical benefit for some patients 41,42 . In the setting of tumor regressions due to Hedgehog inhibitors, favorable changes to the immune microenvironment have been shown, including upregulation of MHC class I expression and infiltration of CD8 + T cells into tumors 43 . The interplay of Hedgehog signaling and the immune microenvironment warrants further investigation in sarcoma.
Limitations of this study include the low ORR across each cohort, which precluded robust correlative analyses. Further, the non-randomized nature limits the ability to identify the additive benefit of bempegaldesleukin. Sarcoma is a complex heterogeneous disease and each histological subtype can have variable behavior and response to systemic therapy, prompting the need to enroll patients onto cohorts based on subtype. Without welldefined biostatistical endpoints within each cohort, drawing definitive conclusions regarding efficacy is also not feasible. The goal instead was to identify a signal of efficacy that would potentially lead to expansions and further study. Similarly, our correlative studies identified interesting observations that will require additional evaluation in larger histology-specific cohorts.
This combination has limited activity in most sarcoma subtypes, other than in angiosarcoma where additional study may be warranted. Our data highlights the potential role of CD8 + T cell infiltration in driving immune responses to checkpoint inhibition, as has been described in other malignancies 32,33 . While upregulation of Hedgehog signaling seems to correlate with lack of efficacy, a predictive biomarker of immune conversion remains to be defined and further evaluation of the relevance of Hedgehog signaling status is necessary.

Methods
Study design and participants. This was a multi-center, open-label, non-randomized, non-comparative pilot trial conducted at MSK and MDA (Clinical-Trials.gov: NCT03282344). Eligible patients were ≥12 years and had advanced or metastatic sarcoma measurable per RECIST v1.1, ECOG performance status of 0 or 1, an estimated life expectancy of ≥3 months, previous receipt of ≥1 systemic therapy for metastatic disease (if applicable), and adequate kidney, liver, and bone marrow function. A sample size of 10 patients was planned for 7 of the histological cohorts (LMS, UPS/MFS, chondrosarcoma, DDLPS, osteosarcoma, angiosarcoma and ASPS), and a sample size of 15 patients was planned for the SBRCT/synovial sarcoma and other cohorts to account for their heterogeneity. A minimum of 28 days was required between any previous systemic therapy and initiation of bempegaldesleukin and nivolumab. Key exclusion criteria were active brain metastases or history of autoimmune diseases. Prior therapy with anti-PD-1 was permitted. The first participant enrolled on September 14, 2017, and the last enrolled on January 17, 2020.
Study protocol (Supplementary Note 1) was approved by review boards at MSK and MDA. The study was conducted according to the Declaration of Helsinki and the Guidelines for Good Clinical Practice. Each participant signed an IRBapproved, protocol-specific informed consent in accordance with federal and institutional guidelines. This study was monitored for accrual, safety, and the primary endpoint, at least twice annually, by the Data and Safety Monitoring Board, a standing committee at MSK. Since RECIST remains the standard approach to assess efficacy on therapeutic trials, in deviation from the protocol, we opted to not perform irRECIST reads as it would not change our interpretation of the data or alter our conclusions.
Treatment and assessments. Patients received bempegaldesleukin 0.006 mg/kg and nivolumab 360 mg (flat dose) as an intravenous infusion every 3 weeks. Treatment was continued until progressive disease (PD) or toxicity.
Tumor assessments were performed at baseline and every 8 weeks thereafter and were not subject to central radiology review. Patients experiencing PD were permitted to continue study treatment beyond initial PD if the patient had evidence of clinical benefit and tolerated study treatment. Dose reductions were not permitted; however, dose interruptions for up to 6 weeks were allowed. If treatment was discontinued due to adverse events, patients were followed until disease progression or initiation of different therapy for >30 days after the last dose of bempegaldesleukin and nivolumab. Laboratory tests were performed at baseline and every 2-3 weeks as per the treatment schedule. Adverse events were graded according to the National Cancer Institute Common Terminology Criteria for Adverse Events (NCI CTCAE) version 4.0 during treatment and ≥30 days after treatment discontinuation. Mandatory biopsies prior to treatment and during week 3 were obtained in the same anatomic site unless not medically feasible. In addition, research blood tests were obtained at baseline and pretreatment on day 1 of cycles 8,16,24,46, and at the time of progression.
Outcomes. The primary endpoint was confirmed ORR within each histology cohort based on RECIST v1.1 during protocol-directed treatment. Confirmation of response was required 4 weeks following initial response. Confirmed ORR was estimated as the number of patients with a CR or PR divided by the number of evaluable patients. Secondary outcomes were toxicity, clinical benefit rate, duration of response, PFS, OS, and duration of treatment. Clinical benefit rate was defined as best objective status (CR, PR, or SD) at a given time point while receiving protocol treatment, divided by the number of patients receiving treatment at the same time point. Duration of response was defined as the time from first CR or PR to date of PD. PFS was defined as the time from start of treatment to date of PD or death. Patients who discontinued treatment for reasons other than PD, such as adverse events, were censored at the date of their most recent disease evaluation prior to receiving any later systemic treatment regimens. OS was calculated as the time between start of treatment and date of death. Patients lost to follow-up were censored for survival at the date last known to be alive. Patients remaining in active treatment were censored for duration of treatment on their most recent date of treatment. All analyses excluded any data collected beyond the date of withdrawal for patients having withdrawn consent. After discontinuation of protocol treatment, survival status was assessed every 3 months for a year after the last dose.
Statistical analyses of clinical outcomes. Eligible patients who had initiated study treatment were considered evaluable for safety endpoints. Efficacy endpoints include only patients receiving ≥1 post-treatment assessment. In the cohorts with n = 10 patients, groups with two confirmed responses were considered worthy of further study, while in the cohorts with n = 15 patients (SBRCT/synovial sarcoma and other), groups with three confirmed responses were worthy of further study. Categorical data analyses and summary statistics were used to report adverse events. When patients were found ineligible after initiating study treatment, safety endpoints included only data prior to the date of ineligibility. The analysis of secondary endpoints included only eligible, treated patients. Kaplan-Meier methodology was used to estimate the distributions of all time to event endpoints. All patients included in the primary endpoint evaluation were also included in time-toevent endpoints. For all statistical estimates, 95% confidence intervals (CIs) were calculated. All statistical analyses were performed using statistical analysis system (SAS) R version 4.0.2.
Whole-exome sequencing. Frozen tissues were weighed and 20-30 mg homogenized in RLT buffer before nucleic acids were extracted using the AllPrep DNA/ RNA Mini Kit (QIAGEN catalog # 80204) according to the manufacturer's instructions. RNA was eluted in nuclease-free water and DNA in 0.5X Buffer EB. Viably frozen cells were thawed and pelleted and incubated for ≥30 min in 360 μL Buffer ATL + 40 μL proteinase K at 55°C. DNA was isolated with the DNeasy Blood & Tissue Kit (QIAGEN catalog # 69504) according to the manufacturer's protocol with 1 h of incubation at 55°C for digestion. DNA was eluted in 0.5X Buffer AE. After PicoGreen quantification and quality control by Agilent BioAnalyzer, 100-250 ng of DNA were used to prepare libraries using the KAPA Hyper Prep Kit (Kapa Biosystems KK8504) with 8 cycles of polymerase chain reaction (PCR). After sample barcoding, 100-500 ng of library were captured by hybridization using the xGen Exome Research Panel v1.0 (IDT) according to the manufacturer's protocol. PCR amplification of the post-capture libraries was carried out for 12 cycles. Samples were run on a HiSeq 4000 or HiSeq 2500 in Rapid mode in a PE100 run, using the HiSeq 3000/4000 SBS Kit or HiSeq Rapid SBS Kit v2 (Illumina). Normal and tumor samples were covered to an average of 100X and 204X, respectively.
The resulting fastqs were aligned and processed using TEMPO 44 (code for processing whole exome sequencing data can be found at https://github.com/ mskcc/tempo). Briefly, reads were aligned using Burroughs-Wheeler Aligner (BWA)-MEM 45 to the GRCh37 reference genome. Genome Analysis Toolkit (GATK) 46 best practices were used for base recalibration. High contamination of exome sequencing led to the removal of three samples from two patients from this analysis.
Somatic genome variants were called using the union of Mutect2 47 and Strelka2 48 . Variants were then filtered by several measures including variant allele frequency <0.05, tumor read depth of 20, tumor alternate read count of 3, and normal read depth of 10, along with filtering out repeated regions from RepeatMasker 49 and variants that appear at allele frequencies >0.01 in GNOMAD 50 . Variants were considered shared between samples of the same patient if there was ≥1 supporting read in the other sample from the same patient. Variants were then annotated using OncoKB 51 for oncogenicity. Variants classified as "predicted/likely/oncogenic" in OncoKB or that cause truncation of tumor suppressor genes by OncoKB were considered drivers. Truncating mutations were defined as those that create a stop codon or were an out-of-frame indel. Clonality differences between baseline and on-treatment samples were calculated by estimating the uncertainty of the baseline variant allele frequency using the Wilson 95% CI on the binomial probability. If the on-treatment variant allele frequency was not within the 95% CI, it was considered discordant. TMB was calculated by TEMPO as the number of non-synonymous mutations in coding regions covered by the IDT baits. TMB was considered at baseline in analyses unless otherwise specified.
Somatic copy number alterations were called using FACETS (Fraction and Allele-Specific Copy Number Estimates from Tumor Sequencing) v0.5.14 52 . Each tumor and matched normal pair was processed in a two-pass manner: an initial run for purity and ploidy estimation followed by a second run for focal event detection. Each fit was manually reviewed to minimize false positives and to evaluate the quality of the fit, leading to eight samples from seven patients being removed from the analysis. FGA and whole-genome doubling were calculated from the resulting FACETS using facets-suite V2 53 . Baseline FGA was used in analyses unless otherwise specified. In the oncoprint, amplifications in tumor suppressor genes and homozygous deletions in oncogenes were removed as likely passenger events.
HLA typing was performed using PolySolver 54 . Neoantigens were identified using NetMHC-pan 55 . They were considered expressed if ≥1 RNA-seq read was identified to support the mutated base. Correlation between TMB and number of neoantigens was performed in R using "cor.test".
RNA sequencing. After RiboGreen quantification and quality control by Agilent BioAnalyzer, 107-500 ng of total RNA with RIN values of 5.4-10 underwent polyA selection and TruSeq library preparation according to instructions provided by Illumina (TruSeq Stranded mRNA LT Kit, catalog # RS-122-2102), with 8 cycles of PCR. Samples were barcoded and run on a HiSeq 4000 in a PE100 run, using the HiSeq 3000/4000 SBS Kit (Illumina). An average of 43 million paired reads was generated per sample. Ribosomal reads represented 1.9-19% of the total reads generated and the percent of mRNA bases averaged 69%.
The resulting fastqs were processed using an in-house RNA sequencing pipeline. Briefly, fastqs were aligned using STAR 2.7.0 56 to Ensembl v75 57,58 . The resulting bams were used for quality control by Picard 59 and to identify fusions using the intersection of calls from FusionCatcher 60 and Arriba 61 . Expression was quantified using Kallisto 62 and summarized at the gene level using Enembl v75.
Normalized transcripts per million (TPM) were calculated using Sleuth (sleuth_to_martix). Immune populations were quantified using the R package "immunedeconv" 63 . Within this package, we used MCPcounter 20 , from which Z-scores were calculated from the quantification of each population. Correlation between IHC values and those derived from MCPcounter was performed in R using "cor.test". These scores were used for hierarchical clustering of samples at each time point using Manhattan distance and Wald D clustering to classify the overall immune infiltration into each tumor. Differences in PFS were evaluated by log-rank test.
First model to identify differentially expressed genes in Sleuth: 64 Expression TrialCohort þ SampleTimepoint þ PatientID þ Purity þ PartialResponder Where TrialCohort is sarcoma subtype, Sample Timepoint is either baseline, on treatment, or progression; Patient ID allows for multiple samples from the same patient; and PartialResponder is yes or no according to whether the patient met criteria for partial response. Purity was derived from FACETS (in exome data) except for four samples that did not have an acceptable FACETS fit and for which maximum variant allele frequency was used instead. Two samples did not have corresponding exome data and so were removed from this differential expression analysis. This model was chosen after considering several covariates, including cohort, sample time point, patient, purity, age at study entry, sex, number of prior treatments, relative amount of cancer fibroblasts (calculated from MCPcounter) and batch. To explain partial response, we used stepwise Akaike Information Criterion (AIC) to identify the optimal covariates 65 ; these included batch, cohort and patient. Batch and cohort could not be modeled together because they produced a computationally singular system. We considered the linear model defined by stepwise AIC and found that while cohort and patient contributed significant effects on modeling partial response (p < 0.05), batch did not (p > 0.3).To gain further confidence in our results, we performed a linear model using "lm" in R. We modeled the log 2 of the normalized TPM that was used for the immune deconvolution above. We limited the genes considered to those that passed sleuth filtering, which filters genes that do not have at least five estimated counts in at least 47% of samples. To compare partial responders and non-partial responders, we used the model Expression PartialResponder þ TrialCohort þ SampleTimepoint þ PatientID þ Purity þ TrialBatch þ AgeAtStudyEntry þ Sex þ PriorTreatments þ CancerFibroblasts We used the rank of genes from this model to perform GSEA and confirm results from the previously discussed model that resulted from sleuth filtering ( Supplementary Fig. 3B).
A Wald test was used to identify differentially expressed genes between samples from partial responders and those from other patients. A gene was considered differentially expressed if q < 0.05. We utilized a similar model to compare samples with progressive disease and those with clinical benefit (including samples with partial response or stable disease). In this model, we replace "PartialResponder" with "ProgressiveDisease," such that genes with a positive effect size are more highly expressed in samples with progressive disease.
Pathway enrichment was performed on Hallmark pathways 66 pulled using R package msigdb (version 7.2.1) 65 . GSEA was performed using the R package fgsea 21 on genes ranked by effect size (beta) from the above model. Minimum size was set to 15 and maximum was set to 600, and 10,000 permutations were performed. Figure 3B shows the top 10 significant pathways with a positive or negative normalized enrichment score. The R package GSVA 67 was used to calculate ssGSEA values for each sample to cluster samples based on expression of enriched pathways. Heatmap shows ssGSEA calculated on only genes in the leading edge for each pathway, as found by fgsea analysis.
The model best associated with PFS was found using cv.glmnet from glmnet R package (version 4.1) 68 . Immune population quantification via MCPcounter and ssGSEA scores shown in Fig. 3C were combined in a Cox proportional hazards model to predict PFS for baseline and on-treatment samples, separately. The minimum lambda was used. Figure 3D, E were created using the same values for CD8 + cells and Hedgehog signaling as used in the model, but the high vs low samples were identified within each time point.
TCR sequencing. After RiboGreen quantification and quality control by Agilent BioAnalyzer, 11-259 ng of total RNA were prepared using the Immunoverse TCR-HS α/δ/β/γ Kit, for Illumina (ArcherDX catalog # DB0219) according to the manufacturer's instructions. Briefly, cDNA was synthesized using TCR-specific priming for reverse transcription. Molecular barcode adapters were ligated to cDNA fragments and multiplex PCR with primers targeting the CDR3 sequence of interest was used for enrichment and library preparation. Barcoded samples were pooled equimolar and sequenced on a MiSeq, NextSeq 500, or NovaSeq 6000 in a PE150 run, using the MiSeq Reagent Kit v3 (300 cycles), NextSeq 500/550 Mid Output Kit v2.5 (300 cycles), or NovaSeq 6000 S3 Reagent Kit (300 Cycles) (Illumina). Each sample yielded on average 3.2 M reads and fastq files were uploaded to the Archer Analysis bioinformatics suite for processing.
Archer delivered trimmed and deduplicated fastqs, which were then used to quantify TCR diversity for the 4 TCRs analyzed by the platform (α, β, γ, and δ). The MiXCR pipeline was used to process, align, assemble, and export clones for each TCR type separately 69 . Next, the VDJtools pipeline 70 was utilized to calculate basic statistics including TCR diversity. Differences in TCR diversity were compared by two-sided t-test between each response group (PR, SD, or PD) vs. others.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
De-identified exome sequencing data from patients treated at Memorial Sloan Kettering Cancer Center have been deposited in the NCBI dbGaP archive under accession number phs001783. De-identified exome sequencing data from patients treated at MD Anderson Cancer Center and all de-identified RNA sequencing and TCR sequencing data have been deposited in the NCBI dbGaP archive under accession number phs002852. The data are available under controlled access, which can be obtained through dbGaP upon request. The raw sequencing data from 3 patients are not deposited in dbGaP because they did not consent to future use. De-identified individual participant-level clinical data will be made available upon request to the corresponding author. Mapping of each sample, along with IHC values, can be found in Supplementary Data 1. The study protocol is available as Supplementary Note 1 in the Supplementary Information file. The source data are provided in the Source Data file and can also be found at https://github. com/mskcc/ImmunoSarc. The remaining data are available within the Article and Supplementary Information. Source data are provided with this paper.

Code availability
Code for processing RNA-seq data and for subsequent analysis throughout the manuscript can be found at https://github.com/mskcc/ImmunoSarc.