T cell subtype profiling measures exhaustion and predicts anti-PD-1 response

Anti-PD-1 therapy can provide long, durable benefit to a fraction of patients. The on-label PD-L1 test, however, does not accurately predict response. To build a better biomarker, we created a method called T Cell Subtype Profiling (TCSP) that characterizes the abundance of T cell subtypes (TCSs) in FFPE specimens using five RNA models. These TCS RNA models are created using functional methods, and robustly discriminate between naïve, activated, exhausted, effector memory, and central memory TCSs, without the reliance on non-specific, classical markers. TCSP is analytically valid and corroborates associations between TCSs and clinical outcomes. Multianalyte biomarkers based on TCS estimates predicted response to anti-PD-1 therapy in three different cancers and outperformed the indicated PD-L1 test, as well as Tumor Mutational Burden. Given the utility of TCSP, we investigated the abundance of TCSs in TCGA cancers and created a portal to enable researchers to discover other TCSP-based biomarkers.

Anti-PD-1 therapies are an increasingly important treatment option across many cancer types 1,2 . Anti-PD-1 therapy and other checkpoint inhibitors are approved for the treatment of 14 cancer types, making around 39% 3 of all cancer patients eligible for checkpoint therapy. Unfortunately, only approximately 11% 3 of all cancer patients benefit from anti-PD-1 therapies. Yet, among patients who respond to anti-PD-1 therapy, many experience a robust, durable response even in cancers with historically poor long-term survival 4,5 . In an effort to unlock these improved outcomes and more cost-efficient treatments, researchers have investigated various biomarkers 6 . The most commonly used biomarker for anti-PD-1 therapies is PD-L1 expression measured by IHC. PD-L1 is the ligand for the PD-1 receptor and a target of checkpoint inhibitors in its own right. Unfortunately, PD-L1 is an unreliable biomarker for predicting response 7 .
Although the PD-L1 molecule is involved in the mechanism of action of anti-PD-1 therapies, other aspects of adaptive immunity, namely T cells, may be more useful in predicting response 8 . T cells can be classified according to their activation and differentiation status, which capture the activity, antigen-exposure, and specific functional role of a T cell population. In particular, five T cell subtypes (TCSs)-naïve 9 , activated 10 , effector memory (EM) 11 , central memory (CM) 11 , and exhausted 12 -are descriptive of the immunogenic status of T cell adaptive immune response and, when measured in isolation, have offered insights into response to anti-PD-1 therapy. For instance, in Head and Neck Squamous Cell Carcinoma (HNSCC), higher levels of EM T cells are associated with response 13 , while exhausted T cells are associated with poor prognosis 13,14 . PD-1 inhibits effector function upon ligand binding and is heavily expressed in exhausted T cells 12 , and has been associated with response in Non-Small Cell Lung Cancer (NSCLC) 15,16 . In addition, exhausted T Cells are an important component of the anti-tumor immune response following PD-1 or CTLA-4 blockade in several cancers [17][18][19][20][21][22][23] . Furthermore, higher levels of CM T cells have been associated with anti-PD-1 response in melanoma 24 . These works suggest that a more nuanced and comprehensive characterization of TCSs might more successfully predict anti-PD-1 response.
With this goal, we developed a biomarker platform that estimates the prevalence of these five TCSs using novel RNA models, called Subtype Health Expression Models (sHEMs). This novel approach has several advantages: 1) Estimating TCSs using sHEMs is easy to perform on commonly available clinical samples. Traditional methods of measuring TCSs, especially the activated and exhausted subtypes, require functional assays and/ or flow cytometry and are difficult or impossible to perform routinely on common clinical specimens. Similar to Immune Health Expression Models 25 , sHEMs eliminate this limitation and allow characterization of TCSs

Results
Subtype health expression models discriminate T cell subtypes. We created sHEMs as a tool to estimate the abundance of TCSs in newly sequenced FFPE samples or in public datasets. To create them, we isolated cells that define each TCS (Fig. 1A), including naïve, effector memory (EM), and central memory (CM) CD8+ T cells, from healthy PBMC donors using flow cytometry. Activated and exhausted T cells were generated in vitro via continuous CD3/CD28/CD2 stimulation of isolated naïve CD8+ T cells. Activated T Cells corresponded to early stimulation and had maximal proliferative capacity according to IL2 and IFNγ expression. In contrast, exhausted T cells corresponded to late chronic stimulation having impaired cytokine production, little to no proliferation, and high expression of PD-1, TIM3, and LAG3 inhibitory receptors, in agreement with observations in literature 12 .
Total RNA isolated from these five types of cell isolates were respectively processed and sequenced. Differentially expressed genes were chosen to identify the five TCSs (Fig. 1A). sHEMs for each of the five TCSs were created using the mean value for each of the differentially expressed genes for the five respective isolates. This data-driven approach has an advantage of being unbiased, which is especially important given the overlap between classical effector and exhaustion genes. These initial models adequately estimated TCS abundance in simple, PBMC based samples, but suffered from high false positive estimates in more heterogeneous tumor samples. To improve performance in complex samples, genes with relatively high expression in the cell lines of the Cancer Cell Line Encyclopedia (CCLE) 40 were filtered out (Fig. 1A). The resulting five sHEMs were composed each of 46 genes (Fig. 1B).
The five sHEMs differentiate the five TCSs in heterogeneous FFPE tumor samples. Using these models and gene expression data from a sample, an unknown FFPE tumor sample can be characterized by solving a linear equation. The estimated abundances of each of the five TCSs represent what ratio of RNA in a whole sample is comprised of each TCS (Fig. 1A) and are reported as a number between 0 and 100. Often, for example when comparing against other measurement modalities or previous work, it is most useful to consider TCS estimates normalized to their sum. In this case, axis labels are reported as "TCS Estimate " and are reported as a number between 0 and 1. We refer to the process of characterizing a sample in regard to TCSs as T Cell Subtype Profiling (TCSP). TCSP characterizes the immune response in a tumor and, as shown later in this work, can predict patient response to anti-PD-1 therapy. sHEMs were created using whole exome sequencing data, enabling TCSP to characterize the infiltrating immune response within and across many cancer types using publicly available RNA-seq data.
We show the normalized expression of each gene across all five sHEMs (Fig. 1B, Supplementary Fig. 1). Per the Reactome database 41 , effectively all of these genes are involved in immunity related pathways ( Supplementary  Fig. 2). Each model had a few notable, constituent genes which characterize each TCS. The naïve sHEM displayed high expression of LEF1, a gene involved in T cell development and peripheral T cell differentiation 42 . A set of genes involved in homeostasis (NR4A2 43 ) and quiescence (CD248 44 and DUSP8 45 ) were also highly expressed in the naïve model versus others. The activated sHEM showed high expression of cytokines related to inflammation and proliferation including EBI3 46 , IL2 47 , and IL23A 48 . The transcription factor TBX21 (T-Bet), which is involved in the regulation of development and CD4+ T Cell differentiation 49 , was also highly expressed. The inhibitory receptor, LAG3 (HAVCR2), negatively regulates T cell activation 50,51 and was also a constituent gene www.nature.com/scientificreports/ of the activated sHEM. The exhausted sHEM had high expression of CSF2, a gene associated with prolonged stimulation and cell aging 52 . This model also had the highest expression of genes that prohibit differentiation (ASB2 53 ), cell growth (CSF2RB 52 ), and inflammation (CCR2 54 ). The EM sHEM had high expression of KLRD1 (CD94), which may regulate effector functions and cell survival of CD8+ T cells 55 . Additionally, MAF, a regulator of differentiation and function in a wide variety of T cells 56 , and CCR5, a gene involved in chemokine-induced costimulation 57 , were highly expressed. The CM sHEM had high expression of LY9, a gene that negatively regulates the development of memory CD8+ T cells 58 . In addition, genes associated with trafficking (GCNT4 59 ) and adhesion (VSIG1) were also more highly expressed in the CM model. Interestingly, some canonical genes associated with the five TCSs in literature are missing from the sHEMs due to our data-driven approach. These genes are not differentially expressed between the five TCSs nor between the five TCSs and the tumor microenvironment (via the CCLE database) and therefore aren't useful for estimating abundances in the tumor microenvironment. For example, the exhausted sHEM does not include other inhibitory receptors such as PD-1 (PDCD1) and TIM3 (HAVCR2) 12 because these genes are also highly expressed in the activated subtype. Genes such as TCF7, TOX, EOMES, and CD39 (ENTPD1) 12 were also not discriminative (Supplementary Fig. 1 and 3).
T cell subtype profiling is analytically robust. The five sHEMs were developed with the goal to characterize the immune response in heterogeneous specimens and predict response to anti-PD-1 therapy. Therefore, it is imperative that TCSP is accurate and analytically robust. This section focuses on the analytical experimentation done to validate the five sHEMs and specifically, the abundances estimated by the TCSP technique.  www.nature.com/scientificreports/ First, we demonstrate the performance of estimating the abundance of the naïve, activated, and exhausted TCSs. Naïve CD8+ cells from a donor withheld from creating the models were chronically stimulated in vitro for 14 days. At days 4 and 6, there is a peak abundance of extracellular IL2 and IFNγ cytokines, respectively ( Fig. 2A). Meanwhile, the abundance of cells triple-positive for PD-1, TIM3, and LAG3 receptors grow through day 4 and peak at day 10, with sustained abundance into days 12 and 14 ( Fig. 2A, Supplementary Fig. 4B). In accordance with previous studies 12, [60][61][62] , there is also: a progressive increase in expression of TIGIT, 2B4, CD39 (ENTPD1), and TOX; a progressive decrease in expression of LAG3 and GZMB after peak activation; and a peak of EOMES expression early and at the end of chronic stimulation ( Supplementary Fig. 4). This progressive increase in several inhibitory receptors, coupled with progressive loss of proliferative and cytotoxic expression and a later stabilization or decrease of inhibitory receptors is a hallmark of T cell exhaustion 12 . These readouts suggest that in this chronic stimulation experiment the cells start out as naïve, became activated by day 4 and are exhausted by days 12 and 14. We compared these orthogonal measurements to TCS estimates. Our characterization mirrored this trend as the cells on day 0 are estimated to be naïve, day 4 is chiefly characterized to be activated, days 6 through 10 are characterized as a progressive transition from a population of activated to exhausted, while days 12 and 14 are estimated to be exhausted ( Fig. 2A). These trends are preserved regardless of which donors are used for creating the sHEMs and which withheld donor is used for evaluation ( Supplementary Fig. 5). The xCell method observes a state switch on Day 2 from Naïve, but lacks any ability for determining the activation www.nature.com/scientificreports/ or exhaustion of the CD8+ cells, instead suggesting that late (days 10-14) stimulated cells are enriched for EM ( Supplementary Fig. 6A). Increased inhibitory receptor levels alone are not sufficient to specifically measure the exhaustion of T cells ( Fig. 2A). Rather, it is the measurement of secreted cytokine levels after stimulation coupled with inhibitory receptor expression that enables one to approximate exhaustion in of a population of T cells. Performing this sort of multi-faceted analysis is not only laborious, but also infeasible in FFPE tumor samples. Our exhausted sHEM addresses these challenges and thus provides a powerful tool in profiling dysfunction of the immune response in a preserved tumor. We next evaluated the performance of TCSP using PBMCs. For a single donor, live T cells were sorted for naïve, EM, and CM CD8+ cells via flow cytometry. We did not sort for activated or exhausted T cells, which are typically rare in healthy patient PBMCs and are difficult to accurately characterize with flow markers alone. Samples were profiled and normalized to the total estimated abundance of all five TCSs. TCSP correctly characterized the naïve and EM isolates as predominately being naïve and EM subtypes, respectively. The CM isolate was estimated to be ~ 80% the CM subtype, but also estimated some fraction of the isolate to be naïve and EM subtypes (Fig. 2B). We also profiled the CD4+ and CD8+ isolates of PBMCs from eight donors. The CD8+ isolates had a mean estimate of 37% naive, 0% activated, 4% exhausted, 53% EM, and 5% CM, while the CD4+ isolates had a mean estimate of 26% naïve, 0% activated, 4% exhausted, 30% EM, and 40% CM. The estimated abundances of the naïve, EM, and CM TCSs in CD4+ and CD8+ T cells reflect those reported in other healthy donors 63 . The CD4+ and CD8+ samples are estimated to have a low level of exhaustion, perhaps due to latent viral infections, for example from Epstein-Barr Virus, where up to 2.5% of CD8+ T cells are specific to EBV in healthy individuals 62,64 . In general, the xCell method struggles with these T cell isolates, exhibiting false positives of high CD4+ cell enrichment across all isolates and an inability to differentiate between EM and CM isolates ( Supplementary Fig. 6B). These results suggest that TCSP can accurately estimate TCSs across both CD4+ and CD8+ T cells and is superior to the xCell T cell enrichment method.
TCSP of the tumor microenvironment is challenging because immune cells are integrated in and affected by a heterogenous mix of tumor and stroma. Therefore, we next aimed to validate performance with various isolates of dissociated tumor cells from single donors of lung cancer, melanoma, and ovarian cancer. CD45− isolates are devoid of immune cells and were sorted to establish the specificity of TCS estimates. The average estimates across the three tissue types are < 0.25 (out of 100 parts) for the activated and CM subtypes, and effectively 0 for the exhausted subtype (Fig. 2C). On average, the naïve and EM subtypes suffered from higher false positive estimation, although still very low and to different degrees depending on the cancer type (Fig. 2C). We further sought to explore the sensitivity of profiling the exhausted subtype by titrating RNA-seq data from an exhausted sample (from day 12 of the chronic stimulation) into the CD45− sample, in silico. In fractions of 1% to 100% exhausted reads, we see a reliable estimate when independently titrating in the 3 different cancer types (Fig. 2D). In these titrations, the level of the other four TCSs are at or near 0 ( Supplementary Fig. 7) as expected. Finally, we measured the unsorted lung, melanoma, and ovarian samples, which consisted of a mix of immune, stromal, and cancer cells. Estimates of these three samples correlate with flow cytometry measurements for EM, CM, and activated TCSs (Fig. 2E). The naïve subtype was estimated to be more abundant than what was measured by flow cytometry, which may be a result of reduced specificity for this TCS (Fig. 2C). As described before, it is not possible to functionally characterize exhaustion level with flow cytometry alone, thus we were not able to evaluate our exhaustion estimates in these 3 samples. However, when comparing CD45+ isolates to the unsorted samples, the rank order of exhausted subtype estimates is preserved among all TCSs ( Supplementary Fig. 8). In all, these results build confidence in TCSP of infiltrating T cells in heterogeneous tumor samples.
T cell subtype profiling is consistent with external observations. TCSP is robust in estimating the abundance of TCSs and can be used to characterize public datasets, making TCSP unique in its ability to investigate many biological and clinical questions across specimen types and datasets. In addition, the breadth of our functionally validated sHEMs enables a detailed, yet comprehensive approach to investigate TCSs in the context of chronic infection and cancer. As such, we next sought to validate our approach by corroborating external observations in literature.
We performed TCSP on a set of previously characterized CD39+ and CD39− sorted cell isolates from Non-small Cell Lung Cancer (NSCLC) and Colorectal Cancer (CRC) 65 . CD39+ tumor infiltrating T cells have been associated with both exhausted and effector memory phenotypes 62,65-67 . As observed previously 66 , both CD39+ and CD39− T cell isolates were estimated to be primarily comprised of the EM subtype (Fig. 3A). In addition, we confirmed that the CD39+ isolate exhibited a higher relative level of exhausted subtype than CD39− T cells (Fig. 3B). These two trends held in both subsets when NSCLC and CRC samples were analyzed individually (Supplementary Fig. 9A and B).
Tumor infiltrating T cells that express a high level of PD-1 have also been associated with exhaustion 12,16,68 . A previously characterized set of T cell isolates from blood and NSCLC tumors 16 were profiled for TCSs.
Exhausted subtype estimates increased as PD-1 expression increased in isolates, corroborating previous observations (Fig. 3D). EM cells isolated from blood had the lowest exhausted estimates, while PD-1 high isolates from tumors had the highest. Similarly, activated subtype estimates were positively correlated with PD-1 expression. Conversely, naïve and EM subtype estimates were negatively correlated with PD-1 expression (Fig. 3C,  Supplementary Fig. 9D). This suggests that expression of PD-1 in CD8+ infiltrates, along with expression of other classical inhibitory receptors ( Supplementary Fig. 9C), correlates with a population of cells that are increasingly antigen experienced with a decreasing effector function.
T cell exhaustion and dysfunction may be caused by a variety of factors 12,69 , but are typically associated with persistent, chronic antigen stimulation. This model of exhaustion has its origins in viral research but has also been demonstrated in solid tumors 12 Fig. 10B). In Liver Hepatocellular Cancer (LIHC) 80 , exhaustion trends were mixed. Exhaustion was estimated to be higher in tumors with active HCV infections, but not HBV infections (Fig. 3E). TCS profiles were similar across both malignant and non-malignant tissue and regardless of viral status (Supplementary Fig. 10C and D). Similarly, observations in literature indicate few differences in T cell exhaustion, T cell type, and T cell abundance when comparing viral status in this LIHC dataset 78,80 . Yet, in agreement with our estimates, alternative work has found HCV specific T cells to be highly exhausted 62 . These data suggest that exhausted T cells are elevated in at least some tumor types during concurrent viral infection, but may be dependent on the type of virus. Regardless, TCSP is able to measure heightened levels of exhaustion caused by both viral and tumor induced exhaustion.
T cell subtype profiling predicts anti-PD-1 response. TCSP is robust in measuring biologically relevant physiology. In addition, T cell biology is heavily implicated in patient response to immunotherapies, especially in HNSCC 13,14 , NSCLC 16,[81][82][83][84] , and Melanoma 20,85 . Therefore, we used TCSP to study and retrospectively predict anti-PD-1 therapy response in these three cancer types. First, we examined response in recurrent and metastatic HNSCC. This cohort of 85 samples consists of nonnasopharyngeal samples collected from patients at Washington University School of Medicine and processed at Cofactor Genomics. With TCSP, we found that the EM T cells were more abundant in tumors of responders and that overall T cell infiltrate was higher in responders, in line with other work 13,14 (Fig. 4A, Supplementary  Fig. 11A). Using machine learning, we built a multianalyte biomarker to predict response to anti-PD-1 treatment in this indication. We used bootstrap sampling to best approximate future performance in an independent validation set. Notably, this biomarker (AUC = 0.71) better predicted objective response relative to PD-L1 IHC testing (AUC = 0.62), which is an indicated companion diagnostic in this cohort (Fig. 4B). The TCSP based biomarker also predicted overall survival outcomes, with predicted responders having longer survival (Fig. 4C).
Next, we considered a cohort of recurrent and metastatic NSCLC patients with primary tumors that were treated with anti-PD-1 therapies in early treatment lines 86 . We investigated the differences between 21 patients with durable clinical benefit (DCB) and non-durable benefit (NDB). DCB was defined as complete response (CR), partial response (PR), or stable disease (SD) as defined by RECIST 1.1 for at least 6 months. Similar to HNSCC, we observed a greater proportion of exhausted and EM T cells in NDB and DCB samples, respectively (Fig. 4D, Supplementary Fig. 11B).
These sum-normalized exhausted and EM T cell observations measured from FFPE tissue in two different cancers are reminiscent of the characteristics observed in tumor-isolated T Cells gated on CD39 and PD-1 (Fig. 3A,C) and suggest that CD39+ and/or PD-1+ T cell populations may be higher in responders in both cancers. This corroborates the previously observed association of PD-1+ T cells and response in NSCLC 16 . In this NSCLC cohort, we also observed that estimates of exhaustion were higher in NDB patients, consistent with this previous work (Supplementary Fig. 11B). In contrast with other previous work [81][82][83][84] , however, we found that total infiltrate levels were higher in this population of NDB patients (Supplementary Fig. 11B). Similar to HNSCC, we used the TCSP readouts as inputs to train a multianalyte biomarker and evaluate the performance in predicting DCB in NSCLC. The TCSP biomarker better predicted DCB (AUC = 0.78) compared to both the indicated companion diagnostic, PD-L1 IHC (AUC = 0.73), and also Tumor Mutational Burden (AUC = 0.71)   www.nature.com/scientificreports/ (Fig. 4E). In addition, patients predicted to have DCB by the TCSP-based biomarker had significantly longer overall survival (Fig. 4F). Next, we explored an existing public dataset of advanced Melanoma patients treated with Nivolumab 87 . We investigated the TCSs in 31 on-treatment tumors. Patients who responded to Nivolumab were found to have higher levels of exhausted, EM, and total T cell infiltrate (Fig. 4G), echoing observations in HNSCC (Supplementary Fig. 11A). However, no trends were observed when considering sum-normalized readouts ( Supplementary  Fig. 11C). These non-normalized observations agreed with some work 85 , while normalized observations failed to corroborate other work 20 , perhaps due to differences in methods of measuring T cell abundance, i.e. IHC vs single cell RNA-seq. In line with the above HNSCC and NSCLC experimentation, we built a third multianalyte biomarker to predict response to Nivolumab. This biomarker also predicted objective response (AUC = 0.69) and overall survival in this third indication (Fig. 4H,I).
Last, we investigated immune gene sets related to inflammation 36 , cytotoxicity 37 , IFNγ 38,39 , antigen presentation 39 , and exhaustion 39 . These gene sets have been associated with response and survival for patients undergoing anti-PD1 therapy in several tumor types including the ones investigate in this work: HNSCC, NSCLC, and melanoma. Using the same multi-analyte and cross validation approach, we evaluated biomarkers using these gene sets in these three cohorts, as compared to TCSP. TCSP performed the best in NSCLC and HNSCC cohorts, only being surpassed in performance by antigen presentation genes in NSCLC ( Supplementary Fig. 12). TCSP and antigen presentation genes had the highest average AUC performances across all three cohorts: 0.72 and 0.73, respectively ( Supplementary Fig. 12).
Although varying across difference cancers, the TCSP of tumors has shown early promise in predicting clinical outcomes to treatment with anti-PD-1 therapies. Further development considering other additional analytes, especially antigen presentation gene expression, may further improve performance. In addition, additional independent samples are required to validate the biomarkers for clinical use. With its unique ability to characterize FFPE samples, TCSP can facilitate previously impossible translational research. To aid other researchers in characterizing the TCS of their cohorts and discovering other TCSP-based biomarkers in oncology, we have made TCSP available at tcsp.cofactorgenomics.com.
T cell subtype profiling of many cancers. Given the performance of TCSP-based biomarkers in HNSCC, NSCLC, and melanoma, we leveraged TCGA data to expand our investigation to 32 additional indications. We identified additional tumor types in which response to anti-PD-1 might be predicted by searching for tumor types with similar characteristics to HNSCC, NSCLC, and melanoma. The HNSCC, NSCLC, and melanoma cohorts presented in the previous section had a high ratio of exhausted to EM cells (Fig. 4). As expected, in the TCGA data, Head and Neck Squamous Cell Cancer (HNSC), Lung Squamous Cell Cancer (LUSC), and Skin Cutaneous Melanoma (SKCM) were among the eight highest exhausted to EM T cell ratios (Fig. 5). Other indications with a high exhausted to EM ratio include Large B-cell Lymphoma (DLBC), Uterine Carcinosarcoma (UCS), and Stomach adenocarcinoma (STAD). HNSCC and LUSC are the two highest in the ratio of activated to EM, followed by Pancreatic Adenocarcinoma (PAAD), Bladder Urothelial Carcinoma (BLCA), and Ovarian www.nature.com/scientificreports/ Serous Cystadenocarincoma (OV) (Fig. 5). These additional tumor types are potential candidates for TCSPbased biomarkers to predict anti-PD-1 response. We also investigated other general immunological trends across tumor types in TCGA. When considering the abundance of many TCSs, the inter-disease variance was as large as intra-disease variance ( Supplementary  Fig. 13). Several observations fit expectations. Thymoma (THYM) and DLBC had the highest total T cell infiltrate (Fig. 5), while Thyroid Carcinoma (THCA) had the highest presence of the naïve T cells ( Supplementary  Fig. 13). Almost all cancers lacked CM T cells, except THYM, which had the highest abundance ( Supplementary  Fig. 13). Other observations may provide new insights. DLBC, Cholangiocarcinoma (CHOL), and UCS were the three highest exhausted diseases (Supplementary Fig. 14). OV had the highest levels of activated T Cells, while Glioblastoma Multiforme (GBM) had the highest abundance of EM T Cells (Supplementary Fig. 13). The TCSP of TCGA cancers discussed here are available for download at the TCSP portal.

Discussion
The TCSP method described in this paper is a novel way to characterize T cells. Characterizing the five TCSs in FFPE patient samples enables new opportunities for researching the tumor-immune microenvironment, studying response to immunotherapies, and developing biomarkers to predict patient response to treatment. sHEMs were designed to be specific to each TCS, allowing one to discriminate between TCSs in heterogeneous FFPE tumor samples rather than relying on commonly used non-specific markers (e.g., PD-1 as a marker for exhaustion).
Indeed, we found that, with both in vitro and in-patient samples, many traditional markers for T cell exhaustion are correlated with, but not specific to exhaustion. For instance, gene expression of the inhibitory receptors PD-1, TIM3, and LAG3 is increased in activated T cells as well as exhausted T cells during chronic stimulation ( Fig. 2A; 49 ). In NSCLC patients, PD-1+ CD8+ cells are associated with exhaustion ( Fig. 3D; 12,16 ), however, these PD-1+ cell isolates are also associated with increased activation (Supplementary Fig. 9D). Likewise, in NSCLC and CRC patients, CD39+ T cells are associated with both exhausted and EM T cells 62,[65][66][67] . While TCSP corroborated these findings, we also found that CD39+ T cells are associated with higher activation, suggesting that CD39 is not a specific marker for exhaustion nor EM T cells (Fig. 3A,B, Supplementary Figs. 8A-B). These findings, paired with observations in literature, suggest a more complex interaction between single-analyte markers and TCSs, and demonstrate that our TCSP method is a more specific way to characterize infiltrating T cells, especially those that are exhausted.
The use of single-analyte surrogates for complex TCSs is likely driven by the difficulty of comprehensively characterizing TCSs, particularly in FFPE tissue. Typically, accurately estimating TCSs requires flow cytometry and/or functional tests using unpreserved tissue. The TCSP method presented here is the first platform for comprehensive and specific profiling of TCSs in FFPE samples, whether in new or existing RNA-seq datasets.
Importantly, TCSP is not only useful for characterization, but also for developing multianalyte biomarkers to predict patient response to immunotherapy. In HNSCC and NSCLC patients, TCSP-based biomarkers outperformed existing biomarkers, namely the companion diagnostic PD-L1 IHC (Fig. 4A-F). Similarly, a TCSP-based biomarker predicted objective response and overall survival in a public melanoma patient cohort (Fig. 4G-I). Additional development and independent cohorts are needed to validate these biomarkers, but this work demonstrates the potential of TCSP for building biomarkers across multiple cancer types. Supporting this idea, we characterized TCSs across 32 cancer types and found potential indications to pursue biomarker development.
Given the utility of TCSP across multiple cancer types, we have created an online portal where other researchers can leverage this technology. At tcsp.cofactorgenomics.com, researchers can perform TCSP on their own samples by uploading RNA-seq counts. Optionally, researchers can upload sample grouping information to compare sets of samples or explore potential TCSP-based biomarkers in their indication of interest. At this portal, researchers can also download the TCSP of TCGA samples for further inquiry into specific diseases. We hope that this TCSP portal may help others discover biomarkers predictive of anti-PD-1 and other therapies.

Isolation of T cell subsets by flow cytometry. Naïve T cells, effector memory T cells, and central mem-
ory T cells were isolated by FACS sorting. Cryopreserved human peripheral blood mononuclear cells (PBMCs) from normal healthy donors were obtained from StemExpress (Folsom, CA) and Astarte Biologics (Bothwell, WA). Cryopreserved CD4+ and CD8+ T cells, enriched by negative selection from PBMCs from normal healthy donors, were obtained from StemExpress. Cells were removed from liquid nitrogen storage and rapidly thawed in a 37 °C water bath with gentle hand shaking until only a small piece of ice remained. Cells were transferred to a 50 mL conical centrifuge tube. One mL of prewarmed media (RPMI-1640 (no phenol red) supplemented with 10% FBS, 10 mmol/L HEPES buffer, 1X GlutaMAX, 50 µg/mL gentamicin) was added dropwise to the cells. Fifteen mL prewarmed media was then slowly added. Cells were centrifuged at 200 × g for 10 min at room temperature. The supernatant was aspirated, and cells were resuspended in FACS buffer (calcium-and magnesium-free Hank's balanced salt solution (HBSS) supplemented with 2% FBS). Seventy five µL aliquots of cell suspension (5 million cells) were transferred to tubes containing 25 µl T cell antibody panel  Figure 15 shows the gating strategy for a representative donor. Sorted lymphocytes were centrifuged at 1000 × g for 5 min and pellets lysed in 350 µL Buffer RLT Plus (Qiagen, Germantown, MD) supplemented with 1/100th volume ß-mercaptoethanol. RNA was extracted using the RNeasy Plus Micro Kit (Qiagen, Germantown, MD) according to the manufacturer's instructions, and used for RNA-seq library preparation and sequencing.
In vitro T cell exhaustion. The in vitro generation of exhausted T cells was modified from Balkhi, et al. 88 , and performed by STEMCELL Technologies (Vancouver, BC, Canada). Naïve CD8+ T cells were isolated from fresh leukapheresis samples from three normal healthy donors using the EasySep™ Human Naïve CD8+ T Cell Isolation Kit II (STEMCELL Technologies) following the manufacturer's recommended protocol. The isolated cells were resuspended in media (RPMI supplemented with 10% FBS) to a final concentration of 1.5-2 × 10 6 cells/ mL. One hundred microliter aliquots of cell suspension (1.5-2 × 10 5 cells) were transferred to 96-well U-bottom plates. Cultures were rested at 37 °C, 5% CO2 for 30 min before the addition of tetrameric antibody complexes (ImmunoCult™ Human CD3/CD28/CD2 T Cell Activator, STEMCELL Technologies). A two-fold working stock of T Cell Activator was first prepared in media at a concentration of 50 µl/ml. One hundred microliters of working stock were then added to wells for a final concentration of 25 µl/ml. Cultures were incubated at 37 °C, 5% CO2 for a total of 14 days with re-stimulation occurring every two days, as follows. Every two days (  Processing of RNA-seq data. FASTQ files were preprocessed with trim_galore/cutadapt to remove adapter sequences as well as reads with PHRED quality scores less than 20 and reads that were shorter than 20 base pairs. The trimmed reads were aligned to the human genome GRCh38 with STAR using the 2-pass method. Read counts were generated using htseq-counts and annotation from Gencode v22.

RNA-seq library preparation and sequencing.
T cell subtype health expression model creation. Differential expression of the five sHEMs was initially performed using DeSeq2. Eight, three, six, five, and three libraries were used for the naive, activated, exhausted, EM, CM subtypes, respectively. EM and CM models were derived from T cell isolates sorted by flow cytometry. Activated and exhausted models were derived from day 4 and days 12 and 14 of the chronic stimulation in vitro experiment, respectively. The naïve model was derived from both T cell isolates sorted by flow cytometry and day 0 cells of the chronic stimulation in vitro experiment. For each subtype, genes were considered in descending order of log fold difference versus all other subtypes. Genes with a coefficient of variation larger than 0.25 and a maximum counts per million (CPM) less than 15 were ignored until 10 genes were chosen. The mean CPM of respective libraries for these selected genes was used to create the preliminary sHEM for each TCS consisting of 123 genes. The genes in these models were then filtered using Cancer Cell Line Encyclopedia (CCLE www.nature.com/scientificreports/ Specimens. 85 HNSCC samples were collected from pre-immunotherapy tumor tissue obtained from patients with RM-HNSCC that were treated with a PD-1 inhibitor (pembrolizumab or nivolumab). Sequential sections of formalin-fixed and paraffin embedded (FFPE) tissue blocks were utilized for analysis via T Cell Subtype Profiling and the on-label PD-L1 IHC assay. Patients were grouped according to tumor response to immunotherapy using RECIST criteria. The study design was approved by Washington University School of Medicine IRB. 21 NSCLC samples were collected at time of first diagnosis from patients before treatment with a PD-1 inhibitor (pembrolizumab or nivolumab) between April 2013 and January 2018 at the University Hospital Basel, the Cantonal Hospital Baselland, Switzerland, and the St. Clara Hospital Basel. The groups of patients analyzed is a subset of a cohort previously published 86 . PD-L1 IHC and TMB were performed and evaluated as previously described 86 . The study was approved by the local Ethical Review Board (Ethikkommission Nordwestschweiz, Project-ID 2018-01751) and performed in compliance with all relevant ethical regulations.
31 Melanoma samples were a subset of a cohort previously published 87 . Responders are defined as those with complete response and partial response and non-responders are defined as those with progressive disease according to RECIST criteria.
Informed consent was obtained from patients for all specimens according to the local IRBs. RNA was extracted from HNSCC FFPE samples using the RNAstorm™ Kit (Cell Data Sciences, Fremont, CA). RNA was extracted from NSCLC FFPE samples using the RecoverAll™ Total Nucleic Acid Isolation Kit (Thermo Fisher Scientific, Waltham, MA).
TCSP-based biomarker creation and analysis. TCSP-based biomarkers were optimized independently for each of the three indications via cross validation. Normalized and/or non-normalized TCSP readouts were used as (5 or 10) input features and were estimated using a single set of sHEMs optimized for all three indications. No other gene expression was included as a feature. Supplementary Table 2 summarizes the sHEMs used for each figure. After feature standardization, several feature projection (Principal Component Analysis, Independent Component Analysis, Kernel Principal Component Analysis) and machine learning algorithms (Adaboost, K-Nearest Neighbors, Random Forest, Support Vector Machine) were evaluated via cross validation. The machine learning (ML) model with the highest cross validated Area Under the Receiver Operating Characteristic curve (AUC) was chosen as the biomarker. Bootstrap sampling was used to cross validate and approximate ML model performance for future independent datasets. In bootstrap sampling, a set of samples are randomly sampled with replacement for training the ML model, with the remaining samples-called the out-of-bag set-used to evaluate the ML model's performance. This is done iteratively (hundreds of times) and a model's performance is evaluated by averaging the performance over all out-of-bag samples. Bootstrap sampling is the most rigorous statistical approach to predicting the performance of an ML model in future independent cohorts. Receiver Operating Characteristic (ROC) curves are used to show overall performance of TCSP-based biomarkers in predicting objective response. The curves shown for TCSP-based biomarkers are the mean outof-bag ROC of the optimal ML model. PD-L1 IHC and TMB ROC curves include all samples without any sampling procedure. Kaplan-Meier plots are used to show the ability for the same TCSP-based biomarkers to predict overall survival. To do so, the average out-of-bag prediction scores of all samples were thresholded at 0.5 to determine if a sample was biomarker positive or negative. All ML model optimization and evaluation was performed with Python (3.8.3) via the Scipy library (1.5.0).
Gene-set-based biomarker creation and analysis. The genes of six immune-related gene sets (Supplementary Table 3) were used for multi-analyte biomarker creation. The same machine learning algorithms and cross validations approaches used for creating TCSP-based biomarkers were used. In effect, TCSP features were exchanged for gene expression features, while controlling for the same algorithms, e.g. linear PCA+ SVM. A unique model was fit for each of the three tumor types. The mean OOB AUC was calculated for each tumor type, as well as a mean AUC across all tumor types per gene set. www.nature.com/scientificreports/