Predictive biomarkers of immunotherapy response with pharmacological applications in solid tumors

Immune-checkpoint inhibitors show promising effects in the treatment of multiple tumor types. Biomarkers are biological indicators used to select patients for a systemic anticancer treatment, but there are only a few clinically useful biomarkers such as PD-L1 expression and tumor mutational burden, which can be used to predict immunotherapy response. In this study, we established a database consisting of both gene expression and clinical data to identify biomarkers of response to anti-PD-1, anti-PD-L1, and anti-CTLA-4 immunotherapies. A GEO screening was executed to identify datasets with simultaneously available clinical response and transcriptomic data regardless of cancer type. The screening was restricted to the studies involving administration of anti-PD-1 (nivolumab, pembrolizumab), anti-PD-L1 (atezolizumab, durvalumab) or anti-CTLA-4 (ipilimumab) agents. Receiver operating characteristic (ROC) analysis and Mann-Whitney test were executed across all genes to identify features related to therapy response. The database consisted of 1434 tumor tissue samples from 19 datasets with esophageal, gastric, head and neck, lung, and urothelial cancers, plus melanoma. The strongest druggable gene candidates linked to anti-PD-1 resistance were SPIN1 (AUC = 0.682, P = 9.1E-12), SRC (AUC = 0.667, P = 5.9E-10), SETD7 (AUC = 0.663, P = 1.0E-09), FGFR3 (AUC = 0.657, P = 3.7E-09), YAP1 (AUC = 0.655, P = 6.0E-09), TEAD3 (AUC = 0.649, P = 4.1E-08) and BCL2 (AUC = 0.634, P = 9.7E-08). In the anti-CTLA-4 treatment cohort, BLCAP (AUC = 0.735, P = 2.1E-06) was the most promising gene candidate. No therapeutically relevant target was found to be predictive in the anti-PD-L1 cohort. In the anti-PD-1 group, we were able to confirm the significant correlation with survival for the mismatch-repair genes MLH1 and MSH6. A web platform for further analysis and validation of new biomarker candidates was set up and available at https://www.rocplot.com/immune. In summary, a database and a web platform were established to investigate biomarkers of immunotherapy response in a large cohort of solid tumor samples. Our results could help to identify new patient cohorts eligible for immunotherapy.


INTRODUCTION
Immune-checkpoint inhibitors (ICIs) were introduced for the treatment of solid and hematological malignancies with outstanding results in the last decade [1]. There are three groups of ICIs. The first group consists of ICIs inhibiting cytotoxic T lymphocyte-associated protein 4 (CTLA-4) on T cells [2,3], the second one is related to programmed cell death 1 (PD-1) receptor on lymphocytes [4], and the third group is linked to programmed cell death ligand 1 (PD-L1) on tumor cells. Physiologically, PD-1 is expressed on several immune cells (e.g., lymphocytes, natural killer cells), and PD-L1 is present on almost all somatic cells (e.g., hematopoietic cells or tumor cells). In tumors, blockade of the PD-1/PD-L1 axis results in the survival of the malignant cells [5].
Biomarkers are biological indicators that can be used to select patients for a systemic anticancer treatment like immunotherapy. A major limitation of the widespread use of immunotherapies is the fact that there are only a few clinically useful biomarkers capable to predict therapy response. A study [29] found that tumor mutational burden (TMB), and PD-L1 expression can predict response to pembrolizumab in a huge variety of cancers (melanoma, bladder cancer, breast cancer, CRC, HNSCC, and SCLC). The correlation between PD-L1 expression and MSI and response to pembrolizumab was also investigated in gastric cancer [30]. Cluster of Differentiation 8 positive (CD8 + ) T cell phenotype and TMB were associated with enhanced response to atezolizumab in mUC [31]. Findings from other studies highlighted the central role of the tumor microenvironment (TME) in nivolumab [32], pembrolizumab [33], and anti-CTLA-4 response [34]. The importance of both innate, and adaptive immune systems was emphasized in connection with anti-PD-1 response in NSCLC [35,36], melanoma [36,37], and HNSCC [36]. Meanwhile, in recent years, the tumors of several ICI-treated patient cohorts were investigated with transcriptomic technologies. The simultaneous analysis of the entire transcriptome makes it possible to identify new genes capable to serve as biomarkers of response and to validate previously suggested biomarker candidates.
Here, our goal was to expose and integrate available transcriptomic datasets of ICI-treated tumors to establish a framework enabling an integrated analysis of genes related to treatment sensitivity or resistance. Uncovering robust genes with increased expression in treatment-resistant tumors could offer the opportunity to develop a targeted therapy to augment the effects of immune-checkpoint inhibitors.

ICI dataset screening
We screened the NCBI Gene Expression Omnibus (GEO) repository using the keywords "human [organism] AND (anti-PD-1 OR anti-PD-1 OR anti-PD-L1 OR anti-PD-L1 OR anti-CTLA-4 OR anti-CTLA-4)", and "human [organism] AND (pembrolizumab OR nivolumab OR atezolizumab OR durvalumab OR avelumab OR cemiplimab OR ipilimumab OR camrelizumab OR cintilimab OR tislelizumab OR toripalimab)" on 10th Jan 2022. We omitted datasets with no available gene expression or clinical data, or with single-cell RNAsequencing (scRNA-Seq), T or B cell receptor sequencing (TCR/ BCR-Seq), non-mRNA-sequencing (e.g., whole-exome sequencing, non-coding RNA profiling, methylation profiling, protein array), studies of cell lines, stem cells, sorted peripheral blood mononuclear cells, studies in mice, studies without cancer, and GEO SuperSeries files. We also conducted a literature research to find additional studies. Our scope of investigation only included studies with simultaneously available clinical (response) and bulk-tissue gene expression data.
Database setup Gene expression data from all eligible datasets were combined into a single table, quantile normalized and scaled to 1000. For the clinical annotation, we categorized patients as responders or nonresponders based on reported pathological response or survival time. Those patients were selected as responders who experienced progression-free survival (PFS) longer than 12 months or had a partial response (PR) or complete response (CR). Those who experienced less than 12 months of PFS or had progressive disease (PD) or stable disease (SD) were categorized as nonresponders. Survival time was not used if the patient had no event and the follow-up time was censored before 12 months. Tumor samples obtained before induction of the therapy were termed "pre-treatment" samples, and tumors collected during or after the therapy were termed "on-treatment" samples.
Linking gene expression and therapy response Three separate analyses were performed across all genes to identify pre-treatment gene expression changes related to response against anti-PD-1, anti-PD-L1, and anti-CTLA-4 treatment. On-treatment samples were left out of the analysis because of low sample sizes.
We used Gene Ontology (GO) Enrichment analysis [38] to uncover biological processes connected to the gene lists related to response against anti-PD-1, anti-PD-L1, and anti-CTLA-4 treatment.
Druggability of candidate genes was determined by a literature search in PubMed and GeneCards (https://www.genecards.org/) to include those where (1) in silico prediction, (2) in vitro assay, (3) clinical study or (4) FDA-approval of the given drug were available.

Validation of the results
We extended our previously established ROC plotter platform to enable the investigation of new biomarkers and the validation of current results in all patients treated with immunotherapy. The platform is running on Ubuntu 20.04.4 LTS server driven by Apache 2.4.41. The front-end site was developed in PHP using the YII2 framework. The application data are stored in the PostgreSQL database and the computations are performed via an R script. The portal can be reached at https://www.rocplot.com/immune.

Statistical analysis
Statistical analysis was conducted in the R environment (https:// www.r-project.org/) using Bioconductor libraries (https:// www.bioconductor.org/). To find differentially expressed genes associated with improved or worse outcomes, Mann-Whitney unpaired U-test and receiver operating characteristic (ROC) analysis were used. To avoid false discovery due to multiple testing, Bonferroni-adjustment (P = 0.05) was applied with the service https://www.multipletesting.com/ [39].

Screening results
We have identified 225 series files in NCBI GEO fulfilling the initial search criteria. Through literature research, we also found the Cancer Research Institute iAtlas (CRI iAtlas) (https://www.criiatlas.org/), another portal with ICI-treated samples [40], in which another six datasets were found. Finally, five additional cohorts were found by looking up the referenced literature [30,37,[41][42][43].
In summary, from NCBI GEO [44,45], CRI iAtlas, Chen et al. [43], Litchfield et al. [41], Liu et al. [37], Kim et al. [30], and Miao et al. [42], altogether 246 datasets with 3823 samples were found. Then, we removed duplicate records. For example, Litchfield et al. described eleven studies out of which four were also included in CRI iAtlas, and two in GEO (GSE78220, GSE91061). A detailed description of the complete screening process is provided in Fig. 1.
Predictive biomarkers of immunotherapy response in solid tumors SA Kovács et al.

Database assembly
Datasets were individually investigated to include only those in which at least one of the following clinical variables was reported: progression-free survival or interval (PFS/PFI), relapse-free survival (RFS), overall survival (OS), recurrence, and response determined by the Response Evaluation Criteria in Solid Tumors (RECIST) (including complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD)). Twenty datasets with 2222 samples met these eligibility criteria and were selected for further analysis (Fig. 1). Sixty-eight samples were excluded due to different restrictions, such as (1) a very low number of samples treated with a specific agent (e.g., avelumab or experimental drugs), (2) duplicated samples, (3) missing clinical variable (e.g., event for survival), (4) no expression data, (5) ambiguous treatment time, (6) samples taken from metastatic sites or (7) low incidence of the given tumor type. By using these filtering criteria, the cohort was reduced to 2,154 samples out of which 720 samples were blood samplesthese were removed from further analysis.
Druggable genes with higher expression related to resistance to anti-PD1 administration First, the pre-treatment samples in group #1 were investigated by computing ROC AUC and P-values for 29,755 genes. Following Bonferroni correction, values reaching more than P = 1.6E-06 were excluded from further analysis, which led us to 912 significant genes.  (Fig. 2). The complete list of all significant genes is provided in Supplementary Table S1. Notably, non-protein coding genes (such as pseudogenes, long intergenic non-protein coding RNAs, antisense RNAs, regulatory RNAs, small nucleolar RNAs, open reading frame, etc.) were excluded from our screening. By using all significant genes (n = 912) for GO analysis, mechanisms such as retrograde transport, vesicle recycling within Golgi (GO:0000301), ncRNA catabolic process (GO:0034661), and T cell receptor signaling pathway (GO:0050852) were significantly overrepresented among these genes (Supplementary Table S2).
Druggable genes with higher expression related to resistance to anti-PD-L1 treatment ROC AUC and P-values from 26,819 genes were computed and following Bonferroni-correction, values over P = 1.8E−06 were excluded. This way, we identified 38 significant genes. The complete list of all significant genes can be found in Supplementary Table S3. The Gene Ontology analysis shows that mechanisms connected to the C-type lectin receptor signaling pathway (GO:0002223), cellular response to lectin (GO:1990858), and positive regulation of natural killer cell-mediated cytotoxicity (GO:0045954) were overrepresented in the list of significant genes (n = 38) (Supplementary Table S2). There were no upregulated, druggable genes capable to predict resistance against anti-PD-L1 therapy.
Druggable genes with higher expression related to anti-CTLA-4 treatment resistance In this third analysis, ROC AUC and P-values were calculated for 22,561 genes in the pre-treatment group. Of these, 80 genes reached significance after Bonferroni correction. Among them, BLCAP (FC = 1.7, AUC = 0.735, P = 2.1E−06) was found as a druggable gene overexpressed amongst non-responding patients (Fig. 2). The complete gene list can be found in Supplementary  Table S4. Non-protein coding genes were also excluded from this group. The GO analysis with multiple testing correction delivered no significant classification for these genes.
Mismatch-repair genes and response against anti-PD-1 treatment Finally, we aimed to determine to what extent one can predict sensitivity to the anti-PD-1 dostarlimab using the integrated database of published datasets. For this, we performed ROC analysis to evaluate the anti-PD-1 biomarkers previously published in a cohort of rectal cancer [56]. We used transcriptomic data of 419 samples from melanoma, bladder, renal cell, and gastric cancer in the anti-PD-1 pre-treatment cohort (n = 776). In this analysis, MLH1 and MSH6 achieved high predictive values (FC = 1.5, AUC = 0.682, P = 2.1E−11 and FC = 1.4, AUC = 0.629, P = 7.4E −06, respectively). Notably, 218 genes reached even higher ROC AUC than MLH1 (Fig. 3).

DISCUSSION
Here, we integrated available data from multiple datasets and used this combined database to uncover biomarkers related to response against ICIs in three independent clinical settings. An advantage of the presented analysis pipeline is the utilization of real-world patient data. While most of the individual studies have only a limited number of samples, our combined patient cohort with well over a thousand patients provides a higher statistical power.
Among the most significant genes related to resistance against anti-PD-1 treatment, we identified several potentially druggable targets. Fibroblast Growth Factor Receptor 3 (FGFR3) plays a pivotal role in tumorigenesis and the regulation of innate and adaptive immune systems [57]. Overexpression of FGFR3 is associated with an immunologically cold, T cell-depleted phenotype, which contributes to a low ICI response rate in bladder cancer [58]-just like a low PD-L1 expression in an FGFR3-mutant scenario [59]. From multikinase inhibitors (such as anlotinib, dovitinib, lenvatinib, and ponatinib) to selective FGFR inhibitors (e.g., erdafitinib, infigratinib, pemigatinib), various small molecule inhibitors are available for solid tumors and lymphohematopoietic cancers [60,61]. The combination of FGFR and PD-1 inhibition might also be beneficial [62]. The Src Proto-Oncogene Non-Receptor Tyrosine Kinase (SRC) is a well-known oncogene contributing to cell growth, cell proliferation, and survival. Tumor-induced cytokines, e.g., Macrophage Inflammatory Protein 1, and 2 (MIP-1 and MIP-2) activate Src kinases in immune cells which lead to the production of pro-inflammatory cytokines (e.g., interleukin-1ß and 6, Tumor Necrosis Factor α) that activate cancer cells in a positive feedback loop [63]. Multikinase inhibitors, such as the FDA-approved bosutinib, dasatinib, ponatinib, and vandetanib are currently being used for the treatment of chronic myeloid leukemia, acute lymphoblastic leukemia, and patients with thyroid cancer [64]. B-Cell Lymphoma 2 (BCL2) is a major regulator of the "apoptotic-orchestra" by inhibiting apoptosis and promoting cell survival [65]. Immune checkpoint molecules themselves promote an anti-apoptotic phenotype-leading to immune tolerance and low response rates [66]. Venetoclax is the only FDA-approved small-molecule inhibitor against BCL2 in acute myeloid leukemia (AML), and chronic lymphatic leukemia-yet other drugs might follow both in hematologic [67] and solid tumor malignancies [68]. Yes-Associated Protein 1 (YAP1) is a transcriptional coactivator, which upon binding to many transcription factors, such as TEAD3 (Transcriptional Enhanced Associate Domain 3), regulates the Hippo-signalling pathway, contributing to tumor growth, and resistance [69]. The Hippo-YAP system interferes with the innate immune system in multiple ways such as inhibiting the production of type I interferons (IFN-α, IFN-ß) and reactive oxygen species (ROS), attenuating NF-κB activation, or enhancing TNF-α and IL-1ß production [70]. These events contribute to the suppression of the innate immune system thus escaping immune recognitionwhich eventually leads to tumor survival. Verteporfin (VP) is widely used for the treatment of macular degeneration, however, current studies highlighted the antitumor effects of VP either with photoactivation or without it [71]. The capability of CA3, Super-TDU, statins, sitagliptin, SRC, FAK (Focal Adhesion Kinase), and tankyrase inhibitors to disrupt the YAP-TEAD complex were also discussed previously [72]. SET Domain Containing Lysine Methyltransferase 7 (SETD7) has a broad target-specificity as it is involved in many biological processes by interacting with p53, Estrogen Receptor-Alpha (ERα), or YAP1. For this reason, SETD7 can both activate and inhibit tumor-survival signals [73,74]. Upon methylation on K494 by SETD7, YAP1 accumulates in the cytoplasm and blocks the Hippo-pathway [75]. This leads to the nuclear accumulation of ßcatenin and the activation of the Wnt/ß-catenin pathway which is one of the most well-known oncogenic pathways [76]. Besides having a direct effect on cell proliferation and stemness, ß-catenin promotes a non-inflammatory milieu in tumors by inhibiting the activation and recruitment of CD8 + T cells and enhancing the infiltration and survival of regulatory T cells (Tregs). This leads to resistance to ICIs, emphasizing the potential of Wnt/ß-catenin as a predictive biomarker, or as a therapeutic target [77]. Moreover, SETD7 can methylate p65 and thereby inhibit the expression of NF-κB, and is a positive regulator of Transforming Growth Factor Beta (TGF-ß) productionall these contribute to tumorigenesis [78]. There are some promising results of inhibiting SETD7 with DC-S100 [79], DC-S285 [80], cyproheptadine [81], and ®-PFI-2 [73]. Spindlin 1 (SPIN1) is a histone methylation reader contributing to the epigenetic regulation of many oncogenic pathways so it is not surprising that SPIN1 was found to be overexpressed in many cancers [82]. Notably, SPIN1 acts as a transcriptional coactivator of ß-catenin and T cell Factor 4 (TCF-4) enhancing their contribution to Wnt/TCF-4 pathways, which leads to tumor progression [83]. Inhibitors of SPIN1 are being studied-e.g., A366, EML405, MS31, 4-aminoquinazoline and quinazolinethione derivatives, or VinSpi-nIn [82].
We have found only one gene related to anti-CTLA-4 resistance: BLCAP or Bladder Cancer Associated Protein. As an apoptosisinducing factor, BLCAP can initiate apoptosis in many tumors and is considered a tumor suppressor gene. Lost expression or degradation of BLCAP is observed in urothelial, renal, and cervical cancer, osteosarcoma, colorectal carcinoma, and human tongue carcinoma [84,85]. Nonetheless, poor survival of bladder cancer patients correlates with strong nuclear expression of BLCAP [86], which is impacted by the interaction of BLCAP and Signal Transducer and Activator of Transcription 3 (STAT3) in the JAK/ STAT-pathway [85]. Selective pharmacological inhibition of BLCAP can be observed with aristolochic acid in vitro [87].
In locally advanced rectal cancer, the loss of mismatch-repair genes was a highly significant biomarker of response to the anti-PD-1 inhibitor dostarlimab [56]. When re-evaluating the previously published genes, we were able to confirm a significant correlation with survival in our cohort for MLH1 and MSH6. Nevertheless, 218 genes reached even higher significance than MLH1 in these patients, pointing out that other biomarkers might be even more suitable to select patient cohorts for immunotherapy. The samples in our patient cohort stemmed from melanoma, bladder, renal cell, and gastric cancer suggesting that the loss of mismatch-repair genes could also enhance sensitivity to anti-PD-1 therapy in these tumor types.
There are some limitations of our study. Despite the importance of this topic, only a relatively small number of datasets have been found and included in the analysis. Since the currently used drugs are targeting proteins, an addition of protein-level data would have been useful. Unfortunately, we have not found even one dataset with additional protein abundance data. Finally, as ICIs are approved for advanced cancers, most patients have already received multiple treatment regimes, which might have resulted in significant background noise at the transcriptomic level preventing the identification of the most reliable biomarker candidates.
In summary, we have established a database consisting of gene expression and clinical response data by combining 1434 solid tumor tissue samples obtained before or after immune-checkpoint inhibitor treatment. The most significantly upregulated, pharmacologically important (druggable) genes were identified in connection with the resistance against anti-PD-1, and anti-CTLA-4 treatments. Our extended analysis platform can help to identify, validate, and rank future biomarker candidates.

DATA AVAILABILITY
The original data used in the publications are available from GEO or CRI iAtlas using their respective identifiers, or accessible via the indicated publications. 2020 funding scheme. The authors acknowledge the support of ELIXIR-Hungary (https://www.bioinformatics.hu/) and thank Viktoria Lakatos for the careful English editing of the manuscript.

AUTHOR CONTRIBUTIONS
BG was responsible for the concept and design of this article, SAK oversaw the data collection and performed the statistical analysis, JTF established the analysis portal. SAK and BG prepared the initial draft of the manuscript and all authors edited and approved its final form.

FUNDING
Open access funding provided by Semmelweis University.
Competing interests: The authors declare no competing interests.