Tumor Infiltrating Lymphocytes and Macrophages Improve Survival in Microsatellite Unstable Colorectal Cancer

Due to the loss of DNA repair mechanisms in colorectal cancer (CRC) with microsatellite instability (MSI), somatic mutations accumulate within DNA; making them more prone to attack by tumor infiltrating lymphocytes (TIL) and macrophages. We hypothesize that MSI-High (MSI-H) patients have favorable survival due to increased tumor immunogenicity. The Cancer Genome Atlas (TCGA) was used to evaluate gene expression from 283 patients with CRC, comparing MSI-H and microsatellite stable (MSS) patients. CIBERSORT algorithm estimated the fraction of immune cell types. We found that low expression of DNA repair genes (MLH1, MLH3, PMS1, PMS2, ATR, PRKDC, ATM, BRCA2) associated with MSI-H. MSI-H was directly associated with Helper T-cells (p = 0.034) and M1 macrophages (p < 0.0001). MSI-H tumors associated with diminished intra-tumoral heterogeneity as well as higher expression of checkpoint molecules PD-1, PD-L1, CTLA4, LAG3 and TIM3 (p < 0.0001). Improved OS was seen in patients with low ATM, PMS2 and MLH3. In the TCGA CRC cohort, decreased expression of DNA repair genes associated with MSI-H. MSI-H patients had improved survival, likely due to higher TIL and M1 macrophage infiltration as well as lower intra-tumoral heterogeneity. MSI-H also associates with expression of immune checkpoint molecules with potential for development of therapeutic targets.


Determination of Tumor infiltrating Immune Cells.
In order to differentiate the numerous cell types that compose the immune response we utilized the CIBERSORT deconvolution algorithm which uses a set of reference gene expression values as a representation of each cell type and identifies cell type proportions in data sets of colorectal tumor gene expression data obtained from TCGA (further described by Ali, H.R. et al.) 28 . Twenty-two cell types were investigated in this research using CIBERSORT its online calculator (https://cibersort. stanford.edu/). Gene Set Enrichment Analysis with TCGA. Gene Set Enrichment Analysis (GSEA) was performed using software provided by the Broad Institute (http://software.broadinstitute.org/gsea/index.jsp) 29 . All reported tests were conducted at a nominal significance level of 0.05. Statistical analyses were performed using R software (http://www.r-project.org/) and Bioconductor (http://bioconductor.org/).

Statistical analysis.
Patients were dichotomized to into low and high groups based on different expression levels of genes in interest. To determine the threshold of the dichotomization, running Cox proportional hazard statistics were applied. Differences in the overall survival (OS) or disease free survival (DFS) between the two groups were assessed at multiple candidate cutoffs within the range of gene expression level, and the optimal cut off point was chosen based on the statistical significance of the Cox proportional hazards model. To compare the survival curves of individual groups, the Kaplan-Meier method with log-rank test and Cox proportional hazard regression were used when appropriate. The reported results included hazard ratios (HR) and 95% confidence intervals (CI). Association between variables including MSI status, gene expression, cell composition and other clinical characteristics were accessed using Mann-Whitney U test.
All reported tests were conducted at a nominal significance level of 0.05. Statistical analyses were performed using R software (http://www.r-project.org/) and Bioconductor (http://bioconductor.org/).  (Fig. 1a-h). Although they did achieve statistical significance, the difference in expression of MLH3, PMS1 and BRCA2 were not as dramatic as we expected from previous reports which may reflect the difference in methodology of using RNA sequencing data from TCGA.

MSI-H tumors are associated with higher tumor mutation load and Cytolytic Activity Score (CYT) but lower Mutant-Allele Tumor Heterogeneity (MATH).
Next, we examined the mutation load, which we expected to be high in MSI-H tumors. As expected, MSI-H tumors had significantly higher mutation load than MSS tumors in this CRC cohort (p < 0.0001, Fig. 2a). These patients (MSI-H) also had higher cytolytic activity (as denoted by CYT) than those with MSS tumors, as shown by the jitter plot (Fig. 2b). Thus, indicating high cell killing activity intra-tumorally, most likely due to immune cell infiltration. We also measured intra-tumoral genetic heterogeneity by determining the MATH score for MSI-H compared to MSS patients. MSI-H tumors demonstrated significantly lower MATH than MSS (p < 0.0001, Fig. 2c).

MSI-H tumors possess higher composition of tumor infiltrating immune cells. Given the high
cytolytic activity score in MSI-H tumors, it was of interest to examine which immune cells are infiltrated in MSI-H patients compared to MSS utilizing the CIBERSORT algorithm (Fig. 3). T-cell expression was significantly higher in the MSI-H group compared to MSS in Gamma-Delta T-cell (p = 0.0013) and Helper T-cell groups (p = 0.034). This trend was also identified (without achieving statistical significance) in the CD8+ T-cell (p = 0.13) and Activated Memory CD4+ T-cell (p = 0.26) groups. MSS was associated with higher expression of Naïve CD4+ T-cells (p = 0.024) and Resting memory CD4+ T-cells (p = 0.0072). There was no difference between the two groups when measuring T regulatory (T-reg) cells (p = 0.96). MSI-H patients also had a greater fraction of M1 type macrophages than MSS (p < 0.0001) as well as more resting NK cells (p = 0.0001). There, however, was no difference between MSI-H and MSS groups when measuring for activated NK cells (p = 0.36). Table 1). GSEA using TCGA dataset identified 15 available immune-response related gene sets, which were significantly upregulated in the MSI-H CRC tumors; suggesting that MSI-H positively associated with intra-tumoral immune response in MSI-H tumors (Fig. 4).

MSI-H associates with high expression of immune-response related genes and immune checkpoint molecules (ICM). Gene Sets Enrichment Analysis (GSEA) was conducted to validate the association between CYT and immune-response signatures in MSI-H and MSS tumors (Supplementary
MSI-H and MSS tumors were then compared for expression of immune checkpoint molecules (ICM) including PD-1, PD-L1, CTLA4, LAG3 and TIM3 (

Discussion
In this study, we used a novel completely bioinformatic approach using unbiased RNA-sequencing data from the TCGA CRC data set to perform in-depth analyses of the tumor immune microenvironment of MSI-H vs. MSS patients and to confirm findings of increased intra-tumoral immunogenicity in MSI-H patients associating with improvement in clinical outcomes. We, using this unique methodology, were able to reach concordant results with studies which utilized conventional methods of evaluating tumor immunology such as immunohistochemistry or flow cytometry but with diminished costs, lower time and labor expenditure and with higher reproducibility.  www.nature.com/scientificreports www.nature.com/scientificreports/ Secondary to the diminished DNA repair mechanisms in MSI-H CRC, somatic mutations accumulate within coding and non-coding regions in DNA 4 . As the reading frames of oncogenes or tumor suppressors are altered, tumors are generated 4 . This impaired DNA repair and genomic instability can lead to increased neoantigen load on the surface of tumor cells which make them more prone to attack by lymphocytes and other immune cells than  www.nature.com/scientificreports www.nature.com/scientificreports/ MSS tumors 30 . In our evaluation of TCGA CRC cohort, we found that MSI-H patients trended towards having improved OS and DFS compared with the MSS cohort, a finding which is consistent with previously published data [31][32][33] . This trend appeared to be most pronounced in examining OS in stage II and stage III patients and less so in stage IV (likely due to low patient numbers in this group). We also found that MSI-H patients had a higher mutation load as well as high CYT compared to MSS, likely secondary to prominent lymphocytic infiltrate resulting in elevation in intra-tumoral immune cytolytic activity similar to what has been observed previously in different settings 12,22,30,34 .
In gene expression analysis from TCGA we identified several DNA repair genes and determined that low expression of MMR deficient genes including MLH1, MLH3, PMS1 and PMS2 as well as other double stranded break DNA repair genes including ATR, PRKDC, ATM and BRCA2 was associated with MSI-H. We also found improved 5-year survival in patients with lower expression of several of these genes including ATM, PMS2, MLH3, PMS1, MLH1 and ATR in comparison to survival in patients with MSI-H tumors which did not achieve statistical significance, likely due to fewer numbers of patients and a too short follow up. The fact that expression of DNA repair genes reached statistical significance may indicate that they may be stronger prognostic biomarkers.
For MSI-H CRC as well as other immunogenic cancers, a high level of T lymphocyte infiltration into tumors has been noted to be a positive prognostic factor 11 . MSI-H tumors are infiltrated with intra-epithelial cytotoxic T-cells and activated CD4+ helper T-cells, making them increasingly prone to a local cytotoxic immune response 35 . We noted this same association in the patients of this study with MSI-H tumors being significantly associated with infiltration by helper T-cells as well as trending towards increased infiltration by cytotoxic (especially Gamma-Delta) and activated memory CD4+ T-cells. There was, in our study, no difference between MSI-H and MSS groups in the expression of T-reg lymphocytes. Other studies have noted that increased expression of T-reg cells compared to CD4+ and CD8+ lymphocytes can indicate a poorer outcome likely due to suppression of cytotoxic T-cells 35,36 .
We also found in this study that MSI-H patients had a higher ratio of intra-tumoral M1 macrophages than the MSS group. M1 macrophages have been demonstrated previously to be associated with the inflammatory response via release of pro-inflammatory cytokines as well as pathogen clearance and anti-tumor immunity 37 . M1 macrophages have also been shown in previous studies to have tumor suppressive effects via production of reactive oxygen species which we hypothesize also may have contributed to the trend in improved survival in MSI-H patients 38 .
MSI-H tumors were also found to have elevated tumor mutation burden but diminished intra-tumoral heterogeneity as defined by MATH than MSS. This may have contributed to improvement in survival in MSI-H www.nature.com/scientificreports www.nature.com/scientificreports/ patients 26 . There has been increasing interest in increasing genetic diversity within tumors resulting in clonal evolution as a response to anti-tumor immunosurveillance [39][40][41] . We may speculate that MSI-H patients have low tumor heterogeneity due to increased clonal selective pressures from robust immunologic responses within these tumors.
Immune checkpoints are an immune inhibitory mechanism by which cancer cells evade anti-tumor immunity 42,43 . Some immune checkpoint molecules have been identified as potential targets for immunotherapy. These include PD-1 (programmed cell death molecule), PD-L1 (PD1 ligand), CTLA-4 (cytotoxic T-lymphocyte associated protein 4), LAG-3 (lymphocyte activation gene) and TIM3, an inhibitory molecule selectively expressed on IFN-γ-producing helper and cytotoxic T-cell responses [44][45][46][47] . This study found that expression all of these molecules (PD-1, PD-L1, CTLA4, LAG3 and TIM3) was higher in in the MSI-H group than MSS which may be a result of the immune activation driven by effector T-cells in patients with MSI-H tumors.
PD-1/PD-L1 binding has been demonstrated to block the effector function and motility of most lymphocytes, thereby decreasing the production of IL-2 (interleukin-2) by helper T-cells and diminishing the clonal proliferation of cytotoxic T-cells in response to cancer cells 44 . This impaired immune function is termed "T-cell exhaustion" and allows cancer cells to escape immune surveillance 11,44 . Studies have also noted that high PD-1 and PD-L1 expression on tumor cells has been associated with a weakened host immune response and subsequent poor prognosis in a number of malignancies 42,45,48 . Up-regulation of PD-L1 has been reported in several malignancies including CRC, melanoma, lung cancer, renal cell carcinoma, ovarian cancer, breast cancer and osteosarcoma 48 . It has been associated with more frequent incidence of vascular invasion, tumor recurrence and lower numbers of cytotoxic T-cells 48 . We identified a similar trend within this study, finding that MSS patients with elevated ICM expression had significantly poorer survival than MSI-H patients with lower ICM expression.
Recently, treatment with immune checkpoint inhibitors such as anti-PD1 antibodies (e.g. Nivolumab and Pembrolizumab) and its ligand anti-PD-L1 (e.g. Atezolizumab) have been increasingly used as an effective treatment strategy to combat various advanced cancers 11,49 . Le et al. in their phase II trial examining anti-PD-1 blockade found a significantly improved objective response rate and survival in MSI-H CRC compared to MSS with associated TIL elevation 49 . This is likely a result of MSI-H CRCs association with increased neoantigen (immunogenic tumor mutated peptide) expression, concurrent with PD-1 inhibition; resulting in elevated TIL expression and tumor regression 35,45 . Targeting MSS tumors in advanced CRC may present a more difficult challenge www.nature.com/scientificreports www.nature.com/scientificreports/ with decreased responsiveness to immunotherapy and would thus be more likely to be treated with standard chemotherapy regimens or via methodologies being developed to increase intra-tumoral immunogenicity (such as PARP inhibitors or vaccines) in combination with immunotherapeutic agents [50][51][52] .
Some limitations of TCGA data included limited clinical information regarding patients' co-morbid conditions and therapeutic information. We were also only able to make associations between these analyses of RNA sequencing data and clinical outcomes from TCGA without being able to elucidate underlying molecular mechanisms or make direct correlations. Also, the majority of patients had locoregional disease which likely contributed to the improved survival of these patients. Additionally, the TCGA was created prior to the wide use of immune checkpoint inhibition, thus our data reflects patients who did not receive those treatments.

Conclusions
This study used a unique bioinformatic approach to analyze RNA sequencing data, obtained from The Cancer Genome Atlas to identify disparities within the tumor immune microenvironments of MSI-H and MSS colorectal cancer patients. Using this approach, we were able to find an association between low expression of non-mismatch DNA repair genes as well as known MMR genes with high microsatellite instability. We found that MSI-H was associated with high TIL and M1 macrophage infiltration into the tumor immune microenvironment as well as with higher cytolytic activity and diminished heterogeneity (with likely increased clonality) which may have contributed to the improvement in survival. We also found an association between MSI-H and 15 different immune response gene signatures as well as immune checkpoint molecules which can be used to further development of targeted therapies (not only in ICMs which already have immunotherapies such as PD-1, PD-L1 and CTLA4, but also in future therapies targeting LAG3 and TIM3 which are currently targeted by no immunotherapeutic agents). The primary novelty of our study is that it allowed us to utilize a completely bioinformatic approach to perform an in-depth analysis of the tumor immune microenvironment using RNA sequencing data with low associated costs, increased feasibility and increased reproducibility than conventional methods of studying tumor immunology.

Data Availability
There are no restrictions on the availability of materials or data for this project. Data was obtained from the publicly available The Cancer Genome Atlas.