Ulcerative colitis immune cell landscapes and differentially expressed gene signatures determine novel regulators and predict clinical response to biologic therapy

The heterogeneous pathobiology underlying Ulcerative Colitis (UC) is not fully understood. Using publicly available transcriptomes from adult UC patients, we identified the immune cell landscape, molecular pathways, and differentially expressed genes (DEGs) across patient cohorts and their association with treatment outcomes. The global immune cell landscape of UC tissue included increased neutrophils, T CD4 memory activated cells, active dendritic cells (DC), and M0 macrophages, as well as reduced trends in T CD8, Tregs, B memory, resting DC, and M2 macrophages. Pathway analysis of DEGs across UC cohorts demonstrated activated bacterial, inflammatory, growth, and cellular signaling. We identified a specific transcriptional signature of one hundred DEGs (UC100) that distinctly separated UC inflamed from uninflamed transcriptomes. Several UC100 DEGs, with unidentified roles in UC, were validated in primary tissue. Additionally, non-responders to anti-TNFα and anti-α4β7 therapy displayed distinct profiles of immune cells and pathways pertaining to inflammation, growth, and metabolism. We identified twenty resistant DEGs in UC non-responders to both therapies of which four had significant predictive power to treatment outcome. We demonstrated the global immune landscape and pathways in UC tissue, highlighting a unique UC signature across cohorts and a UC resistant signature with predictive performance to biologic therapy outcome.


Discussion
We demonstrated in adult UC patients, global immune landscapes and molecular pathways in affected colonic tissue and identified those distinct to non-responders to biologic therapy. Further, we identified a transcriptional signature common across UC cohorts and a resistant signature specific for patients non-responsive to biologic therapy. We validated altered expression of several novel DEGs whose roles in UC pathobiology are unexplored. These findings provide insight into new genes with altered expression in UC tissue that could serve as potential biomarkers for precise diagnostics and targets for personalized therapeutic interventions for UC patients. We found strong similarities in immune landscapes among UC cohorts represented by altered neutrophils, DC, macrophages, mast cells, B cells, and subsets of T cells which are recognized to have key roles in IBD pathogenesis [4][5][6] . Further, in UC tissue we showed elevated M1 and reduced M2 populations, which have pro-and anti-inflammatory roles. Depending on their polarization, macrophages are able to foster each other's activity, increase activation of DC, and communicate with adaptive T and B cells in promoting inflammation 5,24-26 . T CD8 cell levels could vary in UC affected tissue due to variation in their subsets, and their plasticity may be attributed to dynamic interplay between intestinal tissue and circulating cells 27,28 . Further, decreased trends in T CD8 cells may result in weak antigen presentation and processing by intestinal epithelial cells 29 . Moreover, we found that abundances of the T CD4 memory activated subset differed between cohorts. Aberrant T CD4 responses lead to poor defense to pathogens and has been observed to vary among IBD patients, which in combination with disbalances in gut microbiota could be responsible for disease relapse 9,30 . Moreover, T CD4 and Tregs cells are found in uninflamed matched tissue from UC patients with active disease. When Tregs are dysregulated or deficient, the intestine is one of the first tissues that becomes inflamed due to constant immune stimulation by microbiota and food antigens 9,31 . In active IBD, Tregs can expand in the lamina propria, but their immuno-suppressive activity is diminished 32,33 . We speculate that differences in these subsets of T cells between patient cohorts could also be due to the composition of gut microbiota and regional diet. Similarly, variation in eosinophils, which play roles in protecting barrier integrity and immunity might be related to geographic and seasonal disparities among UC cohorts 34,35 . We observed increased trends in activated mast cells in UC tissue compared to control. Mast cell levels could vary in UC affected tissue depending on location of affected colon and inflammation status, and may account for variability that we and others have observed 36 . Recent findings revealed the presence of activated DCs and plasmacytoid DCs in colonic biopsies of UC and CD patients using the xCell platform 37 . Furthermore, single-cell sequencing data from one UC cohort 38 provides classification of multiple subsets of epithelial and stromal cells including inflammatory fibroblast, monocyte, microfold, and T cell networks 38 . Further development of new approaches differentiating active vs. non-active immune cells and interactive vs. non-interactive cells may provide for more precision-based identifiers of cell landscapes in IBD tissue. www.nature.com/scientificreports/ We identified similarly altered molecular pathways and DEGs in UC tissue across cohorts. These pathways are linked to bacterial and inflammatory signaling. Further, the UC 100 signature distinguished inflamed from uninflamed transcriptomes. UC 100 DEGs encode protein with established roles in UC inflammation, hypoxia, nitric oxide and matrix metallopeptidases 6,18-21 as well as those with novel, unexplored roles in IBD pathobiology. Many of these DEGs encoded protein functionally linked to metabolic energy functions such as alterations in lipid, glucose, and mitochondrial functions. Emerging findings show that aberrant energy metabolism may become another hallmark of IBD. Elevated lipids can drive intestinal inflammation, and in mouse models, blockade of their production ameliorates inflammation [39][40][41][42] . Newly identified DEGs from the UC 100 signature increased in UC tissue, LIPG and LPCAT1, are regulators of lipid metabolism. LPCAT1 encodes an enzyme responsible for the conversion of lysophosphatidylcholine to phosphatidylcholine and is involved in the regulation of lipid droplet number and size 43 . Limited studies showed that increased lipid droplets may drive intestinal inflammation 40,44 and LPCAT1 could play an important function in the synthesis of inflammatory lipids 45 . LIPG is a member of the triglyceride lipase family and may be involved in lipoprotein metabolism and endothelial biology 46 . Further, HCAR3 is involved in regulation of lipolysis during increased β-oxidation and may play integral roles in crosstalk between microbiome-derived metabolites and immune cells 47,48 . PCK1, decreased in UC, is a regulator of gluconeogenesis and its deficiency in macrophages was demonstrated to facilitate a proinflammatory phenotype 49 . Moreover, HMGCS2 and ACAT1, both decreased in UC, encode regulators of mitochondrial function and both play important roles in β-oxidation. In intestinal stem cells, HMGCS2 has a vital role in regulation of cellular differentiation and homeostasis, and its loss could impact barrier renewal and function 50,51 . ACAT1 plays an important role in ketone body metabolism and recently was implicated in Significantly upregulated IGFBP5, SELE, STC1, and VNN2 were marked in red font. (D) "Predictive" performance of the IGFBP5, SELE, STC1, and VNN2 panel, as determined by ROC curve analysis and calculating AUC in both anti-TNFα and anti-α4β7 non-responders before therapy (p < 0.05). Combined gene analysis of pretreatment data of UC patients (GSE73661) was employed and resulting sensitivity and specificity. (E) Multivariate regression analysis showed overexpression of the four genes was associated with higher risk of treatment failure. OR odds ratio, LL lower limit of 95% confidence interval, UL upper limit of 95% confidence interval.  www.nature.com/scientificreports/ inflammatory responses in macrophages as well as diet-induced obesity 52 . Another important aspect of altered energy dynamics in intestinal inflammation involves mitochondrial function. In IBD intestine, mitochondrial gene expression is aberrant leading to reduced respiratory activity and energy depletion, associated with bacterial signaling [53][54][55][56][57] . Smillie et al. suggested that metabolic alterations in intestinal cells and monocytes represented by a shift from oxidative phosphorylation to glycolysis may be driven by impaired production of microbiota short-chain fatty acids leading to upregulated pathways for dietary fatty acids 38,58 . Furthermore, a recent study described mitochondrial fission-fusion as critical in driving dysregulation of intestinal cells and macrophages, which could be targeted as a possible therapeutic approach 59 . Additionally, it is important to consider the effects of environmental factors in mediating mitochondrial reprograming such as the use of antibiotics and intake of a high-fat western style diet 60 . While the exact mechanisms and roles of metabolic reprograming in intestinal cells and immune cells are not fully understood, their emergence as a hallmark of intestinal inflammation highlights their critical importance in the underlying pathobiology of disease.
In UC patients non-responsive to anti-TNFα or anti-α4β7 therapy, we identified distinct immune cell landscapes, molecular pathways, and transcriptional signatures relative to responders. Increased abundances of neutrophils and activated subsets of T cells in the colon of UC non-responders to both treatments suggest that severity of disease might be a predictive factor for success to biologic therapy. Clinical studies demonstrated that non-responsiveness to the biologic therapy is, in part, related to disease severity, a patient's age at diagnosis, and duration of inflammation [61][62][63][64] . Furthermore, in UC patients non-responsive to anti-α4β7 treatment, we observed reduced M2 macrophage levels. Verstockt et al. transcriptomic analysis of UC tissue from patients prior to receiving biologic therapy revealed significant enrichment of immune cells in non-responders including M1 macrophages and Tregs, while responders had elevated naïve B cells 65 . Our CIBERSORT analysis also showed a trend in elevated M1 macrophages, while changes in Tregs were insignificant. Another report described enrichment of monocytes, M1 macrophages, activated DCs and plasmacytoid DC subsets in non-responders to biologic therapy 37 . While our approach did not provide significant increases in monocyte or M1 macrophage levels, they trended up in non-responder groups. Moreover, non-responders to biologic therapy have displayed higher inflammatory markers and cytokines in circulating monocytes 66 . Thus, based on several of these cellular characteristics in non-responders, further development of immune profile-based signatures could allow for precise diagnostics and optimal therapy selection in the future. Moreover, DEGs and molecular pathways are distinct for non-responders to anti-TNFα or anti-α4β7 therapy, highlighted by biologic functions pertaining to inflammation, growth, lipid metabolism, and mitochondrial dysfunction. We determined a specific resistance UC 20R signature representing "failure" to biologic therapy. Among them the top four differentially expressed genes, STC1, VNN2, SELE, and IGFBP5, possess significant "predictive" power to both anti-TNFα or anti-α4β7 therapy. SELE, VNN2, and STC1 play critical roles in neutrophil accumulation and transendothelial movement at sites of inflammation, suggesting a possible role for immune cell trafficking in non-responders 67 . Moreover, these novel and distinct features of disease could also occur, in part, because of changes in the microbiota caused by therapy 68 . Thus, we anticipate that transcriptional signatures found in UC patient tissue may guide selection of therapy and more personalized therapeutic approaches. With further advances in these technologies, we will expand understanding of systemic and specific changes in immune profiles, pathways, and transcripts to include other aspects of IBD heterogeneity including those from adult UC and CD as well as pediatric patients.
Here, with a comprehensive assessment of UC colonic tissue, we demonstrated both shared and distinct immune cell landscapes and molecular pathways. Our results could provide insight into disease pathogenesis and mechanistic reasons why certain patients do not respond to mainstay therapy. Utilization of bioinformatics approaches in combination with human genetics, epigenetics, and single-cell genomics will lead to better understanding of inflammatory disorders, risk of disease recurrence, and association with treatment outcomes leading to development of more precise, personalized diagnostics and therapeutic intervention for adult and pediatric IBD.

Materials and methods
Data sources of ulcerative colitis patient transcriptomes. Ulcerative colitis (UC) colonic tissue microarray and RNAseq gene expression datasets used in this study were obtained from the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) data repository using the following GEO accession numbers listed in Table 3. In total, 326 adult patient transcriptomes were analyzed from European and American cohorts 22,23,[69][70][71][72][73][74][75] ; pediatric UC patients were excluded from analysis. For healthy control (n = 42), we analyzed colonic transcriptomes obtained from individuals undergoing colonoscopy for either moderate gastrointestinal symptoms or colon cancer screening. Transcriptomes from UC patients included regions of inflamed (n = 154) and matched uninflamed control (n = 31). Additionally, we utilized transcriptomes from UC patients prior to and during anti-TNFα (infliximab) (n = 46) and anti-α4β7 integrin (vedolizumab) (n = 53) treatment (Table 3) 22,23 . Differential expression testing and pathway analysis. Differentially expressed genes (DEGs) from UC microarray datasets (GSE4183, GSE14580, GSE38713) were identified using the web-based NCBI GEO2R software application 76 . Only those genes expressed at a minimum threshold of > |2.0|-fold change as compared to healthy control (with an adjusted p < 0.05) were used for pathway analysis. Ingenuity Pathway Analysis (IPA) (www. qiagen. com/ ingen uity) was used to generate canonical pathway and disease and function analyses in the Tulane Cancer Center Next Generation Sequence Analysis Core (www. tulane. edu/ som/ cancer/ resea rch/ corefacil ities/ cancer-crusa ders). IPA analysis included appropriate use of background genes through pre-analysis filtering of 'species filtering' (i.e. human) as well 'tissue and cell lines' to include those relevant to the intestine (i.e. intestinal cells, immune cells, stromal cells, and others).  79,80 . Any genes duplicated in analysis were filtered and only those genes meeting an adjusted p < 0.05 were used. Transcripts were selected based on the most significant statistical significance and fold-change > |2.0| differences. Generation of biologic resistance signatures were accomplished using two microarray datasets from UC patients treated with anti-TNFα (GSE12251, GSE73661) or anti-α4β7 (GSE73661). DEGs were identified in non-responders compared to responders at various time points using the limma R differential expression analysis package and mean calculation was performed for gene-level summarization. Principal Component Analysis (PCA) and generation of a transcriptional signature score. PCA of inflamed UC tissue and matched uninflamed control RNAseq transcriptomes (GSE107593) was accomplished using the FactoMineR R package and PCA function 84 . The first two coordinates of samples and their percent variation were plotted. For summarizing transcriptional expression of the multicohort signature into a single value, the following formulation was utilized: with x representing the expression value of the transcript and m representing the median of the transcripts similar to the approach previously used by Agrawal et al. 85 .
CIBERSORT (cell type identification by estimating relative subsets of known RNA transcripts). UC microarray gene expression datasets were formatted into mixture files with patient identifiers and corresponding gene expression levels; these files were subsequently uploaded for CIBERSORT analysis according to formatting requirements (http:// ciber sort. stanf ord. edu) 17 . Findings were further validated using updated CIBERSORTx analysis 16 . Analysis of mixture files was performed using the core LM22 signature consisting of 547 genes that precisely differentiate mature human hematopoietic cells to determine relative abundances of 22 immune cell subsets including: B-cells (naïve, memory, plasma cells), T-cells (CD8, naïve CD4, memory CD4, follicular helper, regulatory, γδ), monocytes, macrophages (M0, M1, M2), dendritic cells, mast Score = mean log 2 x + 1 m www.nature.com/scientificreports/ cells, eosinophils, and neutrophils. Duplicated genes were filtered based on those meeting an adjusted p < 0.05 before being input into analysis. For those genes with multiple probes meeting significance thresholds, the average expression value of the probe identifiers was calculated and used for analysis. Immune cell output was reported as relative fractions for all immune cell subsets and represented as stacked bar charts as a proportion of one hundred percent or as fold-change differences normalized to healthy control or therapy responders.
Primary UC tissue RNA sources. Colonic mucosal tissue biopsies for validation of transcript expression were obtained from actively inflamed tissue sections of patients with UC (Origene Technologies; ID: CR561752, CR562039, CR562979, CR561525, CR560265) and from normal colonic mucosal biopsies of patients who underwent colon cancer screening or tissue removal (ID: CR560498, CR560940, CR560136).
qPCR. Total RNA obtained from human colonic tissue was utilized for cDNA preparation required for qPCR as previously described 56,86 . To determine the relative levels of mRNA the comparative Cq method was employed using Actin as a housekeeping control.

Statistical analysis.
Only those values meeting a significant threshold (p < 0.05) as determined by the CIB-ERSORT and GEO2R algorithms were included in analysis. Statistical analysis was performed between groups by Student's (paired or unpaired) t-test, analysis of variance (ANOVA) test, and Student Newman-Keuls post-test using Graph Pad Instat 3 software (Graph Pad Software). The prognostic performance of differentially expressed genes for predicting outcomes to therapy was estimated by receiver operating characteristic curve analysis and the area under the curve (AUC). The prognostic performance of differentially expressed genes for predicting outcomes to therapy was estimated by receiver operating characteristic curve analysis and the area under the curve (AUC) with each gene plotted with a curve, and diagnostic accuracy measures (sensitivity and specificity) reflecting the value of the combined analysis.