Introduction

Inflammatory bowel disease (IBD), comprising Crohn’s disease (CD) and ulcerative colitis (UC), is characterized by chronic, relapsing inflammation in the gastrointestinal tract1. CD can affect any portion of the gastrointestinal tract with inflammation that can span all layers of the gut, while UC is localized to the colon and rectum and confined to the mucosa. IBD is believed to be driven by the complex interplay between environmental factors and genetic susceptibilities, resulting in dysregulated immune responses to environmental triggers and the breakdown of the epithelial barrier and intestinal homeostasis2,3,4. Genome-wide association studies have revealed more than 200 IBD-susceptibility genes, which are involved in microbial sensing, antigen presentation, autophagy, T-cell signaling, and other immune-related pathways5,6,7,8.

A wide range of cell types orchestrate intestinal host defense to environmental exposures. Characterizing cellular organizations and their rewiring in intestinal development and response to inflammation is of great importance to understanding IBD pathogenesis and to reveal novel potential treatment options. Recent studies utilized single cell and spatial omics profiling to provide an unbiased census of cell lineages and to characterize their functional states in healthy control and IBD samples9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27. These studies successfully identified novel cells, dissected well-known cell types at high resolution, and revealed spatial, temporal, and functional heterogeneity of cellular compositions9,10,11,12,13,14. Comparing cellular differences between IBD and healthy controls, they found molecular and cellular alterations in disease, identified cellular modules associated with drug response, and built transcriptional links between the developing gut and childhood CD15,19,22,24. While their main findings centered around immune cell signatures and immune-stromal interactions, few studies have shed light on epithelial cells regulating immune response and driving disease. Moreover, the ileum and colon exhibit distinct cellular compositions and have specialized preferences for nutrient absorption, microbe defenses, and endocrine function28. Whether Crohn’s ileitis and Crohn’s colitis/UC act through a common mechanism remains largely unknown.

In this study, we combined bulk and single-cell RNA profiling, multiplexed imaging of proteins and RNAs, and spatial transcriptomics to study molecular and cellular remodeling and reorganization in active/inactive CD compared to non-IBD controls in both the terminal ileum (TI) and ascending colon (AC). We not only discovered rewiring of epithelial, stromal, and immune cells, but also identified a specific epithelial cell type in CD, which we named “LND cells” given their high expression of LCN2, NOS2, and DUOX2. LND cells are present in both the TI and AC of CD patients and expand significantly with disease activity. LND cells specialize in regulating defense responses by recruiting and activating immune cells as signaling senders. Multiplexed imaging, transwell monocyte migration, and spatial transcriptomics further supported cross-talk between LND and immune cells. LND cells are predicted to be a pivotal cell type in CD pathogenesis, evidenced by highly specific expression of IBD susceptibility genes. The presence of the LND cell type in both the TI and AC of CD patients suggests a common link to dysregulated host-environment interactions. A high-resolution view of LND cells detected two subpopulations with different stem-potential and their ratio was associated with anti-TNF treatment response.

Results

Cellular landscape of terminal ileum and ascending colon in non-IBD control and CD

We profiled 82 TI and 88 AC specimens from either endoscopic biopsies or surgical resection specimens across 83 unique individuals (65 CD patients and 18 non-IBD controls) using single-cell RNA-sequencing (scRNA-seq), representing one of the largest cohorts of CD patients profiled (Fig. 1A and Supplementary Data 1). Non-IBD endoscopic specimens were collected from individuals presenting for colonoscopy for colorectal cancer screening or colon polyp surveillance without evidence of intestinal inflammation, while non-IBD surgical specimens were taken from normal adjacent tissue from patients undergoing surgical resection of endoscopically unresectable polyps in the cecum or ascending colon. Patient characteristics were as follows: ethnic background (CD: 77% white, 15% African American, 5% Asian, and 3% Hispanic; Control: 83% white, 11% African American, and 6% Hispanic), sex (CD: 63% female, 37% male; Control: 61% female, 39% male), and age (CD: 18–75; Control: 45–70). 6% of CD patients were treatment naive, with the rest currently undergoing various treatments or previously treated for their CD symptoms (Supplementary Data 1). Disease severity of each specimen was classified as active CD (31 mild, 9 moderate, and 11 severe) and inactive CD (58 normal and 17 quiescent) based on histopathologic analysis (Fig. 1B). The non-IBD specimens comprised 20 TI and 24 AC (Fig. 1B). In 77% of cases, matching TI and AC samples were collected from the same individual (Supplementary Data 1).

Fig. 1: Single-cell landscape in Crohn’s disease and non-IBD controls.

A Schematic for processing endoscopic and surgical samples from TI and AC for non-IBD controls, inactive and active CD patients. B Summary of the number of samples in each group. C UMAP of 155,093 cells from endoscopy samples colored by cell clusters. D Dotplot showing markers for each cell type. E UMAP of 155,093 cells colored by tissue origin, TI (brown) or AC (blue). F Proportion of each cell cluster in TI (brown) and AC samples (blue). G UMAP of 155,093 cells colored by disease status, controls (tan), inactive (green) or active CD (purple). H MDS plot of cell compositional differences across all endoscopy specimens. Figure 1a was created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).

After quality control, 155,093 cells from endoscopic specimens and 47,266 cells from surgical specimens were retained, and these datasets were analyzed separately (Supplementary Data 2). Louvain clustering on single cells from endoscopic specimens revealed major cell types within the epithelial compartment (enterocytes/colonocytes, transit amplifying (TA) cells, stem cells, goblet cells, goblet proliferating cells, BEST4/OTOP2 cells, tuft cells, enteroendocrine cells (EEC), and Paneth cells, the latter only in the TI), and the non-epithelial compartment (T cells, B cells, plasma cells, myeloid cells, cytotoxic T/natural killer (CTL/NK) cells, mast cells, and stromal cells) (Fig. 1C). Cell types were manually curated by known marker genes with TI enterocytes expressing APOA1/APOB, AC colonocytes expressing AQP8, TA cells expressing MKI67, stem cells expressing LGR5, goblet cells expressing MUC2 and TFF3, goblet proliferating cells expressing both MUC2 and MKI67, BEST4/OTOP2 cells expressing BEST4 and OTOP2, Paneth cells expressing defensins such as DEFA5 and DEFA6, T cells expressing CD3D and CD3G, B cells expressing CD79A and CD79B, plasma cells expressing IGHA1 and IGHA2, myeloid cells expressing C1QA and C1QB, mast cells expressing KIT, CTL/NK cells expressing GZMA, and stromal cells expressing VWF (Fig. 1D). These results are consistent with cell types identified by previous scRNA-seq studies of the healthy human small intestine and colon10,13. In addition, rare microfold-like cells were observed in only a few patients in the AC, and a very minor portion of IgG+ plasma cells (1%) were found within IgA+ plasma cells (Supplementary Note 1).
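
The marker-based labeling described above was done by manual curation. Purely as an illustration of the idea, and not the procedure used in this study, the sketch below scores a cluster by the mean expression of each cell type's markers and picks the best-scoring label. The marker lists come from the text; the function name `annotate` and the expression values in `toy_cluster` are hypothetical.

```python
# Marker genes as listed in the text; only a subset of cell types is shown.
MARKERS = {
    "stem": ["LGR5"],
    "TA": ["MKI67"],
    "goblet": ["MUC2", "TFF3"],
    "Paneth": ["DEFA5", "DEFA6"],
    "T cell": ["CD3D", "CD3G"],
    "B cell": ["CD79A", "CD79B"],
    "plasma": ["IGHA1", "IGHA2"],
    "mast": ["KIT"],
}


def annotate(cluster_means):
    """Return the cell type whose markers have the highest mean expression.

    cluster_means maps gene symbol -> mean expression in one cluster.
    Genes missing from the cluster are treated as not expressed (0.0).
    """
    def score(genes):
        return sum(cluster_means.get(g, 0.0) for g in genes) / len(genes)

    return max(MARKERS, key=lambda cell_type: score(MARKERS[cell_type]))


# Hypothetical cluster-level mean expression values (invented for illustration).
toy_cluster = {"MUC2": 5.1, "TFF3": 4.7, "MKI67": 0.2, "LGR5": 0.1}
label = annotate(toy_cluster)
print(label)
```

A simple score like this cannot separate states defined by marker combinations, such as goblet proliferating cells (MUC2 together with MKI67), which is one reason manual curation remains useful.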

The largest transcriptomic difference was observed in absorptive cells from TI or AC (Fig. 1E), consistent with previous studies10,13 and reflecting tissue-specific functions (Supplementary Note 2). Differences in goblet, EEC, and tuft cells between TI and AC revealed genes that were specific to both cell types and regions; for instance, BEST2 was specific to AC goblet cells, while RGS13 was exclusive to TI tuft cells (Supplementary Note 2). While the TI and AC had distinct epithelial transcriptional programs, expression profiles of immune and stromal cells between the two sites largely overlapped, with similar cell types identified (Fig. 1C, E). In addition, the TI and AC exhibited similar composition of epithelial, stromal, and immune cells (Fig. 1F).

The second-largest transcriptomic difference came from disease activity (Fig. 1G). UMAP co-embedding revealed a shift of epithelial cell transcriptional state from non-IBD control, inactive, to active CD (Fig. 1G), indicating transcriptional changes driven by disease activity. Quantifying cellular compositional distances between individuals by scUniFrac v0.9.629 revealed that disease status was one of the main factors driving compositional shifts (Fig. 1H). Surprisingly, UMAP co-embedding revealed intermixing of cells from different individuals with the same disease status, suggesting only subtle, if any, patient-specific variability (Fig. 1G).
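
To make the notion of a compositional distance between specimens concrete, the sketch below computes a Bray-Curtis dissimilarity between per-sample cell-type proportions. This is a deliberately simpler stand-in and is not scUniFrac, which additionally takes the transcriptional similarity between clusters into account. The function names and the toy cell labels are assumptions made for illustration.

```python
from collections import Counter


def proportions(labels, cell_types):
    """Fraction of cells carrying each label, in the order given by cell_types."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[ct] / total for ct in cell_types]


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity; for proportion vectors it equals half the L1 distance."""
    return sum(abs(a - b) for a, b in zip(p, q)) / (sum(p) + sum(q))


cell_types = ["enterocyte", "goblet", "T cell", "neutrophil"]

# Invented per-cell labels for two specimens.
control = ["enterocyte"] * 70 + ["goblet"] * 20 + ["T cell"] * 10
active = (["enterocyte"] * 40 + ["goblet"] * 20
          + ["T cell"] * 30 + ["neutrophil"] * 10)

distance = bray_curtis(proportions(control, cell_types),
                       proportions(active, cell_types))
print(round(distance, 3))
```

A matrix of such pairwise distances is the kind of input that a multidimensional scaling plot summarizes in two dimensions.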

Analysis of 47,266 cells from surgical specimens generated similar results, with similar epithelial and non-epithelial cell types identified by marker genes and gene signatures (Supplementary Fig. 8A, E, G). The same differences between transcriptional programs of the TI and AC were observed in data generated from biopsies (Fig. 1E) and surgical resections (Supplementary Fig. 8B). Furthermore, shifts in cell populations as a function of disease activity were also observed in the two specimen types (Supplementary Fig. 8C, F). However, cell distributions between biopsies and surgical resections were different, with endoscopic biopsies dominated by epithelial cells and surgical specimens enriched for immune and stromal cells (Supplementary Fig. 8D). This result reflects the superficial mucosal sampling of biopsies versus deeper submucosal sampling of surgical resections. For the rest of the analysis, we mainly focused on endoscopic specimens, which had a larger sample size with more evenly distributed disease status compared with surgical specimens.

Distinct immune and stromal cellular organizations in active CD

Because CD is inherently characterized as an inflammatory disease, we set out to delineate changes within the immune and stromal compartments as a function of disease activity. High resolution clustering revealed 12 populations of immune and stromal cells (Fig. 2A), encompassing pre-characterized T cells, CTL/NK cells, B cells, and plasma cells. Proliferative T and B cells were also marked by a proliferative signature including TOP2A and MKI67. Within myeloid cells, mast cells were annotated by high expression of KIT and FCER1A, and neutrophils were marked by high expression of S100A8, S100A9, and CXCR2 (Fig. 2B). We also recovered two types of macrophages. Resident macrophages expressed tissue residency markers MRC1 and complement genes C1QA and C1QB, while recruited macrophages highly expressed inflammatory molecules, such as NFKBIA, NFKBI, CXCL16, and CXCL9 (Fig. 2B). We identified fibroblasts expressing PDGFRA and ECM genes, such as COL1A1, COL1A2, and COL6A1 and endothelial cells expressing VWF and PECAM1 (Fig. 2B).

Fig. 2: Immune and stromal cell rewiring in Crohn’s disease.

A UMAP of immune and stromal cells. B Dotplot of markers in each cell type. C Proportional changes of each immune and stromal cell type from non-IBD controls to inactive and active CD patients in TI (two-sided Wilcoxon test with FDR adjustment). D Proportional changes of each immune and stromal cell type from non-IBD controls to inactive and active CD patients in AC (two-sided Wilcoxon test with FDR adjustment). Data are represented as the mean ± SD (C, D).

As expected, cellular composition within the immune and stromal compartments differed significantly between inactive and active CD (Supplementary Data 3). Almost all immune and stromal cell numbers increased significantly in active compared to inactive CD (Fig. 2C, D), with the exception of B cells, proliferating B cells, and CTL/NK cells in the TI (Fig. 2C), and mast cells, B cells, proliferating B cells, and T cells in the AC (Fig. 2D). Among them, neutrophils and recruited macrophages showed the most significant elevation in active CD, in both TI and AC. The proportion of neutrophils increased from 0% to 1.3% (FDR = 2.0e−6) in TI and from 0.06% to 0.7% in AC (FDR = 1.5e−3), which is not surprising as the presence of neutrophils is a hallmark of histologically active disease, while the proportion of recruited macrophages increased from 0% to 2.7% in TI (FDR = 1.1e−4) and from 0.19% to 1.5% in AC (FDR = 1.5e−3) (Fig. 2C, D). Beyond cellular compositional changes, transcriptional upregulation of pro-inflammatory genes was observed in active CD compared to inactive CD within each cell type (Supplementary Note 3).
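
The comparisons above rest on a two-sided Wilcoxon rank-sum test per cell type followed by FDR adjustment. A minimal pure-Python sketch of that combination is given here, assuming a normal approximation with tie correction for the test and the Benjamini-Hochberg procedure for the adjustment (the text does not state which FDR procedure was used). The per-sample proportions are invented, and the function names are hypothetical.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0  # all values identical: no evidence of a shift
    z = (u1 - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented per-sample proportions for two cell types (inactive vs. active CD).
inactive = {"neutrophil": [0.0, 0.0, 0.001, 0.0, 0.002, 0.0],
            "B cell": [0.05, 0.07, 0.04, 0.06, 0.05, 0.08]}
active = {"neutrophil": [0.010, 0.015, 0.008, 0.020, 0.012, 0.004],
          "B cell": [0.06, 0.05, 0.07, 0.04, 0.08, 0.05]}

cell_types = sorted(inactive)
raw = [rank_sum_p(inactive[ct], active[ct]) for ct in cell_types]
fdr = dict(zip(cell_types, bh_adjust(raw)))
print(fdr)
```

For small groups an exact test is preferable to the normal approximation; established statistics libraries provide both.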

Immune and stromal cell compositions were mostly unchanged between inactive CD and controls, with the exception of T proliferating cells in the AC, which were slightly increased in inactive CD (from 0% to 0.17%, FDR = 0.024) (Fig. 2D). However, transcriptional upregulation of pro-inflammatory genes was still observed in a cell type-specific manner (Supplementary Note 3).

Analysis of surgical samples revealed similar cell types (Supplementary Fig. 9A). Furthermore, we identified three subtypes of fibroblasts, PDGFRA+ fibroblasts, ABCA8+ fibroblasts, and PDPN+ fibroblasts in both the TI and AC (Supplementary Fig. 9A). The three subtypes had distinct expression profiles. ABCA8+ fibroblasts had high expression of ABCA8, CFD and SFRP2, PDGFRA+ fibroblasts had enriched expression of PDGFRA, CXCL14, and ADGRL3, and PDPN+ fibroblasts were marked by high expression of PDPN, MMP1, and SOD2 (Supplementary Fig. 9B). The PDGFRA+ fibroblasts in the surgical samples were similar to fibroblasts identified in TI and AC endoscopic biopsies (Supplementary Fig. 9C). The PDPN+ fibroblasts, also called activated or inflammatory fibroblasts, reside in the submucosa of the inflamed intestine outside of the lamina propria15. They have been proposed as a central hub in IBD with an essential role in hematopoietic-stromal interactions15,19,30. An increasing trend was observed for most stromal cell types in the TI (not assessed in the AC because only two active CD surgical samples were available), especially PDPN+ fibroblasts (Supplementary Fig. 9D). Increases in recruited macrophages and neutrophils were observed in surgical specimens similar to the endoscopic biopsies (Supplementary Fig. 9D). However, the increases were not statistically significant due to limited sample sizes in the TI surgical samples.

An LND subpopulation specialized in regulating mucosal immunity

Deep annotation of epithelial cells revealed a continuum of cells consisting of stem, TA, early, intermediate, and mature enterocytes/colonocytes (Fig. 3A, C)10,13. Early, intermediate, and mature enterocytes/colonocytes were distinguished based on markers previously defined for each cell type13 and predicted crypt-axis scores14 (Supplementary Fig. 10; details in “Methods”). Most interestingly, we identified a distinct absorptive cell type that emerges and then expands during active inflammation in both the TI and AC. This subpopulation was marked by high expression of LCN2, NOS2, and DUOX2; we therefore named it LND (Fig. 3B, D). LND cells were present in 73.6% and 87.2% of CD samples in TI and AC, respectively (Supplementary Data 3). LND cells were rare in non-IBD control tissues, increased marginally in inactive CD (0.3% to 1.3%, FDR = 0.5 in TI; 0.06% to 0.6%, FDR = 0.1 in AC), and expanded significantly in active CD (1.3% to 18.9%, FDR = 1.7e−6 in TI; 0.6% to 17.8%, FDR = 1.0e−5 in AC) (Fig. 3E, F; Supplementary Data 3). The increase was observed both in the TI and AC, even though absorptive cells from both regions, as well as LND cells, have distinct transcriptomes. The proportion of LND cells was not associated with medication exposures (Supplementary Fig. 11). LND cells increased at the expense of early, intermediate, and mature enterocytes/colonocytes, as well as BEST4/OTOP2 cells as CD progresses from inactive to active disease (Fig. 3E, F; Supplementary Data 3). The emergence and expansion of the LND subpopulation in active CD was also observed in surgical TI samples (Supplementary Fig. 12). Beyond cellular composition alterations, cell-specific transcriptional changes with disease activity were also observed (Supplementary Note 3 and Supplementary Data 4).

Fig. 3: An LND epithelial cell type expands in active Crohn’s disease.

A UMAP of 13 epithelial cell types in TI. B UMAP labeled with expression of LCN2, NOS2, and DUOX2 in TI. C UMAP of 12 epithelial cell types in AC. D UMAP labeled with expression of LCN2, NOS2, and DUOX2 in AC. E Proportional changes of each epithelial cell type from controls to inactive and active CD patients in TI (two-sided Wilcoxon test with FDR adjustment). F Proportional changes of each epithelial cell from controls to inactive and active CD patients in AC (two-sided Wilcoxon test with FDR adjustment). G Proportional changes of LND cells with disease status in six independent cohorts (two-sided Wilcoxon test with FDR adjustment). H Heatmap of high expression of immune-related genes in the LND in both TI (top) and AC (bottom). I Representative HCR-FISH co-staining of LCN2 (green), NOS2 (pink), and DUOX2 (red) on non-IBD controls, CD with low LND and CD with high LND in TI and AC tissues (N = 2–5 per group). The scale bar represents 100 μm. Data are represented as the mean ± SD (E–G).

We verified the emergence and expansion of LND cells in CD in six independent studies with clinical activity, including five microarray studies31,32,33,34,35 and one bulk RNA-seq dataset36. Consistently, LND cells were rare in healthy controls and emerged in uninflamed CD and expanded significantly in inflamed CD (Fig. 3G). Significant expansion of LND cells was observed from uninflamed to inflamed CD in both TI and AC in GSE17928531 (FDR = 2.3e−4 in TI and FDR = 2.2e−8 in AC) (Fig. 3G). In GSE7521432, LND cells also expanded from healthy controls to inactive CD in TI (FDR = 8.5e−6) and to active CD in AC (FDR = 2.2e−3) (Fig. 3G). A similar trend was detected in four other independent studies (GSE18658233 and GSE11236634 in TI, and GSE2088135 and GSE6620736 in AC) (Fig. 3G). The increase of LND cells at the expense of mature enterocytes/colonocytes was also observed in the six studies (Supplementary Figs. 13 and 14).

Genes defining the LND cell cluster, LCN2, NOS2, and DUOX2 are all involved in the host response to microbiota, generating products that can damage tissues. LCN2, lipocalin 2, acts as an antimicrobial protein, which attenuates bacterial growth by binding and sequestering iron-scavenging siderophores37. LCN2 is a serum and fecal biomarker for intestinal inflammation38, and it has been reported to be increased in serum from CD patients39. Importantly, LCN2 is implicated in a form of programmed cell death termed ferroptosis, which has been hypothesized to contribute to colitis-associated mucosal damage40. NOS2, or nitric oxide synthase 2, also known as inducible NOS or iNOS, is an enzyme that catalyzes the production of nitric oxide (NO), a broad-spectrum anti-bacterial agent41. NOS2 has been reported to be increased in colonic tissues42,43,44 and specifically in the colonic epithelium45,46 from both CD and UC patients. The contribution of NOS2 to IBD pathogenesis has been studied in several experimental models47,48,49. DUOX2, or dual oxidase 2, produces hydrogen peroxide, which is then metabolized by lactoperoxidase to oxidize pseudohalide into potent antimicrobial reactive oxygen species (ROS)50. Dysregulated ROS production has been implicated as a driving factor in IBD and variants of DUOX2 have been identified in very early-onset IBD51,52. Moreover, DUOX2 has been found to be upregulated in intestinal inflammation in a TLR-4-dependent manner53 and is involved in NOD2-mediated antibacterial response54. In addition to these marker genes, LND cells also express a high level of anti-microbial peptides (AMPs), including DMBT1, REG1A, REG1B, REG3G, PI3, S100A9, LYZ, SAA1, and SAA2, and upregulate transmembrane mucins (MUC13, MUC17, and MUC3A) that form the glycocalyx, which acts as a physical barrier to luminal antigens (Fig. 3H). Downstream of immediate microbial defense, LND cells also overexpress genes that orchestrate immune responses. These include pattern recognition receptors TLR3 and its interacting partner TICAM1, inflammatory signaling and immunity modulator BIRC3, antigen-presenting machinery (HLA-B, HLA-A, HLA-DPA1, HLA-E, HLA-F, HLA-DQA1, HLA-DQB1, HLA-DQA2, and TAPBP), and cytokines (CCL20, CCL28, CXCL1, CXCL2, CXCL3, CXCL5, CXCL16, TNF, IFNG, IL13A, and IL17C) (Fig. 3H). Functional analysis of genes upregulated only in LND cells compared to other epithelial cells found that they were enriched in antigen processing and presentation, Th17 cell differentiation, Th1 and Th2 cell differentiation, HIF-1 signaling pathway, and TNF signaling pathway (Supplementary Fig. 15). These results suggest that the LND subpopulation that expands in active inflammation serves specialized functions of antimicrobial response and immunoregulation.

To confirm the presence and location of LND cells, we employed hybridization chain reaction-fluorescence in situ hybridization (HCR-FISH) on tissues from TI and AC of non-IBD controls and CD patients with low and high proportions of LND cells (Fig. 3I and Supplementary Fig. 16). Generally, LCN2, NOS2, and DUOX2 expression was undetectable in non-IBD controls (Fig. 3I). Consistent with the scRNA-seq results in both TI and AC, LCN2, NOS2, and DUOX2 colocalized in a subset of epithelial cells in the high LND tissues but were barely detected in the low LND tissues (Fig. 3I and Supplementary Fig. 16). The epithelial protein expression of LCN2, NOS2, and DUOX2, as well as their colocalization in the high LND tissues, was confirmed by immunohistochemistry (Supplementary Fig. 17) and multiplex immunofluorescence (Supplementary Fig. 18). In summary, multiplex RNA and protein imaging validate the presence of LCN2, NOS2, and DUOX2 co-expressing epithelial cells in CD.

Developmental origins of LND cells

To infer the developmental origin of LND cells, we applied RNA velocity v0.6, an algorithm that predicts the future transcriptional states of each individual cell by the ratio of unspliced to spliced gene isoforms over the transcriptome55. As expected, we observed a cycling pattern for TA cells and a strong directional flow originating from stem cells, passing through early and intermediate enterocytes, and ending in mature enterocytes in the TI (Fig. 4A). Interestingly, we noted two potential origins that point toward LND cell development: one from early enterocytes/stem cells and the other from mature/intermediate enterocytes, which paralleled the two subclusters of LND cells from high resolution clustering (Fig. 4B). The subcluster that might originate from early enterocytes/stem cells was labeled “early LND”, while the other was labeled “late LND”. Partition-based graph abstraction (PAGA) analysis, which defines total connection strength between progenitor and differentiated cell populations56, also showed that early LND cells were associated strongly with early enterocytes, while late LND cells were linked to intermediate and mature enterocytes, as well as early LND cells (Fig. 4C). CytoTRACE v0.3.3 analysis to infer the developmental potential of cell populations57 predicted that stem and TA cells had the highest inferred stemness score, followed by early enterocytes, early LND, intermediate enterocytes, late LND, and finally mature enterocytes (Fig. 4D). Our results indicate that LND cells may differentiate directly from stem/progenitor cells (early LND), or they may arise later (late LND) from intermediate/mature enterocytes or from early LND cells themselves.
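
The core idea behind RNA velocity can be shown for a single gene. Under the commonly used steady-state assumption, unspliced counts u are proportional to spliced counts s (u ≈ γ·s); a cell with more unspliced RNA than expected is predicted to be increasing expression of that gene. The sketch below is a simplification under stated assumptions: it fits γ by least squares through the origin over all cells, whereas published tools typically fit on extreme quantiles and add normalization and smoothing. The counts and function names are invented.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin for the model u = gamma * s."""
    num = sum(s * u for s, u in zip(spliced, unspliced))
    den = sum(s * s for s in spliced)
    return num / den if den else 0.0


def velocity(spliced, unspliced):
    """Per-cell velocity for one gene: observed unspliced minus steady-state expectation."""
    gamma = fit_gamma(spliced, unspliced)
    return [u - gamma * s for s, u in zip(spliced, unspliced)]


# Invented counts for one gene across four cells.
spliced = [1.0, 2.0, 4.0, 4.0]
unspliced = [0.5, 1.0, 2.0, 3.0]

gamma = fit_gamma(spliced, unspliced)
v = velocity(spliced, unspliced)
# Positive values suggest induction of the gene, negative values repression.
print(round(gamma, 3), [round(x, 3) for x in v])
```

Combining such per-gene velocities across the transcriptome gives each cell a direction in expression space, which is what the arrows on a velocity-annotated UMAP summarize.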

Fig. 4: Two LND subpopulations with different origins and developmental potential.

A RNA velocity results mapped on the UMAP plot showing the predicted future transcriptional state of each cell. B UMAP of early (light red) and late (medium red) LND clusters in the TI. C PAGA results mapped on the UMAP plot showing connectivity between cell types. D Histogram plot comparing the developmental potential of enterocytes predicted by CytoTRACE (two-sided Wilcoxon test with FDR adjustment). E Comparison of the expression of CXCL5, TNFRSF1A, and TNFRSF1B between early and late LND cells (two-sided Wilcoxon test with FDR adjustment). F Comparison of the ratio of late to early LND cells between anti-TNF responders (tan) and non-responders (pink) after the first dose of medication (two-sided Wilcoxon test). Data are represented as the mean ± SD (D–F).

Differential expression analysis between early and late LND cells found that early LND cells were enriched in neutrophil chemoattractants (CXCL3 and CXCL5), mucins (MUC1 and MUC4), and anti-microbial genes (DMBT1, PLA2G2A, REG4, and PIGR), while late LND cells were enriched in lipid-metabolic genes (such as APOC3, APOA4, APOB, and APOA1), cytokines (CCL20 and CCL25), MUC3A, REG3G, and TNF receptors (TNFRSF1A, TNFRSF10B, and TNFRSF1B) (Fig. 4E and Supplementary Fig. 19A). They shared similar expression levels of SAA1 and CCL28. Both early and late LND cells were increased as a function of disease activity, from normal non-IBD controls, inactive CD, to active CD (Supplementary Fig. 19B). The ratio of early to late LND cells was also associated with disease activity (Supplementary Fig. 19C), with early LND cells being enriched along the CD progression spectrum.

Since late LND cells expressed TNF receptors, we asked whether the proportions of LND subclusters could predict anti-TNF response. We utilized the GSE16879 dataset58, which included 18 patients with ileal CD assessed before and after their first anti-TNF treatment. In the Arijs et al. study, patients were classified as responders or non-responders based on endoscopic and histologic findings at 4–6 weeks after the initial treatment. We estimated the proportion of early and late LND cells by deconvolving bulk gene expression profiles through CIBERSORT v1.0559. Despite the limited sample sizes (n = 10 responders and n = 8 non-responders), we observed that patients with higher proportions of late LND cells were more likely to respond to anti-TNF treatment (p = 0.05) (Supplementary Fig. 20). The proportions of early LND cells and other epithelial cells were not correlated with anti-TNF response (Supplementary Fig. 20); however, the ratio of late vs. early LND cells showed a significant association with anti-TNF response (p = 0.012) (Fig. 4F).
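
Deconvolution treats a bulk expression profile as a mixture of cell-type signature profiles and solves for the mixing fractions. The sketch below illustrates this with non-negative least squares solved by projected gradient descent; this is a swapped-in, simpler technique and not CIBERSORT, which is based on support vector regression. The signature values, the gene choice (taken from the early/late LND markers named above), and the function name are assumptions for illustration.

```python
def nnls_fractions(signature, bulk, n_iter=2000):
    """Estimate non-negative mixing fractions f so that signature @ f ~ bulk.

    signature: list of rows (genes), each a list of per-cell-type expression.
    bulk: list of bulk expression values, one per gene.
    Returns fractions normalized to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / trace(S^T S) is a safe step size (trace >= largest eigenvalue).
    step = 1.0 / sum(v * v for row in signature for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [sum(row[k] * f[k] for k in range(n_types)) - b
                 for row, b in zip(signature, bulk)]
        grad = [sum(signature[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        f = [max(0.0, fk - step * gk) for fk, gk in zip(f, grad)]
    total = sum(f)
    return [fk / total for fk in f] if total else f


# Invented signature matrix: rows are CXCL5, MUC1, APOA1, CCL20;
# columns are (early LND, late LND).
signature = [[8.0, 1.0],
             [6.0, 2.0],
             [1.0, 9.0],
             [2.0, 5.0]]

# A synthetic bulk sample mixed from 25% early and 75% late LND cells.
bulk = [2.75, 3.0, 7.0, 4.25]

early, late = nnls_fractions(signature, bulk)
late_to_early = late / early
print(round(early, 3), round(late, 3), round(late_to_early, 2))
```

The late-to-early ratio computed per patient is the quantity that would then be compared between responders and non-responders.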

LND cells actively interact with immune cells

To identify the potential immunomodulatory function of the LND cells, we inferred cell–cell communications between LND and any other cell types using CellChat v1.4.060. We found that LND cells are likely to actively interact with immune cells as both signaling senders and receivers with similar patterns in the TI and AC (Fig. 5A, B). Compared with other epithelial cells, LND cells showed much stronger cytokine-receptor interactions to recruit a variety of immune cells (Fig. 5A, B; Supplementary Note 4). The strong ligand–receptor interactions between LND and immune cells suggest a specialized role of LND in regulating mucosal immunity.

Fig. 5: LND cells interact with immune cells.

Scatterplot of incoming and outgoing interaction strength of each cell type in the TI (A) and AC (B). C Representative multiplex images of CD8+ (red), CD4+ (green), and CD45+ (yellow) cells in low (N = 7) and high LND (N = 10) in the TI and AC. D Dotplot of marker genes in infiltrating lymphocytes (ILs). E The proportion differences of ILs between low (orange) and high (purple) LND patients (one-sided Wilcoxon test). Data are represented as the mean ± SD. F Representative 40X images from Transwell monocyte migration assays from non-IBD control, inactive CD, active CD with low LND, and active CD with high LND. The migrated monocytes are stained in purple. G Violin plot of the number of migrated monocytes in non-IBD control, inactive CD, active CD with low LND, and active CD with high LND. Statistical test was performed using a negative binomial mixed model (n = 21 measurements per group from 4 independent experiments; 6 quantified images from two wells in the first experiment and 5 images from one well in the next three experiments). Data are represented as the mean ± SEM.

We performed multiplexed protein imaging analysis on 55 tissues, of which 38 had single-cell RNAseq profiling (17 CD and 3 controls in TI, 15 CD and 3 controls in AC). We classified the multiplex imaging of CD patients into two categories based on the LND proportion reported in the single-cell RNAseq data, low and high LND. We observed that those with a high LND proportion had a significantly higher infiltration of lymphocytes into the epithelial submucosa in both TI and AC compared to those with a low LND proportion (Fig. 5C, E) (p = 0.018 and p = 0.012, respectively). These infiltrating lymphocytes (ILs) were characterized by association of both epithelial (PANCK, NAKATPASE, and BCATENIN) and lymphocyte markers (CD3D, CD4, CD8, and CD45) (Fig. 5D). These results strengthen the association between CD activity, LND expansion, and immune cell infiltration. Since LND cells release a variety of chemokines and cytokines (Fig. 3H) and actively interact with immune cells (Fig. 5A, B), they likely play a role in immune cell recruitment and infiltration.

Next, we investigated the impact of LND cells on monocyte migration. Human colon organoids were generated from patients with normal histology (non-IBD control), inactive CD, active CD with low LND, and active CD with high LND (Supplementary Fig. 21). These organoid lines were then used to produce conditioned media for monocyte migration assays. Freshly isolated human monocytes from four normal blood donors were added above Transwell™ filters, with the organoid medium in the wells below. The number of migrated cells was quantified repeatedly in different fields, and differential analysis was performed using a negative binomial mixed model as described in the “Methods”. As expected, medium from active CD led to significantly increased monocyte migration compared to inactive CD and controls (p ≤ 2e−4), while inactive CD showed a subtle increase in monocyte migration compared to controls (p = 0.02) (Fig. 5F, G). Importantly, active CD with high LND resulted in significantly higher monocyte migration than active CD with low LND (p = 0.001) (Fig. 5F, G), supporting the role of LND cells in enhancing monocyte migration.

In addition, we investigated spatial organization and crosstalk between LND and immune cells using spatial transcriptomics on four CD samples selected for relatively high proportions of LND cells. The four samples consisted of two with active TI disease (GCA092 and GCA033) and two with active AC disease (GCA089 and GCA099) (Fig. 6A). As expected, LND marker genes, including LCN2, NOS2, DUOX2, and CCL20/CCL28, were coexpressed across spots in all four samples, indicating the existence of LND cells (Supplementary Fig. 22). In contrast, expression of LND marker genes was not correlated with immune cell signatures, including CD3D for T cells, CD8A and GZMB for CD8+ T/NK, MRC1 for resident macrophages, NFKBIA and NFKBIB for recruited macrophages, and S100A8 for neutrophils (Supplementary Fig. 22). Instead, high expression of LND marker genes in one spot was significantly correlated with high expression of immune cell signatures in its neighboring spots in all four samples using SpaGene v0.1.061 (Fig. 6B), suggesting heterotypic interaction between LND and immune cells. Specifically, NOS2, LCN2, and DUOX2 all showed a very significant colocalization with GZMB, S100A8, and NFKBIA in the GCA092_TI (FDR < 3e−16). DUOX2 colocalized with CD8A (FDR = 2e−8) and NOS2 with NFKBIA (FDR = 2e−8), CD8A (FDR = 2e−5), and GZMB (FDR = 8e−5) in the GCA033_TI. LCN2 colocalized with NFKBIB (FDR = 3e−9) in the GCA089_AC. NOS2 colocalized with NFKBIA (FDR = 2e−9), and LCN2 colocalized with S100A8 (FDR = 4e−8), NFKBIA (FDR = 1e−5), and NFKBIB (FDR = 2e−5) in the GCA099_AC. In comparison, only marginally significant or insignificant colocalizations were found between the general epithelial genes (KRT8 and KRT18) and immune markers in these samples (Fig. 6B).
To uncover cellular spatial organization, we further deconvoluted the cellular components of each spot based on our scRNA-seq data using RCTD62 and evaluated colocalization between epithelial cells and immune cells with SpaGene v0.1.061 (Fig. 6C). In GCA092_TI, the most significant association was found between late LND and CTL/NK (FDR = 8e−35), followed by early LND and CTL/NK (FDR = 2e−27) and late LND and neutrophils (FDR = 3e−16). In GCA033_TI, a significant association was observed between late LND and CTL/NK (FDR = 3e−8). In GCA089_AC, early LND and neutrophils were significantly colocalized (FDR = 5e−24). In GCA099_AC, LND and resident macrophages (FDR = 1e−3) were significantly colocalized (Fig. 6C). In summary, LND cells were much more significantly colocalized with immune cells in all four inflamed specimens than were other epithelial cells, further suggesting their specialized ability to interact with immune cells.

Fig. 6: Spatial organization between LND and immune cells.

A H&E images for the four patient samples overlaid and colored by the expression of DUOX2. B Dotplot of colocalization of LND markers (LCN2, NOS2, and DUOX2) and the general epithelial genes (KRT8 and KRT18) with immune signatures. Only significant colocalization (FDR < 0.01) is included. Dot size denotes the z-value and color denotes the colocalization significance compared to random distribution. C Dotplot of colocalization between epithelial and immune cells. Only significant colocalization (FDR < 0.01) is included. Dot size denotes the z-value and color denotes the colocalization significance compared to random distribution.

LND is a CD/IBD-critical cell type

Genome-wide association studies (GWAS) on IBD have reported more than 200 genes involving 300 risk loci in multiple pathways5,6,7,8. Previous studies applying these SNPs in a cell-type-specific manner found that immune cells, especially T cells, are most strongly associated with IBD14. We combined GWAS-identified SNPs with single-cell RNA profiling to investigate the role of each cell type in CD. We used SNPsea v1.0.363 to infer cell type-disease association by evaluating the expression specificity of CD/IBD-associated risk genes in our scRNAseq data, under the assumption that risk genes specifically expressed in a cell type are likely to drive disease by affecting a function unique to that cell type. Consistent with previous results14, we found T cells to be the most CD/IBD-associated cell type (FDR = 0.001 in TI, FDR = 9e−5 in AC), followed by recruited macrophages (FDR = 0.001 in TI, FDR = 0.03 in AC) and CTL/NK cells (FDR = 0.02 in TI, FDR = 0.03 in AC) in both TI and AC (Fig. 7A, B) (Supplementary Data 5). The T cell-disease association was driven by specific expression of FYN, PTPRC, CD28, CD5, CD6, CARD11, and other immune-related genes (Supplementary Fig. 23A). The macrophage-disease relationship was driven by specific expression of LITAF, HCK, SLC11A1, MMP9, FCGR2A, and TNFAIP3, and CTL/NK involvement was indicated by KIF2DL4, IKZF3, TNFRSF18, CTSW, and PTPN22 (Supplementary Fig. 23A).

Fig. 7: LND cells specifically express IBD/CD-risk genes.

A Significance of cell-type specific expression of IBD/CD-risk genes in TI. B Significance of cell-type specific expression of IBD/CD-risk genes in AC. C Significance of cell-type specific expression of IBD/CD-risk genes in each CD TI tissue. D Significance of cell-type specific expression of IBD/CD-risk genes in each CD AC tissue.

Among epithelial and stromal cells, only LND cells were associated with CD/IBD, with marginal significance in both TI and AC (FDR = 0.1) (Fig. 7A, B). NOS2 was highly upregulated in LND compared to other cell types (Supplementary Fig. 23A). A NOS2 variant, rs2297518, which results in increased NO production, has been associated with IBD (both CD and UC)64. CCL20 was also highly upregulated in LND cells (Supplementary Fig. 23A), and one of its gene variants, rs111781203, has been reported to decrease the risk of IBD6. Other CD/IBD-risk genes with high expression in LND cells, such as TNFRSF1A, STAT3, PLA2G2A, IRF1, TMBIM1, and PIGR, are shown in Supplementary Fig. 23A. Although genes highly expressed in LND cells, such as DUOX2 and LCN2, have not been identified as CD/IBD-risk genes in large-scale GWAS, they have been reported to be associated with IBD risk or demonstrated to contribute to intestinal inflammation. Rare loss-of-function variants in DUOX2 have been associated with increased plasma levels of IL-17C in patients, and Duox2-deficient mice had altered microbiota composition and high Il-17c expression in the intestine65. Biallelic mutations in DUOX2 have been reported to be associated with very early-onset IBD51,52. Depletion of LCN2 in mice leads to dysbiosis with increased intestinal inflammatory activity and an induction of Th17 cell differentiation66.

We observed extensive transcriptional heterogeneity of key genes in LND cells across CD patients. For example, NOS2 was expressed highly in some patients, but its expression was low in others, although this gene was upregulated globally in the LND cluster (Supplementary Fig. 23B). To address patient heterogeneity, we further evaluated the disease association of each cell type on a per-patient basis. We found T cells were significantly associated with CD in 43 out of 46 patients in TI, and 33 out of 43 patients in AC (FDR ≤ 0.1). LND cells were significantly related to CD in 13 out of 31 patients in TI and 8 out of 25 patients in AC (FDR ≤ 0.1) (Fig. 7C, D). In one CD patient (GCA062) with severe TI involvement, LND was significantly associated with CD, superseding the involvement of immune cells other than recruited macrophages (FDR = 0.001 for LND, FDR = 0.001 for T cells, FDR = 0.0008 for recruited macrophages). LND cells in this patient expressed high levels of NOS2 and CXCL5, suggesting that this population is likely disease-critical (Supplementary Fig. 23C). In contrast, no significant patient-specific association was observed for any other epithelial cell types, endothelial cells, or fibroblasts. These findings support the conclusion that LND cells might drive a significant portion of CD via dysregulated LND-immune cell communication.

Discussion

In this study, we present a comprehensive single-cell atlas of 170 specimens from 83 individuals, consisting of 202,359 cells from the terminal ileum and ascending colon of the human gut in non-IBD controls and inactive and active CD patients. We confirmed prior findings of region-specific transcriptomes that maintain the physiologic function of the intestine and colon. Despite the distinct epithelial transcriptomes of the TI and AC, we identified similar cellular rewiring in epithelial, immune, and stromal cell proportions with CD activity. For example, T cells, mast cells, and recruited macrophages expand from inactive to active CD. In addition to histology-based classification, we investigated changes in cellular compositions using the Clinical Disease Activity Index (CDAI), the most commonly used non-invasive standard for assessing disease activity (Supplementary Data 1). However, we did not observe significant correlations between the proportions of any cell types and CDAI scores (Supplementary Fig. 24). This outcome is likely attributable to inherent limitations of the CDAI, including subjective questions based on patient-reported symptoms/overall well-being and variability when incorporating ideal versus actual body weight.

Most interestingly, we uncovered a special epithelial cell type, named LND, in both the TI and AC with high expression of LCN2, NOS2, and DUOX2. LND cells were rarely detected in non-IBD controls, but expanded significantly in active CD. While the LND proportion indicates CD activity, its direct application in clinical settings is not straightforward. Notably, we found a significant correlation between the average expression of LCN2, NOS2, and DUOX2 and LND proportions (r > 0.8). Moreover, there was substantial upregulation of the average expression in active CD compared to inactive CD, as well as in inactive CD compared to non-IBD controls (p < 0.001), in both the TI and AC (Supplementary Fig. 25). These findings suggest that the average expression levels of these three genes could potentially serve as reliable surrogates in clinical settings to estimate the LND population and provide additional insights into disease activity beyond histology. Moreover, we observed a significant increase in the chemokines CXCL1 and CXCL9, and the cytokine IL6, in the serum from patients with high LND proportion compared to those with low LND proportion (Supplementary Fig. 26), suggesting LND status may predict differences in the level of systemic inflammation.

Compared to other enterocytes/colonocytes, LND cells had high expression of anti-microbial proteins (such as REG1A, REG1B, LYZ, PLA2G2A, SAA1/SAA2), inflammatory cytokines (such as CXCL2, CXCL3, CXCL5, CCL20, CCL28), as well as antigen-presentation and processing genes, STAT3 and STAT1, indicating a specialized immunoregulatory role. Cell–cell communication analysis supported that LND cells may actively interact with a variety of immune cells as signaling senders. Transwell migration assays provided evidence supporting the role of LND status in monocyte recruitment and spatial transcriptomics further showed the colocalization of LND and immune cells. The cross-talk between LND and immune cells highlights the role of LND in regulating mucosal immunity.

The intestinal epithelium is known to be the central coordinator of mucosal immunity, which requires a synergy of distinct epithelial cell types to promote homeostasis. These cell types carry out unique and specialized functions, including enterocytes/colonocytes for nutrient and water absorption, goblet cells for secreting mucins, Paneth cells for releasing antimicrobial peptides, and enteroendocrine cells for producing hormones. LND cells, in comparison, highly expressed some host defense-related genes that are cell-type specific in homeostatic conditions. For example, REG1B, LYZ, and PLA2G2A, which encode antimicrobial peptides specifically released from Paneth cells, are highly expressed in LND cells. Consistently, previous studies found that expression of genes that are cell-type specific in homeostatic conditions was broadened across multiple cell types during infection11. Therefore, LND cells are highly likely to be derived from enterocytes/colonocytes under chronic inflammatory stress, leading to specialized functions in immunoregulation. Our analysis of the developmental origins of LND cells also supports the idea that LND cells may originate from early enterocytes or intermediate/mature enterocytes.

LND cells not only had high expression of cell-type specific genes as mentioned above, but also showed high expression of IBD/CD GWAS-risk genes, such as NOS2, CCL20, TNFRSF1A, and STAT3. The specific expression of IBD/CD-risk genes suggests LND cells are a critical disease cell type. The disease association of LND cells was quite heterogeneous across patients. In TI, LND cells in ~30% of CD patients showed significant disease association and were ranked the second most important cell type. The heterogeneity of LND cells also reflects the complex and multifactorial pathogenesis of CD. In addition to IBD/CD-risk genes, LND cells were marked by high expression of additional genes previously demonstrated to modulate colitis, indicating their potential pathogenic role. Our studies identified that hematopoietic-LND cell interactions play an important role in regulating host response and driving CD, which extends previous findings emphasizing hematopoietic-stromal interactions as a central hub in IBD pathology15,19,30.

Taken together, our study identified a special LND cell population with unique molecular features enriched in immunoregulation, providing a better understanding of the mechanisms sustaining the pathogenic process in Crohn’s disease. Our results indicate that LND marker genes and their cellular proportion could have clinical significance as markers of disease activity, risk for disease progression, or likelihood of anti-TNF response. Our findings establish the possibility of meeting evolving clinical needs through characterization and personalized treatment of CD at the molecular level, which would greatly benefit future clinical studies.

Methods

Human specimen collection and processing

The study protocol was approved by the Institutional Review Board at Vanderbilt University Medical Center. Written informed consent was obtained from non-IBD control and CD subjects to obtain serum samples, as well as terminal ileum (TI) and ascending colon (AC) tissues, at the time of scheduled endoscopic procedures. TI and AC tissues from non-IBD control and CD subjects undergoing surgical resection were also obtained under a separate IRB protocol in coordination with the Comparative Human Tissue Network (CHTN). All samples were obtained as a part of the clinical trial “Combinatorial Single Cell Strategies for a Crohn’s Disease Gut Cell Atlas”, identifier NCT04113733 (clinicaltrials.gov).

Between December 2019 and January 2022, endoscopy subjects were prospectively recruited in the IBD clinic or GI endoscopy unit at Vanderbilt University Medical Center prior to colonoscopy for CD disease activity assessment or non-IBD indications including colorectal cancer screening or polyp surveillance. Surgical resection subjects were those undergoing resection for CD-related complications or other non-inflammatory indications, including endoscopically unresectable polyps. Patient participation in the current study ended after serum and/or tissue samples were obtained. Exclusion criteria for the study were: pregnancy, known coagulopathy or bleeding disorders, known renal or hepatic impairment, history of organ transplantation, or inability to give informed consent. After appropriate exclusions, there were 65 CD subjects with varying disease activity and 18 non-IBD controls.

For all participants, demographics including age, gender, medical history, and medication use were determined from participant reporting and review of the electronic medical record. Serum samples were obtained at the time of colonoscopy, snap frozen on dry ice, and then stored at −80 °C39,67. Tissue biopsies for research purposes in the TI and AC were obtained as follows: fresh tissue biopsies were placed in chelation buffer (4 mM EDTA, 0.5 mM DTT in DPBS) for further processing and scRNAseq analysis, and an adjacent set of tissue biopsies was formalin-fixed and paraffin-embedded (FFPE) for research blocks. Sections (5 µm) from each FFPE block were stained with hematoxylin and eosin (H&E), examined in a blinded manner by a gastrointestinal pathologist (MKW), and graded as inactive (normal_CD, quiescent) or active (mild, moderate, or severe activity). All associated study data were collected and managed using Research Electronic Data Capture (REDCap) tools hosted at Vanderbilt68,69, including Clinical Data Interoperability Services, such as Clinical Data Pull70 and e-consent71.

Single-cell encapsulation and library generation

Single-cell RNA-sequencing was performed on human biopsies and surgical specimens72,73. For surgical specimens that were large (>2 mm²), a representative portion (~2 mm²) of the tissue was used, while smaller tissues and biopsies were processed directly. Briefly, tissues were incubated in a chelating buffer (0.5 M EDTA, 0.1 M DTT in DPBS) for 1.25 h, and then transferred to cold active protease (5 mg/mL Protease from Bacillus licheniformis, 2.5 mg/mL DNase in PBS) and incubated with rotation for 25 min at 4 °C. Tissues were then pipetted 10–20 times to yield single cells. Cell suspensions were filtered, washed, and inspected for count and quality before loading onto a microfluidics platform for inDrops single-cell encapsulation using standard inDrops scRNA-seq gel-beads74,75. We targeted 2000–3000 cells captured per sample, with a total encapsulation time not exceeding 15 min. Single-cell libraries were prepared for sequencing using the TruDrop library structure to enable dual indexing76. Libraries (consisting of an estimated 2000–3000 cell transcriptomes) were sequenced at ~125 million reads each on the NovaSeq6000.

HCR-FISH

HCR-FISH was performed for three target mRNAs using three DNA probe sets (DUOX2, LCN2, and NOS2), following the HCR™ RNA-FISH Protocol for FFPE tissue sections77. Tissue slides were baked at 60 °C for 1 h, followed by tissue deparaffinization by immersing slides in xylenes, 3X for 5 min. After deparaffinization, slides were incubated in 100% ethanol, 2X for 3 min. Rehydration of tissue slides was done by a series of graded ethanol washes at 95%, 70%, and 50% concentrations followed by a nanopure water wash. After the rehydration steps, slides were immersed for 15 min in Tris-EDTA buffer (pH 9.0) at 95 °C. The Tris-EDTA buffer was slowly cooled to 45 °C over 20 min by adding nanopure water every 5 min. Slides were kept in nanopure water for 10 min at room temperature, followed by a 1X PBS wash. Proteinase K was introduced at 0.5 µL of 20 mg/1 mL 1X PBS concentration, for 10 min at 37 °C, followed by 1X PBS washes. 200 µL of Probe Hybridization Buffer was added on top of each tissue sample for pre-hybridization and slides were kept in a humidified chamber at 37 °C for 10 min. Probe solution was prepared by adding 0.4 µL of 1 µM stock/100 µL of probe hybridization buffer at 37 °C for the DUOX2 and LCN2 probe sets and 0.8 µL of 1 µM stock/100 µL of probe hybridization buffer at 37 °C for the NOS2 probe set. Pre-hybridization solution was removed from tissue slides and 100 µL of the probe solution was added on top of each tissue sample. Sample slides were covered with parafilm and incubated overnight at 37 °C in the humidified chamber. Excess probes were washed by incubating slides at 37 °C in: (a) 75% of probe wash buffer/25% 5X SSCT for 15 min, (b) 50% of probe wash buffer/50% 5X SSCT for 15 min, (c) 25% of probe wash buffer/75% 5X SSCT for 15 min, (d) 100% 5X SSCT for 15 min. Slides were immersed in 5X SSCT for 5 min at room temperature. For pre-amplification, 200 µL of amplification buffer was added on top of each tissue sample for 30 min at room temperature.
2 µL of 3 µM stock hairpins h1 and h2 (per slide), for each probe set, were separately heated at 95 °C for 90 s and cooled to room temperature in the dark for 30 min. Hairpin solution was prepared by adding snap-cooled hairpins h1 and snap-cooled hairpins h2 to 100 µL of amplification buffer at room temperature. Pre-amplification buffer was removed and 100 µL of the hairpin solution was added on top of each tissue sample. Slides were incubated overnight (≥12 h) at room temperature. Excess hairpins were removed by incubating slides in 5X SSCT at room temperature for 1 × 5 min, 2 × 15 min, and lastly 1 × 5 min. Slides were dried by blotting edges on a Kimwipe. 100 µL of Hoechst stain (1:100 dilution) was added on top of each tissue slide and slides were incubated at room temperature for 5 min. Coverslipping was done using Invitrogen ProLong™ Gold antifade reagent. Slides were imaged using the Aperio Versa slide scanner (Leica). Probe sets were designed by Molecular Instruments: LCN2 probe set, size 13, targeting NM_005564.5; DUOX2 probe set, size 20, targeting NM_014080.5; NOS2 probe set, size 20, targeting NM_000625.4.

Multiplex immunofluorescence and image analysis

Multiplex immunofluorescence (MxIF) imaging was performed on FFPE sections at 4 μm after standard histological processing and antigen retrieval. Slides were iteratively stained using a fluorescence-inactivation protocol78,79, using directly labeled or bench-conjugated antibodies incubated overnight at 4 °C. Slides were scanned using the Aperio Versa (Leica) at ×20 magnification, and then were photo-inactivated with an alkaline peroxide solution for repeated staining and imaging cycles until images for all analytes were acquired. A validated antibody panel was used, including DAPI, NAKATPASE, PANCK, CD8, CD4, and CD4572. The following antibodies were used for LND: LCN2 (Novus Biologicals, Cat. number NB100-1503) with secondary (Jackson ImmunoResearch, Cat. number 705-547-003), NOS2 (Novus Biologicals, Cat. number NBP2-22119AF750), and DUOX2 (Novus Biologicals, Cat. number NB110-61576AF647). Images were computationally registered and corrected for illumination and autofluorescence against the interlaced blank imaging rounds. Cells were segmented with an algorithm modified from a published method80, which combines machine learning, computed membrane completion, and image watershed based on multiple membrane markers. Cells meeting size-based quality thresholds were kept. The mean, standard deviation, median, and maximum staining intensity for each protein were quantified with respect to the whole cell, cell membrane, cytoplasm, and nucleus. Location, area, and shape metrics were obtained. Cells were clustered based on the similarity of protein intensity profiles and each cluster was annotated by positive expression of known markers.

Immunohistochemistry

IHC was performed for three targets: LCN2, NOS2, and DUOX2. Tissue slides were baked at 60 °C for 1 h, followed by tissue deparaffinization by immersing slides in xylene. After deparaffinization, slides were incubated in 100% ethanol. Rehydration was done by a series of graded ethanol washes at 95%, 90%, and 80% concentrations followed by a distilled water wash. For antigen retrieval, slides were rinsed in PBS and then immersed in retrieval solution (DAKO target retrieval, pH 9) under pressure using a Biocare Decloaking Chamber. Slides were then rinsed with PBS. 3% hydrogen peroxide was applied to the slides for 15 min followed by Universal Protein Block (DAKO, Cat. #X0909) for 40 min at room temperature. The protein block was removed, and the slides were rinsed with PBS for 5 min. Diluted primary antibody was applied to the slides and allowed to incubate overnight at 4 °C. Antibodies used were: LCN2 (Sigma, Cat. #HPA002695, 1:300), NOS2 (Novus Biologicals, Cat. number NBP2-99091, 1:600), and DUOX2 (Novus Biologicals, Cat. number NB110-61576, 1:300). The slides were rinsed with PBS. HRP-labeled polymer anti-Rabbit (DAKO, Cat. #K4003) was allowed to incubate at room temperature for 45 min, then the slides were rinsed with PBS. DAB+ substrate (DAKO, Cat. #K3468) was applied for 3–5 min at room temperature, then the slides were rinsed with PBS. Counterstain with CAT hematoxylin (Biocare Medical, Cat. #CATHE-MM) was applied for 1 min. The slides were rinsed with deionized water 6–10 times for 1 min each. Tacha’s bluing solution (Biocare Medical, Cat. #HTBLU-MX) was applied for 45 s at room temperature. The slides were rinsed with deionized water 6–10 times for 1 min each. Next, slides were dehydrated with xylene and ethanol. Coverslipping was done using Cytoseal (Epredia, Cat. #8312-4). Slides were imaged using the Aperio AT2 Bright Field slide scanner (Leica).

Spatial transcriptomics

Spatial transcriptomics was performed using the Human FFPE Visium platform version 181. FFPE sections (5 μm) of biopsies were cut and directly placed onto 6.5 mm × 6.5 mm capture areas of Visium FFPE spatial gene expression slides (10X Genomics). Visium slides were stained with hematoxylin and eosin, temporarily coverslipped, and imaged in brightfield at ×20 magnification using Aperio Versa (Leica) prior to tissue permeabilization, probing, and library prep according to the Human Visium FFPE protocol (10X Genomics). Sample libraries were sequenced on the NovaSeq6000 sequencer (Illumina) adjusted to ~50,000 reads per spot. Resulting sequencing data were aligned using 10X Genomics Space Ranger version 1.3.0 (10X Genomics).

Single-cell RNAseq alignment and quality control

Single-cell RNAseq reads were filtered, demultiplexed, and quantified by dropEst v0.8.682 to generate cell-by-gene count matrices. Specifically, reads with the expected structure were kept, and cell barcodes and UMIs were extracted by dropTag. Demultiplexed reads were aligned to the human reference transcriptome GRCh38 using STAR v2.7.9a83. Uniquely mapped reads were quantified into UMI-filtered counts by dropEst v0.8.6. Cells with >40% mitochondrial reads, or <500 UMI counts, or <200 or >6000 genes expressed were considered low quality and excluded. After this rough quality control, each sample was manually checked to remove clusters of empty droplets (low number of UMIs and genes, and no distinct markers) and clusters of doublets (high number of UMIs and genes, and markers from two different cell types). Samples with fewer than 100 cells were excluded from the downstream analysis. Outliers and batch effects were detected using scRNABatchQC v0.10.384.
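The cell-level thresholds above can be written as a simple filter. The following is a minimal Python sketch under stated assumptions, not the pipeline used in the study (which relied on dropEst and manual cluster inspection): the function names and the per-cell dictionary fields are hypothetical, and the sketch assumes the 100-cell sample cutoff is applied after cell filtering.

```python
# Illustrative sketch of the cell-level QC thresholds described above.
# Function and field names are hypothetical; they are not part of dropEst or Seurat.

def passes_qc(n_umi, n_genes, mito_fraction):
    """Return True if a cell survives the rough quality-control filter."""
    if mito_fraction > 0.40:             # >40% mitochondrial reads
        return False
    if n_umi < 500:                      # <500 UMI counts
        return False
    if n_genes < 200 or n_genes > 6000:  # <200 or >6000 genes expressed
        return False
    return True


def filter_sample(cells, min_cells=100):
    """Keep cells passing QC; return an empty list if fewer than min_cells remain.

    Assumption: the sample-level cutoff is evaluated on the filtered cells.
    """
    kept = [c for c in cells
            if passes_qc(c["n_umi"], c["n_genes"], c["mito_fraction"])]
    return kept if len(kept) >= min_cells else []
```

The manual removal of empty-droplet and doublet clusters is not captured here, since it depends on cluster-level inspection of markers.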

Single-cell RNAseq data analysis

Single-cell RNAseq count matrices were normalized to 10,000 and the top 2000 highly variable genes were selected by fitting the variance-mean relationship in the Seurat v4.1.0 package85,86. The normalized data were scaled to z-scores and principal component analysis was performed to reduce dimensionality. The top 30 principal components were used to generate the UMAP embedding for visualization and to build the k-nearest neighbor graph (k = 20). Louvain clustering at a resolution of 0.8 was applied to the graph to partition cells into non-overlapping groups using Seurat v4.1.0. Cell clusters were automatically annotated by the marker-based approach scMRMA v1.087 and were further manually curated using cluster-specific genes from the differential expression analysis. Specifically, the categorization of early, intermediate, and mature enterocytes/colonocytes was established through a multi-perspective approach. First, we considered markers identified in previous studies, such as high expression of FREM1, PCCA, and DMBT1 in early enterocytes, RBP2 in intermediate enterocytes, and APOA1, APOC3, APOA4, and GUCA2B in mature enterocytes13. Similarly, early colonocytes were defined by high expression of B3GNT7, ABR, and ADH1C, intermediate colonocytes by ATP5G1 and PCNP, and mature colonocytes by AQP8, GUCA2A, CA4, and CEACAM113 (Supplementary Fig. 10). Moreover, differentiation among early, intermediate, and mature colonocytes was determined using a crypt-axis score as proposed by Parikh et al.14, where mature colonocytes, located at the crypt top with the highest scores, contrast with early colonocytes, situated at the crypt bottom with the lowest scores (Supplementary Fig. 10). Second, we employed cell differentiation state and trajectory analysis, with detailed methods outlined in the “Development trajectory analysis” section.
The clustering and annotation results served as the input to scUniFrac v0.9.629 to quantify cell compositional distances across samples, which considered both cellular compositions and similarities. Multidimensional scaling was used to map each sample into a space based on pairwise distances from scUniFrac v0.9.6.
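The normalization and scaling steps in this section can be illustrated with a short Python sketch. It is not the Seurat implementation: it assumes the log-transform that Seurat applies by default after scaling to 10,000 (the text states only the scale factor), and the function names are hypothetical.

```python
import math

# Illustrative sketch only; the log1p step is an assumption based on
# Seurat's default normalization, and the function names are hypothetical.

def normalize_cell(counts, scale=10_000):
    """Scale one cell's UMI counts to a fixed total, then log-transform."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]


def zscore(values):
    """Scale one gene's values across cells to mean 0 and unit sample SD."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    sd = math.sqrt(var)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]
```

Normalizing per cell removes differences in sequencing depth, while the per-gene z-score keeps highly expressed genes from dominating the principal component analysis.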

Cell type deconvolution for bulk transcriptomics data

CIBERSORT v1.0559 was applied to characterize the cell composition of bulk RNAseq data using single-cell transcriptional profiles of each cell type from TI and AC as the reference. The signature matrix was created from average expression of the top 100 marker genes in each cell type. Default parameters were used to implement CIBERSORT v1.05, except that the parameter of quantile normalization of bulk mixture was set to False.
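The construction of the signature matrix can be sketched as follows. This is a minimal Python illustration, not CIBERSORT code: it assumes that per-cell-type average expression and ranked marker lists have already been computed, and the function name and data layout are hypothetical.

```python
# Illustrative sketch: build a gene-by-cell-type signature matrix from the
# top marker genes of each cell type. Names and data layout are hypothetical.

def build_signature_matrix(mean_expr, markers, top_n=100):
    """
    mean_expr: {cell_type: {gene: average expression}}
    markers:   {cell_type: [marker genes, ranked best first]}
    Returns (cell_types, genes, matrix), where matrix[i][j] is the average
    expression of genes[i] in cell_types[j].
    """
    cell_types = sorted(mean_expr)
    genes, seen = [], set()
    for ct in cell_types:
        for gene in markers.get(ct, [])[:top_n]:
            if gene not in seen:  # a gene may mark more than one cell type
                seen.add(gene)
                genes.append(gene)
    matrix = [[mean_expr[ct].get(gene, 0.0) for ct in cell_types]
              for gene in genes]
    return cell_types, genes, matrix
```

Each row of the resulting matrix describes how strongly one marker gene is expressed in every reference cell type, which is the form of input a deconvolution method uses to estimate cell-type proportions in a bulk mixture.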

Cell–cell interaction analysis

CellChat v1.4.060 was used to infer communications between cell types through ligand–receptor interaction analysis from single-cell RNAseq data of TI and AC separately. The standard workflow was followed with the normalized data and the annotated cell types as inputs. The built-in database CellChatDB.human involving 1939 interactions was used as a reference to screen potential ligand–receptor interactions. The communication probability was quantified between cell types having at least 10 cells. The average gene expression per cell type was calculated without trimming.

Development trajectory analysis

RNA velocity v0.655 was applied to infer lineage relationships between epithelial cell populations and predict the future transcriptional state of a single cell. First, the loom file including spliced/unspliced matrices was generated from the bam file using Velocyto. Then velocity was calculated by the function RunVelocity in the SeuratWrappers v0.3.0 package with default parameters. The velocity was plotted on the pre-computed UMAP embedding and colored by the annotated cell types. As anticipated, we observed a pronounced directional flow starting from stem cells, progressing through early enterocytes and intermediate enterocytes, and culminating in mature enterocytes (Fig. 4A).

CytoTRACE v0.3.357 was used to predict stemness status from single-cell RNAseq data based on the assumption that the number of genes expressed in a cell decreases during differentiation. CytoTRACE v0.3.3 was run with default parameters and the raw count matrix of each sample as the input. A CytoTRACE score was assigned to each cell based on its differentiation potential, with a higher score indicating higher stemness. CytoTRACE scores from different samples were grouped by cell type and score differences between two cell types were compared by the Wilcoxon rank-sum test. As anticipated, stem cells exhibited the highest score, indicating undifferentiated status, followed by early and intermediate enterocytes, while mature enterocytes registered the lowest score, signifying full differentiation (Fig. 4D).

Partition-based graph abstraction (PAGA) v1.3.356 was used to reconstruct lineage relationships of epithelial cell populations. First, the Seurat object was converted to an h5ad file for the PAGA input. Then, a neighborhood graph was computed using scanpy with a local neighborhood size of 50 and 30 PCs. Finally, the connections between cell types were quantified. Connections with a weight less than 0.2 were removed.
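The final pruning step can be illustrated with a few lines of Python. This is a sketch only: it assumes the connectivity weights have already been computed and exported as a dictionary keyed by cell-type pairs, and the function name and example weights are hypothetical.

```python
# Illustrative sketch of the pruning step; the input layout is an assumption
# and the example weights below are made up for demonstration.

def prune_connections(weights, min_weight=0.2):
    """Remove cell-type connections whose weight is below min_weight."""
    return {pair: w for pair, w in weights.items() if w >= min_weight}


example = {
    ("Stem", "Early enterocyte"): 0.85,
    ("Early enterocyte", "LND"): 0.20,
    ("Goblet", "LND"): 0.05,
}
kept = prune_connections(example)
```

In this toy input, the connection with weight 0.05 is dropped and the other two are retained, because only weights strictly less than 0.2 are removed.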

Cell types associated with CD/IBD-risk loci

The SNPsea algorithm v1.0.363 was used to identify cell types associated with CD/IBD-risk SNPs based on the assumption that the specificity of a gene to a cell type is an indicator of its importance to that cell type's function. Thus, if a cell type has significant enrichment of specific genes associated with GWAS risk loci, this cell type is highly likely to be pathogenic and critical to the disease. The CD/IBD-risk SNPs were compiled from two GWAS studies5,6, which reported 344 loci in total. A pseudobulk dataset for each cell type with more than nine cells in CD was generated by summing all UMI counts for each gene in each cell type and adding a pseudocount of 1. The data were then normalized by DESeq2 to remove the effects introduced by cell cluster sizes. SNPsea was run with default parameters, and all genes in a SNP’s linkage interval were accounted for when calculating scores. The p-values were further adjusted by the Benjamini-Hochberg multiple testing procedure.
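Two of the steps above, pseudobulk aggregation and Benjamini-Hochberg adjustment, are simple enough to sketch in plain Python. The sketch below is illustrative rather than the code used in the study: function names and the input layout are hypothetical, and the DESeq2 normalization and the SNPsea scoring itself are not reproduced.

```python
# Illustrative sketch; names and input layout are hypothetical.

def pseudobulk(cells, min_cells=10, pseudocount=1):
    """
    cells: list of (cell_type, {gene: UMI count}) tuples.
    Sums counts per gene within each cell type that has at least min_cells
    cells (i.e. more than nine), then adds a pseudocount to every gene.
    """
    n_cells, sums, genes = {}, {}, set()
    for cell_type, counts in cells:
        n_cells[cell_type] = n_cells.get(cell_type, 0) + 1
        bucket = sums.setdefault(cell_type, {})
        for gene, count in counts.items():
            bucket[gene] = bucket.get(gene, 0) + count
            genes.add(gene)
    return {
        ct: {g: sums[ct].get(g, 0) + pseudocount for g in sorted(genes)}
        for ct in sums if n_cells[ct] >= min_cells
    }


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Summing counts per cell type gives larger clusters larger totals, which is why a size-factor normalization such as the one in DESeq2 is applied before comparing expression specificity across cell types.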

Cell type deconvolution and spatial colocalization

Robust Cell-Type Decomposition (RCTD)62 in the spacexr v2.2.1 package was applied to deconvolve the cell type composition of each spot. Single-cell RNAseq data and cell type annotations from all TI and AC samples were used as the reference to decompose spatial TI and AC samples, respectively. The anchor-based integration workflow in Seurat v4.1.0 was also used to predict the underlying composition of cell types in each spot, and similar results were obtained.

SpaGene v0.1.061 was used to quantify colocalization of general epithelial markers (KRT8 and KRT18) and LND markers (LCN2, NOS2, and DUOX2) with immune cell signatures (CD3D, CD8A, GZMB, MRC1, S100A8, NFKBIA, and NFKBIB). Z-scores and FDR values were generated to estimate the significance of spatial connections between two genes (such as NOS2 and CD8A) compared to random distributions. SpaGene v0.1.0 was also used to quantify colocalization between epithelial and immune cells based on the inferred composition of each cell type from RCTD62. Z-scores and FDR values were produced to estimate the significance of spatial colocalization of two cell types (such as LND and neutrophils) compared to random connections.

Human colon organoids

Colonoscopic biopsies were obtained under a VUMC IRB protocol from patients in the following groups: non-IBD controls with normal histology, inactive CD, and active CD. Fresh tissues were used to generate and maintain colon organoids as follows88. Tissues were incubated in 10 mM EDTA in PBS at 4 °C for 30 min. Tissues were then transferred to dissociation buffer (43.5 mM sucrose and 54.9 mM D-sorbitol in PBS) and shaken to release the crypts. After centrifugation, the crypts were resuspended and plated in Matrigel, which was overlaid with IntestiCult™ Organoid Growth Medium (Stemcell Technologies) supplemented with penicillin-streptomycin-gentamicin, 10 μM Y-27632 (Tocris) and 4 μM CHIR 99021 (Tocris). Cells were then maintained in the same media without gentamicin or Y-27632 for a maximum of 10 passages. For experiments, the organoids were cultured in 24-well plates in Matrigel with IntestiCult™ Organoid Differentiation Medium supplemented with antibiotics, with the media changed daily. On day 7, the entire 500 µl of conditioned media was collected from each well.

RNA expression in human colon organoids

Organoids maintained in Matrigel were treated with Cell Recovery Solution™ (Corning) and pelleted by centrifugation. Total RNA was isolated using the RNeasy Mini kit (Qiagen). For RT-real-time PCR, reverse transcription was first performed using the Superscript IV Reverse Transcriptase kit (Invitrogen) and Oligo dT (Invitrogen); mRNAs were then amplified using the PowerUp SYBR Green Master Mix (Invitrogen)88. The primer sets used are listed in Supplementary Data 6.

Transwell™ monocyte cell migration assays

CD14+ monocytes were isolated from human peripheral blood mononuclear cells (PBMCs) from four healthy donors with the EasySep™ Human CD14 Positive Selection Kit II (STEMCELL Technologies, #17858) per the manufacturer’s protocol. Cells were counted, and 100,000 live cells diluted in 200 µL of serum-free RPMI 1640 media (Gibco, #11875093) were added to the apical compartments of 8.0 µm pore size ThinCert™ inserts (Greiner Bio-One, #662638) in a 24-well plate. The basolateral compartments contained 400 µL of colonoid conditioned media. Cells were allowed to migrate for 90 min at 37 °C. The inserts were then removed, washed once with PBS, and stained with the Hema 3™ Manual Staining System (Fisher Scientific, #23-123869) per the manufacturer’s protocol. Stained inserts were imaged on a brightfield microscope at ×20 or ×40 magnification. Migrated cells inside the pores of the inserts were counted manually by an observer blinded to the conditioned media source, and the counts were normalized to the total image area in cm².
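The area normalization in the last step amounts to a unit conversion, sketched below. The function name and the assumption that each image field is a rectangle with known width and height in µm are hypothetical; the actual field dimensions are not given here.

```python
UM2_PER_CM2 = 1e8  # 1 cm = 10,000 µm, so 1 cm² = 1e8 µm²


def cells_per_cm2(cell_count, field_width_um, field_height_um):
    """Convert a raw count of migrated cells in one image field to cells per cm².

    field_width_um, field_height_um: field dimensions in µm (assumed inputs).
    """
    area_cm2 = field_width_um * field_height_um / UM2_PER_CM2
    return cell_count / area_cm2
```

For example, 50 cells counted in a hypothetical 1000 µm × 500 µm field (0.005 cm²) would correspond to about 10,000 cells per cm².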

A negative binomial mixed model, fitted with the glmmTMB package, was used to estimate differences in the number of migrated cells between control, inactive CD, active CD with low LND, and active CD with high LND. The negative binomial distribution fitted the count data well, and the mixed model accounted for repeated measurements from different fields and different wells within each experiment.

Serum cytokine analysis

Serum cytokines were measured using Luminex technology with Milliplex MAP (Millipore, Billerica, MA) multiplex magnetic bead-based antibody detection kits, which were used according to the manufacturer’s protocol39,67. Serum samples were analyzed with a pre-mixed cytokine analyte kit. Samples were run in duplicate, with quality control samples included across plates, and assayed on a FLEXMAP 3D instrument39,67.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.