Cells of the human intestinal tract mapped across space and time

The cellular landscape of the human intestinal tract is dynamic throughout life, developing in utero and changing in response to functional requirements and environmental exposures. Here, to comprehensively map cell lineages, we use single-cell RNA sequencing and antigen receptor analysis of almost half a million cells from up to 5 anatomical regions in the developing and up to 11 distinct anatomical regions in the healthy paediatric and adult human gut. This reveals the existence of transcriptionally distinct BEST4 epithelial cells throughout the human intestinal tract. Furthermore, we implicate IgG sensing as a function of intestinal tuft cells. We describe neural cell populations in the developing enteric nervous system, and predict cell-type-specific expression of genes associated with Hirschsprung’s disease. Finally, using a systems approach, we identify key cell players that drive the formation of secondary lymphoid tissue in early human development. We show that these programs are adopted in inflammatory bowel disease to recruit and retain immune cells at the site of inflammation. This catalogue of intestinal cells will provide new insights into cellular programs in development, homeostasis and disease.

The cellular landscape of the human intestinal tract is dynamic throughout life, developing in utero and changing in response to functional requirements and environmental exposures. Here, to comprehensively map cell lineages, we use single-cell RNA sequencing and antigen receptor analysis of almost half a million cells from up to 5 anatomical regions in the developing and up to 11 distinct anatomical regions in the healthy paediatric and adult human gut. This reveals the existence of transcriptionally distinct BEST4 epithelial cells throughout the human intestinal tract. Furthermore, we implicate IgG sensing as a function of intestinal tuft cells. We describe neural cell populations in the developing enteric nervous system, and predict cell-type-specific expression of genes associated with Hirschsprung's disease. Finally, using a systems approach, we identify key cell players that drive the formation of secondary lymphoid tissue in early human development. We show that these programs are adopted in inflammatory bowel disease to recruit and retain immune cells at the site of inflammation. This catalogue of intestinal cells will provide new insights into cellular programs in development, homeostasis and disease.
Intestinal tract physiology relies on the integrated contribution of multiple cell lineages, the relative abundance and cell networking of which fluctuate from embryonic development to adulthood. Further complexity is added because the intestinal tract is formed of distinct anatomical regions that develop at different rates and carry out diverse roles in digestion, nutrient absorption, metabolism and immune regulation.
The analysis of rare fetal tissues has resolved the formation of villicrypt structures and the seeding of immune cells into the gut environment [1][2][3] . Similarly, our understanding of the cellular landscape of the adult gut is benefiting from single-cell technologies. Regional differences in immune-cell activation and microbiome composition in the healthy human colon have previously been reported 4 . Studies that compare inflammatory bowel disease samples to healthy tissues have enabled the identification of disease-relevant stromal 5,6 , tissue-resident CD8 T cell [7][8][9] populations and correlation between cellular response and clinical treatment 10 . Although extensive work has been carried out to profile the intestinal tract at single-cell resolution (Supplementary  Table 1), a holistic analysis of the gut through space (anatomical location) and time (lifespan) is lacking. Building such a developmental roadmap would be invaluable for the scientific community 11 .
Here we create a single-cell census of the healthy human gut, encompassing around 428,000 cells from the small and the large intestines as
BEST4 epithelial cells, which have been previously observed in human small and large intestines 6,12,13 , varied in abundance between intestinal regions (Fig. 1c, d). Using differential cell-type abundance analysis 14 , we identified their region-specific expression signatures (Fig. 1e, Extended Data Fig. 3a-d). Notably, small-intestinal BEST4 cells were marked by high expression of the gene CFTR, which encodes a chloride channel and is mutated in cystic fibrosis (Fig. 1e); such high expression was also observed at the protein level (Extended Data Fig. 3e). In this staining and in previous work, BEST4 cells were in close proximity to cells that resembled goblet cells 12 (Extended Data Fig. 3f). Our analysis highlights a possible role of BEST4 enterocytes of the small intestine in aiding mucus production by goblet cells and biosynthesis of acids, in contrast to the functions of colonic BEST4 cells in the metabolism of small molecules (Extended Data Figs. 3g, 4a, b).

Diversity of intestinal epithelial cells
In the epithelial compartment, secretory cells consisted of goblet, tuft, Paneth and microfold cells, as well as precursor states. Absorptive and goblet cells showed regional separation at all life stages (Fig. 2a, b, Extended Data Figs. 5a-c, 6a, b). Given that the gut epithelium represents an entry point for SARS-CoV-2 15 , we also report that both ACE2 and TMPRSS2-which encodes transmembrane serine protease 2-were expressed by enterocytes in early development (Fig. 2c, Extended Data Fig. 6c).
Subclustering of enteroendocrine cells (EECs) revealed NEU-ROG3-expressing precursor cells enriched in the first-trimester fetal gut, and multiple mature subsets resembling populations described in intestinal organoids 16 (Fig. 2d, Extended Data Fig. 6d). Although neuropeptide W (encoded by NPW) is known to stimulate food intake 17 and is broadly expressed by EECs 16,18 , we found a subpopulation of NPW-expressing enterochromaffin cells that are also specific for PRAC1 and RXFP4 (Extended Data Fig. 4d). We delineated genes involved in the differentiation of NEUROG3 precursors to enterochromaffin cells, including recently described genes (marked by arrows) such as FEV 19,20 (Fig. 2e, Extended Data Fig. 6e, f).
Notably, among the top differentially expressed genes in tuft cells was PLCG2 (Extended Data Fig. 6g, h), a phospholipase that is typically associated with haematopoietic cells. To explore the relevance of PLCG2 in tuft cells, we screened for the expression of upstream receptors (Fig. 2f, Extended Data Fig. 6i). FCGR2A, which is activated in response to IgG and expressed by selected epithelial cells in immunized mice 21 , was specifically expressed by approximately 2.75% of tuft cells (Fig. 2f). We confirmed the expression of the protein FCGR3 (the mouse orthologue of human FCGR2A) by approximately 5% of small intestinal tuft cells in mice (Fig. 2g, Extended Data Fig. 6j). Receptor tyrosine kinases were expressed across tuft cells and other epithelial cell types. Because these are known to be mainly linked to PLCG1 activation, whether they are also responsible for PLCG2 activation in tuft cells is difficult to delineate (Fig. 2h). PLCG2 expression by tuft cells was at higher levels than in B and myeloid lineages, and was confirmed in both in vivo and in vitro models (Extended Data Fig. 7a-f)

Article
with downstream signalling mediators including RAC2, ITPR2, PRKCA and TRPM5 (Fig. 2f, h)-suggested the ability of tuft cells to respond to immune-cell signalling.

Development of the enteric nervous system
Next we investigated the differentiation of neural cells from enteric neural crest cell (ENCC) progenitors (Fig. 3a, b) that were present in the dataset from 6.5 PCW (Extended Data Fig. 8a). ENCCs balance proliferation and differentiation into glia and neurons, while maintaining a progenitor reserve. To capture discrete processes of ENCC differentiation, we analysed early (6-11 PCW) and late (12)(13)(14)(15)(16)(17) development separately (Extended Data Fig. 8b). In early development, ENCCs differentiated primarily to neurons via neuroblasts, giving rise to two distinct branches: branch A (ETV1) and branch B (BNC2) (Fig. 3a Although differentiated neurons were abundant at 6-11 PCW, glial cells were enriched at later development. Three types of enteric neural and a subset of differentiating glia (COL20A1) were present at 12-17 PCW (Fig. 3b, Extended Data Fig. 9a-d). Colonic glia 1 cells expressed posterior HOX genes and TFAP2B, which suggests that they originated in the sacrum or trunk (Extended Data Fig. 8g). We visualized BMP8B-expressing cells in the myenteric plexus, whereas DHH-expressing cells were found both in the mesentery and the myenteric plexus (Fig. 3d, Extended Data Fig. 8f).
To identify neural cells involved in Hirschsprung's disease (HSCR), we screened for the expression of known HSCR-associated genes 25-27 . The majority of HSCR-associated genes were expressed across multiple differentiating populations with varying intensity (Fig. 3e), and varied between neuron branches A and B. For example, RET was highly expressed by branch A, but not by branch B, neurons. Notably, ZEB2 and EDNRB were more highly expressed across colonic glia and neuroblast subsets compared to equivalent small intestinal subsets (Fig. 3e). Any differences in expression between regions might also be due to the developmental lag of the large intestines. In addition, key ligands that are implicated in HSCR-including GDNF, NRTN and EDN3-were primarily expressed by mesothelium, smooth muscle cells and interstitial cells of Cajal (ICC) (Extended Data Fig. 8h).

Formation of secondary lymphoid organs
Gut-associated lymphoid tissues and mLN are key sites of gut immune surveillance. We observed mLN emergence at around 12 PCW, with structures disectable from 15 PCW (Extended Data Fig. 10a)-consistent with previous observations 28 . Interactions between mesenchymal and endothelial lymphoid tissue organizers (mLTo and eLTo, respectively) and lymphoid tissue inducers (LTi) are central to initiating the formation of secondary lymphoid organs 29 . To better understand this process in humans, we assessed our dataset for the key cell types involved.
Sub-clustering of fetal and adult T and innate lymphoid cells revealed three clusters that matched published characteristics of LTi cells (Fig. 4a, b). This included high expression of RORC, KIT, TNF,   TUBB  PRDX1  SOX4  MDK  NPM1  YBX1  STMN1  AFP  TUBA1A  PAX4  SMIM6  COX7A2L  SCG2  C9orf16  MIR7-3HG  GCH1  GC  NEUROD1  RGS2  KCTD12  FEV  CHGA  CRYBA2  SCT  DDC  LGALS4  TPH1  SLC18A1  SLC38A11  CES1  REG4  CHGB  TFF3  TAC1  CDHR3  IGFBP3  LMX1A  TFF1  GCLC  PRAC1  CDKN2A  COL5A2  RXFP4  FXYD2  NPW   Arrows depict summarised scVelo differentiation trajectories. C9orf16 is also known as BBLN. e, Heat map of genes that change along the differentiation trajectory from NEUROG3-expressing progenitors to enterochromaffin cells (red arrow in d). Arrows indicate genes that have known associations with enterochromaffin cell differentiation. f, Dot plot with expression of molecules upstream or downstream of the PLCG2 pathway in tuft cells and pooled absorptive (TA and enterocytes) and secretory (Paneth, goblet and EEC) cells. g, Per cent expression of Fcγ receptor by SiglecF + EpCAM + and SiglecF − EpCAM + cells in wild-type mice determined by flow cytometry (individual points represent biological replicates; n = 4). Using a two-way ANOVA we observe significant interaction between Fcγ receptor expression (F(3, 39) = 42.29, P = 3.05 × 10 −11 ). Post-hoc analysis showed significant differences between non-Tuft epithelial cells and LTA, LTB, IL7R and ITGB7 (beta chain of gut-specific α4β7 integrin) as well as the absence of productive αβ TCR (Extended Data Figs. 10b-d, 11a-c). Innate lymphoid cell progenitors (ILCPs) were transcriptionally comparable to fetal liver ILCPs 30 (Extended Data Fig. 10e), and were found both in fetal mLN and in embryonic gut, whereas NCR + and NCR − type 3 innate lymphoid cells (ILC3s) were expanded across gut regions throughout 6-17 PCW (Extended Data Fig. 10f). This suggests that LTi-like ILC3 subsets are expanded during the development of gut-associated lymphoid tissues, but not during the development of mLN. Single-molecule fluorescence in situ hybridization (smFISH) staining identified all three LTi-like subtypes (Extended Data Fig. 10g) and placed CXCR5 and RORC-expressing LTi cells adjacent to CXCL13-expressing LTo cells in proximal gut mucosa (Fig. 4c), supporting the concept of congregation of these cells in the developing gut. These observations-together with the expression of genes that encode key chemokines (CXCR5, CCR7 and CCR6) and RNA velocity analysis (Extended Data Fig. 10h)-suggests that ILCPs are the first LTi-like cells in the developing gut, and represent a progenitor state to ILC3s.
We observed arterial, venous, capillary and lymphatic endothelial cells (LECs) (Extended Data Figs. 2a, 12a, b). LECs separated into six clusters (labelled LEC1-LEC6; Extended Data Fig. 12c). LEC2 cells expressed TNFRSF9, THY1, CXCL5 and CCL20-as described in human lymph nodes 31 -as well as targets of the NF-κB pathway and adhesion molecules including MADCAM1, VCAM1 and SELE, suggesting their involvement in lymphocyte trafficking (Extended Data Fig. 12d, e). We confirmed that the presence of high endothelial venule structures was required for lymphocyte entry and for proximity of PROX1 + vessels to CXCL13 + mLTo and RORC + LTi cells (Extended Data Fig. 12f, g).
Within the stromal compartment, we identified subtypes of myofibroblast, smooth muscle cells, pericyte, interstitial cells of Cajal, mesothelium and populations resembling stromal cells previously described in the colon 5 (labelled as stromal 1-4) (Fig. 4d, Extended Data Fig. 2a). We further identified fibroblast populations typically defined in mouse lymph nodes 32 , including T reticular cells and follicular dendritic cells (Extended Data Fig. 13a). In the prenatal intestine and mLN, we observed a stromal population-marked by the expression of CCL19, CCL21 and CXCL13 as well as adhesion and NF-κB pathway molecules (Extended Data Fig. 13b-f)-that resembled the mLTo described in mouse lymph nodes 33 . We further determined cell-cell interactions that governed early leukocyte recruitment across LEC2, mLTo and LTi-like cells (Extended Data Fig. 13g, h), and differences in B cell activation status between fetal and adult samples (Extended Data Fig. 14a-l).
To visualize the recruitment of naive immune cell subsets to activated mLTo, we used cell2location 34 to perform spatial mapping of single-cell transcriptomes to 10x Genomics Visium spatial zones in 17 PCW fetal ileums (Fig. 4e, Methods). In this analysis, we captured tissue zones with expression of mLTo marker genes (CCL19, CCL21 and CXCL13) (Extended Data Fig. 15a-c) that were likely to correspond to developing secondary lymphoid organs. We found that LEC2, mLTo and LTi-like subsets mapped to the same tissue zones as naive immune subsets (for example, SELL CD4 T cells, T regulatory cells, immature B cells; tissue zone or fact_11 (Extended Data Fig. 15c)). M cells characteristic of secondary lymphoid organs were present in the adjacent zone (tissue zone or fact_5 (Extended Data Fig. 15c)).
Following our observations, we compared programs of lymphoid organogenesis with the formation of ectopic lymphoid structures that is observed in patients with Crohn's disease 35 . We found that ILC3s in tissues from patients with Crohn's disease matched fetal NCR + ILC3s with more than 60% probability, whereas T reticular cells and stromal 4 cells from patients with Crohn's disease transcriptionally reassembled fetal mLTo (Extended Data Fig. 15d, e) and were expanded in four out of seven Crohn's disease samples (Extended Data Fig. 13b). Finally, from a genome-wide association study across cell types in our dataset, we calculated the enrichment score for genes associated with Crohn's disease and ulcerative colitis (with a false-discovery rate of 10%; Methods). Adult ILC3s and fetal ILCPs and NCR + ILC3s were among the top cells that were enriched for the expression of genes associated with Crohn's disease (Fig. 4f, Extended Data Fig. 15f).

Discussion
Here we present an integrated dataset of more than 428,000 single cells from multiple anatomical regions of the human gut throughout life. This dataset can be browsed at https://gutcellatlas.org.
We found that FCGR2A, which encodes a receptor activated by the Fc fragment of IgG upstream of PLCG2, is expressed by a subset of tuft cells. In the context of development, IgG has been shown to traverse the placenta, and so could provide a potential route for tuft cell activation  Overall, these data suggest a potentially impactful immune-sensing role for intestinal tuft cells, and a topic for further investigation in future. Immune sensing in the intestines also occurs in secondary lymphoid organs, and although there are substantial differences in the size and location of secondary lymphoid organs between species 39 , our understanding of the formation of these structures is mostly derived from animal models 29 . Single-cell studies have begun to elucidate the diversity and activation of immune cells in the developing human gut 3,40,41 . Here we identify cell types in the developing intestines that have transcriptional signatures and signalling pathways matching LTi, mLTo and eLTo (LEC2) cells. In vitro studies have shown the ability of RORC-expressing CD56 + CD127 + IL17A + cells 42 and ILC3s 43 to activate mesenchymal cells. This leads us to propose that all three LTi-like subsets defined in this study act in the initiation of secondary lymphoid organs, but probably at different stages of the process. Future studies using complementary approaches will shed light on these cell states and their tissue architecture. Notably, we also observe two specialized fibroblast populations in paediatric Crohn's disease-follicular dendritic cells and T reticular cells-similar to cells described in mouse lymph nodes 32 , Peyer's patches 44-47 and those expanded in patients with ulcerative colitis 5 as well as patients with Crohn's disease 1,10 .
Overall, our work provides clarity on the complex interplay between intestinal cell types throughout time and space, and has potential implications for disease and for the engineering of in vitro systems.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03852-1. Tmem  T reg  NK T cell  SELL CD4 T  Activated CD4 T  T FH  ILC3  MAIT cell  MPO mono-neutrophil  SELL CD8 T  TH17  gdT  Activated CD8 T  T   Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.  Table 2. Samples were collected from 11 distinct locations, including the duodenum (two locations, DUO1 and DUO2, which were pooled in the analysis), jejunum ( JEJ), ileum (two locations, ILE1 and ILE2, which were pooled in the analysis), appendix (APD), caecum (CAE), ascending colon (ACL), transverse colon (TCL), descending colon (DCL), sigmoid colon (SCL), rectum (REC) and mesenteric lymph nodes (mLN). Fresh mucosal intestinal tissue and lymph nodes from the intestinal mesentery were excised within 1 h of circulatory arrest; intestinal tissue was preserved in University of Wisconsin organ-preservation solution (Belzer UW Cold Storage Solution; Bridge to Life) and mLN were stored in saline at 4 °C until processing. Tissue dissociation was conducted within 2 h of tissue retrieval.

Patient samples
Mouse samples C57BL/6 mice were obtained from Jackson Laboratories and maintained in specific-pathogen-free conditions at a Home-Office-approved facility at the University of Cambridge. Female mice aged 10-14 weeks were used; the numbers of mice used are included in the relevant figure legends. All procedures were carried out in accordance with ethical guidelines with the United Kingdom Animals (Scientific Procedures) Act of 1986 and approved by The University of Cambridge Animal Welfare and Ethical Review Body.

Isolation of intestinal cells from fetal tissue
The fetal gut mesentery was cut to lengthen out the tissue and the gut was dissected into proximal ileum (PIL), middle ileum (MIL), terminal ileum (TIL), colon or large intestine (LI) and appendix. Samples, except appendix, were washed twice with Hanks Buffered Saline Solution (HBSS; Sigma-Aldrich, 55021C) and minced into pieces using a scalpel. The samples were incubated in 2 ml HBSS solution containing 0.21 mg ml −1 Liberase TL (Roche, 5401020001) or DH (Roche, 5401089001) and 70 U ml −1 hyaluronidase (Merck, 385931-25KU) for up to 50 min at 37 °C, shaking every 5 min, and homogenized every 15 min using a pipette. The single cells were passed through a 40-100 µm sieve and spun down at 400g at 4 °C for 10 min. Red cell lysis solution (eBioscience10X RBC Lysis Buffer (Multi-species)) was used according to the manufacturer's guidelines to remove red blood cells, and the remaining cells were collected in FACS buffer (1% (v/v) FBS in PBS) by centrifugation at 400g at 4 °C for 5 min. All gut region samples (except mLN) proceeded to enrichment by magnetic-activated cell sorting (MACS).

Organoid culture
Intestinal organoids from paediatric patients were cultured in Matrigel (Corning). During organoid culture, the medium was replaced every 48-72 h. Organoids were passaged using mechanical disruption with a P1000 pipette and re-seeded in fresh growth-factor-reduced Matrigel (Corning). When comparing culture media, multiple wells were seeded from a single dissociated sample. The organoids were then allowed to grow for 5 days followed by 24 h treatment with recombinant human protein TNF (H8916, Sigma Aldrich) at 40 ng ml −1 or IFNγ (PHC4031, Life Technologies) at 20 ng ml −1 . Organoids were in vitro differentiated for 4 days by culturing in a differentiation medium 49 and then collected for RNA extraction. Bright-field images were taken using an EVOS FL system (Life Technologies).
Processing for single-cell sequencing analysis was performed by removing the organoids from Matrigel at passage 3-4 using incubation with Cell Recovery Solution at 4 °C for 20 min, pelleting the cells, and re-suspending in TrypLE enzyme solution (Thermo Fisher) for incubation at 37 °C for 10 min. Cells were pelleted again and re-suspended in DMEM/F12.
Statistical analysis was performed using GraphPadPrism 7 software (GraphPad Software) by multiple t-test analysis. and Python (v.3) were used to pool single-cell counts and for downstream analyses. Single-cell transcript counts for fetal and adult samples were handled separately to control for anticipated differences in cell expression and sample quality. For each run, we apply the SoupX algorithm 50 with default parameters and function adjustCounts() to remove ambient mRNA from the count matrix. Cells for each dataset were filtered for more than 500 genes and less than 50% mitochondrial reads, and genes were filtered for expression in more than 3 cells. A Scrublet (v.0.2.1) score cut-off of 0.25 was applied to assist with doublet exclusion. Additional doublet exclusion was performed throughout downstream processing based on unexpected co-expression of canonical markers such as CD3D (component of the TCR) and EpCAM. Gene expression for each cell was normalized and log-transformed. Cell cycle score was calculated using the expression of 97 cell cycle genes listed in ref. 51 . Cell cycle genes were then removed for initial clustering. Cell cycle score, the percentage of mitochondrial reads and unique molecular identifiers (UMIs) were regressed before scaling the data.

Cell-type annotation
Batch correction of fetal and adult datasets was performed with bbknn (v.1.3.9, neighbours=2-3, metric='euclidean', n_pcs=30-50, batch_key='donor_id' or 'batch'). Dimensionality reduction and Leiden clustering (resolution 0.3-1.5) was carried out and cell lineages were annotated on the basis of algorithmically defined marker gene expression for each cluster (sc.tl.rank_genes_groups, method='wilcoxon'). Cell lineages were then subclustered and batch correction and Leiden clustering were repeated for annotation of cell types and states. Annotated fetal and adult datasets were then merged and annotations adjusted for concordance. A brief description of cell-type annotation for each lineage is provided below. Fetal-specific subsets included proximal progenitors (FGG, BEX5) as described in refs. 1-3,52 , distal progenitors enriched for colon genes (CKB, AKAP7, GPC3) and CLDN10 cells that are possibly pancreatic progenitors based on expression of DLK1, PDX1, RBPJ, CPA1 and SOX9 53 . This population also highly expressed CLU, a marker for intestinal revival stem cells that has previously been described 54 .
Pericytes were identified on the basis of expression of NOTCH3, MCAM (CD146) and RGS5. Immature pericytes were annotated based on high expression of PDGFRB, CSPG4 (encoding NG2) 58 and was marked by high NDUFA4L2 expression. We further annotate contractile pericytes (ACTA2 hi ) that were specifically marked by expression of PLN, RERGL, KCNA5, KCNAB1, NRIP2. Angiogenic pericytes were annotated based on high expression of PRRX1 and PROCR among pericyte populations (also expressed in stromal cells) as described in ref. 3 , and also more specifically marked by ENPEP, ABCC8, COL25A1 and TEX41. We also observe a population of pericytes marked by CD36 60 (named 'Pericyte').
Mesothelial cells were defined on the basis of KRT19, LRRN4 and UPK3B expression, as observed previously 1,61 , and additionally observe the expression of TNNT1, RASSF7 and KLK11 in these cells. Mature pericytes captured in adult tissues highly expressed PRG4, MT1F, MT1G, CP4BPA, and HAS1. In addition, we observe a population of mesothelial cells expressing RGS5, TMEM235 and SERPINE3. ) and a population of memory B cells that specifically expressed transmembrane immunoregulatory molecule FCRL4 (encodes protein also known as FcRH4) that has been suggested to be tissue-restricted 62 Class switch IgA and IgG plasma cells showed expression of the syndecan, SDC1, and immunoglobulin heavy chains A and G, respectively. Cycling B cells were characterized by specific expression of MKI67 (encodes Ki67) and genes involved in DNA replication, HMGB2, TUBA1B and UBE2C.
T lineage cells were determined as the cluster of fetal, paediatric and adult cells with expression of T cell receptor component CD3D and subpopulations were annotated as previously described 4 . CD4 T cells had expression of CD4, but not CD8A, and vice versa for CD8 T cells. For each of these T cell groups, a population of paediatric/adult SELL (encodes CD62L) naive/central memory cells and CD69 activated cells was assigned. 'Activated T' are fetal cells that have productive TCRαβ chains as determined by paired V(D)J sequencing. In addition, among postnatal CD4 T cells are CXCR3,IFNG T H 1 and RORA hi ,IL22, CD20 T H 17 cells. Paediatric CD CD4 T cells labelled as T H 1 and T H 17 showed additional co-expression of IL22 and IL26, matching previously reported expression of these molecules in dysfunctional CD8 T cells in ulcerative colitis 8 . T regulatory cells were defined by FOXP3, CTLA4, TIGIT expression and T follicular helper cells expressed CXCR5 and PDCD1. Two subsets of CD8 T memory cells were observed: CD8 T mem and CX3CR1-expressing CD8 T mem cells that both expressed FGFBP2, S1PR5, FCGR3A and TGFBR3.
Adult gdT subsets differentially expressed gamma and delta variable chain genes. Paediatric gdT (TRDC) cells did not have TCR sequencing data and did not express specific variable chains (Extended Data Fig. 2a). Specific expression of TRGV2, TGVG4 and TRVG5/TRVG7, which each encode a variable chain of the gamma TCR chain, further defined subpopulations respectively. We also observe a population of TRDV2/ TRGV9 gdT cells 63 in the fetal gut. NK T cells expressed CD3D as well as NK genes GZMA, NKG7 and PRF1. MAIT cells expressed TRAV1, TRAV2 and SLC4A10. ILC2 expressed PTGDR2, HPGDS, IL1RL1 and KRT1 64 and were found in the fetal samples.

Intestinal organoid analysis
Single-cell count matrices from three organoid growth conditions were combined together using Pandas (v.1.1.2) and NumPy (v.0.25.2) packages. Cells with fewer than 8,000 genes and with less than 20% mitochondrial reads were included in the analysis. Genes with expression in fewer than 3 cells were also excluded. For PLCG2 expression comparison we use normalized (sc.pp.normalize_per_cell) and log-transformed (sc.pp.log1p) counts. The data were plotted using Seaborn package bar plot and swarmplot functions (v.0.11.0).

Pre-processing and analysis of Smart-seq2 sequencing data
Cells with more than 6,000 genes and greater than 25% mitochondrial reads were excluded, before regression of 'n_counts', 'percent_mito' and 'G2M_score'. Cells positive for PTPRC expression (logTPM+1 > =0.2) were taken forward for downstream analysis. T cell receptor sequences generated using the Smart-seq2 scRNA-seq protocol were reconstructed using the TraCeR software (https://hub.docker.com/r/teichlab/tracer/) as described previously 71 .

Differential cell-state abundance analysis for BEST4 cells
To identify region-specific subpopulations, we performed compositional analysis between BEST4 enterocytes from small and large intestine tissue, using a tool for differential abundance testing on k-nearest neighbour (KNN) graph neighbourhoods, implemented in the R package miloR (v.0.99.8) https://github.com/MarioniLab/miloR 14 .
In brief, we performed PCA dimensionality reduction and KNN graph embedding on the BEST4 enterocytes. We define a neighbourhood as the group of cells that are connected to a sampled cell by an edge in the KNN graph. Cells are sampled for neighbourhood construction using the algorithm proposed previously 72 . For each neighbourhood we then perform hypothesis testing between conditions to identify differentially abundant cell states whilst controlling the FDR across the graph neighbourhoods.
We test for differences in abundance between the cells from small and large intestine tissue in adult samples and fetal samples. To identify markers of small-intestine-specific and large-intestine-specific subpopulations, we performed differential gene expression (DGE) analysis between adult cells in neighbourhoods enriched for small intestine cells and in neighbourhoods enriched for large intestine cells (10% FDR for the differential abundance test). The DGE test was performed using a linear model implemented in the package limma 73 (v.3.46.0), using 10% FDR, and aggregating expression profiles by sample(implemented in the function findNhoodGroupMarkers of the miloR package, with option aggregateSamples = TRUE). Gene Ontology enrichment analysis was performed using the R package clusterProfiler (v.3.18.1).

RNA velocity and diffusion map pseudotime analyses
For neural cell trajectory analysis we use scVelo 0.21 package implementation in Scanpy 1.5.1 74 . The data were sub-clustered on fetal neural cells and pre-processed using functions for detection of minimum number of counts, filtering and normalization using scv.pp.filter_and_normalize and followed by scv.pp.moments function. The gene-specific velocities were obtained using scv.tl.velocity(mode = 'stochastic') and scv.tl.velocity_graph() by fitting a ratio between unspliced and spliced mRNA abundances. The gene specific velocities were visualized using scv.pl.velocity_graph() or scv.pl.velocity_embedding_grid() functions. To visualize genes that change along the pseudotime we use sc.pl.paga_path() function with pseudotime set to monocle3 pseudotime. This function required calculation of PAGA parameters and dpt_pseudotime with functions as follows: sc.tl.paga(), sc.pl.paga(), sc.tl.draw_graph(init_pos = 'paga'), sc.tl.dpt() 75 .

BCR analysis
Single-cell BCR analyses were performed as described previously 76 . In brief, poor quality or incomplete V(D)J contig sequences were discarded and all IgH sequences for each donor were combined together. IgH sequences were annotated with IgBLAST 77 before isotype reassignment using AssignGenes.py (pRESTO 78 ). Ambiguous V gene calls were corrected using TIgGER v.03.1 79 before identifying clonally related sequences with DefineClones.py (ChangeO v.0.4.5 79 ) using a threshold of 0.2 for nearest-neighbour distances. The germline IgH sequence for each clonal family was determined using CreateGermlines.py (ChangeO v.0.4.5) followed by using observedMutations (Shazam v.0.1.11 79 ) to calculate somatic hypermutation frequencies for individual sequences. Finally, for integration with the single-cell gene expression object, the number of high quality and annotated contigs per Ig chain (IgH, IgK, IgL) was determined for each cell barcode. If multiple unique sequences for a given chain were detected, that cell was annotated as 'Multi' and not considered in further analysis. BCR metadata was combined with the scRNA object for downstream analysis and comparison of different B cell populations.

Cell-cell communication analysis
To infer cell-cell communication and screen for ligands and receptors involved we applied the CellPhoneDB v.2.0 Python package 80,81 on the normalized raw counts and fine cell-type annotations from the second trimester intestinal samples (12)(13)(14)(15)(16)(17). We use default parameters and set subsetting to 5,000 cells. To identify the most relevant interactions, we subset specific interactions on the basis of the ligand/receptor expression in more than 10% of cells within a cluster and where the log 2 mean expression of the pair is greater than 0. The selected interactions were plotted as expression of both ligand and receptor in relevant cell types.
Gene expression linked to enteric nervous system disease HSCR-related genes were curated from The Human Phenotype Ontology website (https://hpo.jax.org/app/, Aganglionic megacolon HP:0002251) and ref. 25 . We selected genes with expression greater than or equal to 0.1 in neural lineage single cells and calculated mean expression per cluster and organ. Expression was visualized using seaborn.clustermap() function (v.0.11.0).

Cell-type composition analysis using metadata
The number of cells for each sample (n = 159 samples in total with complete metadata) and coarse-grain cell type (9 different cell types in total) combination was modelled with a generalized linear mixed model with a Poisson outcome. The five clinical factors (age group, donor, biopsy, disease and gender) and the three technical factors (fraction, enzyme and 10x kit) were fitted as random effects to overcome the collinearity among the factors. The effect of each clinical or technical factor on cell-type composition was estimated by the interaction term with the cell type. The 'glmer' function in the lme4 package implemented on R was used to fit the model. The standard error of the standard deviation parameter for each factor was estimated using the numDeriv package. The effect size of each level of a clinical or technical factor was obtained by the posterior mean of the corresponding random effect coefficient and the local true sign rate (LTSR) was calculated from the posterior distribution. See Supplementary Note Section 1 for more details.

Cell-type enrichment analysis for IBD-GWAS genes
Inflammatory bowel disease genome-wide association study (IBD GWAS) summary statistics of Crohn's disease and ulcerative colitis were obtained from the International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) (https://www.ibdgenetics.org/). The GWAS enrichment analysis of 103 annotated gut cell types for Crohn's disease and ulcerative colitis was performed using a fGWAS approach 82,83 , often used for fine-mapping and enrichment analysis of various functional annotations for molecular quantitative trait and GWAS loci. The association statistics (log odds ratios and standard errors) were converted into the approximate Bayes factors using the Wakefield approach 84 . A cis-regulatory region of 1 Mb centred at the transcription start site (TSS) was defined for each gene (Ensembl GRCh37 Release 101). The Bayes factors of variants existing in each cis region were weighted and averaged by the prior probability (an exponential function of TSS proximity) estimated from the distance distribution of regulatory interactions 85 . The likelihood of an fGWAS model was given by the ave raged Bayes factors across all genome-wide genes multiplied by the feature-level prior probability obtained from a linear combination of cell-type-specific expression and the averaged expression across all cell types as a baseline expression. The enrichment of each cell type was estimated as the maximum likelihood estimator of the effect size for the cell-type-specific expression. The code of the hierarchical model (https://github.com/natsuhiko/PHM) was utilized for the enrichment analysis. The detailed model derivation is demonstrated in Supplementary Note section 2.
Visium spatial transcriptomics sample preparation 10x Genomics Visium protocol was applied on optimal cutting temperature medium (OCT)-embedded fresh frozen samples. All tissues were sectioned using the Leica CX3050S cryostat and were cut at 10 µm. The samples were selected on the basis of morphology, orientation (based on H&E) and RNA integrity number that was obtained using Agilent2100 Bioanalyzer. Tissue optimization was performed to obtain permeabilization time for fetal tissue (12 min) and after optimization the Visium spatial gene expression protocol from 10X Genomics was performed using Library Preparation slide and following the manufacturer's protocol. After transcript capture, Visium Library Preparation Protocol from 10x Genomics was performed. All images for this process were scanned at 40× on Hamamatsu NanoZoomer S60. Eight cDNA libraries were diluted and pooled to a final concentration of 2.25 nM (200 µl volume) and sequenced on 2× SP flow cells of Illumina NovaSeq 6000.
10x Genomics Visium data processing 10x Genomics Visium spatial sequencing samples were aligned to the human transcriptome GRCh38-3.0.0 reference (consistently with single-cell RNA-seq samples) using 10x Genomics SpaceRanger v.1.2.1 and exonic reads were used to produce mRNA count matrices for each sample. 10x Genomics SpaceRanger was also used to align paired histology images with mRNA capture spot positions in the Visium slide. The paired image was used to determine the average number of nuclei per Visium location in the tissue and used as a hyperparameter in the spatial mapping of cell types.

Spatial mapping of cell types with cell2location
To spatially map developing gut cell types in situ, 10x Genomics Visium mRNA count matrices were integrated with scRNA-seq data using the cell2location method as described in detail previously 34 . In brief, the cell2location model estimates the abundance of each cell population in each location by decomposing mRNA counts in 10x Genomics Visium data using the transcriptional signatures of reference cell types. Reference signatures of 65 cell populations (neuronal and mesenchymal subtypes were grouped together where relevant to simplify interpretation of spatial mapping) from the 12-17 PCW small intestine samples were estimated using a negative binomial regression model provided in the cell2ocation package. We provide spatial cell abundance maps of all 65 cell subsets on GitHub (https://github.com/vitkl/fetal_gut_mapping/) and the distribution of all 65 cell subsets across tissue zones in Extended Data Fig. 9a-c.
Untransformed and unnormalized mRNA counts were used as input to both the regression model for estimating signatures (filtered to 13,904 genes and 124,049 cells) and the cell2location model for estimating spatial abundance of cell populations (filtered to 13,904 genes shared with scRNA-seq, 4,645 locations, 3 experiments analysed jointly).
Cell2location was used with the following settings (all other settings set to default values): training iterations: 40,000; cells per location N = 12, estimated on the basis of comparison with histology image paired with 10x Genomics Visium; cell types per location Â = 8, assum-ing that most cells in a given location are of a different type; co-located cell type groups per location Ŷ = 4.
To identify tissue zones and groups of cell types that belong to them (Extended Data Fig. 15c, dot plot), conventional non-negative matrix factorization (NMF) was applied to cell abundance estimated by cell-2location. The NMF model was trained for a range of factors and tissue zones R = {8,…,35} and the decomposition into 17 factors was selected as a balance between segmenting relevant tissue zones (lymphoid structures, blood vessel types) and over-separating known zones into several distinct factors. NMF weight for each factor and cell type is shown in Extended Data Fig. 15c.
Cryosectioning, single-molecule fluorescence in situ hybridization and confocal imaging Fetal gut tissue was embedded in OCT and frozen on an isopentane-dry ice slurry at −60 °C, and then cryosectioned onto SuperFrost Plus slides at a thickness of 10 µm. Before staining, tissue sections were post-fixed in 4% paraformaldehyde in PBS for 15 min at 4 °C, then dehydrated through a series of 50%, 70%, 100% and 100% ethanol, for 5 min each. Staining with the RNAscope Multiplex Fluorescent Reagent Kit v2 Assay (Advanced Cell Diagnostics, Bio-Techne) was automated using a Leica BOND RX, according to the manufacturers' instructions. After manual pre-treatment, automated processing included epitope retrieval by protease digestion with Protease IV for 30 min prior to RNAscope probe hybridization and channel development with Opal 520, Opal 570, and Opal 650 dyes (Akoya Biosciences). Stained sections were imaged with a Perkin Elmer Opera Phenix High-Content Screening System, in confocal mode with 1 µm z-step size, using a 20× water-immersion objective (NA 0. 16 Flow cytometry validation of Fcgr on mice tuft cells C57BL/6 mice received either normal drinking water or 2% (w/v) 36,000-50,000 MW dextran sodium sulfate (DSS) (MP Biomedicals) to induce colitis. For DSS treatment, mice received DSS water for 5 days followed by 14 days of normal drinking water, and then a final 5 days of 2% (w/v) DSS prior to being culled.
The small intestines of mice were flushed of faecal content with ice-cold PBS, opened longitudinally, cut into 0.5-cm pieces, and washed by vortexing three times with PBS with 10 mM HEPES. Tissue was then incubated with an epithelial stripping solution (RPMI-1640 with 2% (v/v) FCS, 10 mM HEPES, 1 mM DTT and 5 mM EDTA) at 37 °C for two intervals of 20 min to remove epithelial cells. The epithelial fraction was subsequently incubated at 37 °C for 10 min with dispase (0.3 U ml −1 , Sigma-Aldrich) and passed through a 100-µm filter to obtain a single-cell suspension. Cells were blocked for 20 min at 4 °C with 0.5% (v/v) heat-inactivated mouse serum followed by extracellular staining in PBS at 4 °C for 45 min with the following antibodies; EpCAM-FITC (1:400, G8.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The expression data for fetal and adult regions is available on an interactive website: https://www.gutcellatlas.org/. Raw sequencing data are available at ArrayExpress (https://www.ebi.ac.uk/arrayexpress) with accession numbers E-MTAB-9543, E-MTAB-9536, E-MTAB-9532, E-MTAB-9533 and E-MTAB-10386. Previously published first trimester and paediatric data are available at ArrayExpress (E-MTAB-8901) 1 . For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. Source data are provided with this paper.

Article
Extended Data Fig. 10 | Identification of LTi-cell-like subset. a, Photo images of human intestinal gut and developing lymph nodes (arrows) at 8-17 PCW. Scale bar = 1 cm. b, Heat map of relative expression of key LTi defining and marker genes expressed by LTi-like cell types, NK cells and adult ILC3 as in Fig. 4a. Genes in red are highlighted in the schematic in Fig. 4b. c, Bar plot with productive TCRαβ chain in fetal T and innate lymphoid cell types as determined by V(D)J sequencing paired with scRNA-seq data. d, Dot plot of scaled expression of selected differentially expressed genes in fetal immune subsets from scRNA-seq dataset. e, Dot plot of expression of selected LTi-like genes in fetal liver ILCPs compared to LTi-like subsets in the gut. f, Bar graph showing the relative proportion of cell types among total T and innate lymphocyte population across developmental and adult gut regions and ages. FPIL, fetal proximal ileum; FMIL, fetal middle ileum; FTIL, fetal terminal ileum; FLI, fetal large intestine; FMLN, fetal mesenteric lymph node; DUO, duodenum; JEJ, jejunum; ILE, ileum; APD, appendix; CAE, caecum; ACL, ascending colon; TCL,transverse colon; DCL,descending colon; SCL,sigmoid colon; REC, rectum; MLN, mesenteric lymph node. g, Representative multiplex smFISH staining of fetal ileum tissue at 15 PCW showing three LTi-like subsets: NCR2 + ILC3, IL17A-expressing NCR − ILC3and SCN1B-expressing ILCP cells (Scale bar panels: main = 20 µm, zoom =5 µm, n = 2, biological replicates across regions). h, UMAP visualization of fetal T and innate lymphoid cells (subsetted from Fig. 4a) coloured by cell type and overlaid with RNA velocity arrows. Inset panel shows ILCP and LTi-like NCR + ILC3 cells. Fig. 11 | LTi-like cells in plate based single-cell sequencing data of sorted cells from the human fetal intestine. a, UMAP visualization and feature plots of full-length Smart-seq2 data of flow cytometry-sorted CD45 + cells from second-trimester fetal tissue coloured by intestinal region or Leiden clustering. b, Dot plot of key marker gene expression in T and innate lymphoid cell subsets captured in Smart-seq2 experiment as in a. c, Bar plot of productive TCRαβ and TCRγδ chain in fetal T and innate lymphoid cell types as in a. Fig. 13 | Stromal in the intestinal tract. a, Heat map of top differentially expressed genes between follicular dendritic cells (FDCs) and T reticular cells (TRCs) and related stromal subsets. Each row is a cell. Arrows highlight key genes discussed in the text. b, Bar graph showing the relative proportion of cell types among the total stromal lineage across fetal and adult gut regions (top) and developmental ages (bottom). Unit of age is years unless specified as weeks. Region names are: FPIL, fetal proximal ileum; FMIL, fetal middle ileum; FTIL, fetal terminal ileum; FLI, fetal large intestine; FMLN, fetal mesenteric lymph node; DUO, duodenum; JEJ, jejunum; ILE, ileum; APD, appendix; CAE, caecum; ACL, ascending colon; TCL, transverse colon; DCL, descending colon; SCL, sigmoid colon; REC, rectum; MLN, mesenteric lymph node. c, Dot plot comparing key defining genes expressed across populations in a and fetal mesenchymal lymphoid tissue organiser (mLTo) cells. d, Bar plot showing number of mLTo in scRNA-seq dataset and coloured by gut region. e, UMAP visualization of stromal cells as in Fig. 4d showing co-expression of CXCL13, CCL19, and CCL21. f, Heat map of top differentially expressed genes of mLTo cells between different intestinal regions. Each row is a cell. g, Heat map of mean expression of ligand-receptor pairs in mLTo, LTi-like and LEC subset from scRNA-seq dataset as identified using CellphoneDB. h, Heat maps showing mean expression of curated immune recruitment signal genes by selected fetal stromal, epithelial and endothelial cell types (top) and their receptor expression in the immune cell types of the fetal gut and mLNs. Red arrows in f and h link cognate ligand receptor pairs. Fig. 15 | Ectopic lymphoid tissue formation in paediatric Crohn's disease. a, Expression of mLTo gene markers by spatial coordinates in 10x Genomics Visium data (left) and abundance of mLTo and ILC3 cells as estimated by cell2location 34 (right) across 17 PCW (top) and 13 PCW fetal ileum (bottom). White boxes highlight predicted developing SLO tissue zones. b, Spatial mapping of scRNA-seq data to 10x Genomics Visium data showing estimated abundance (colour intensity) of cell subsets (colour) in fetal terminal ileum from 17 PCW (top), ileum from 13 PCW (bottom). c, Abundances of cell types as identified using non-negative matrix factorization (NMF) in tissue zones from Visium data as in b. Dot plot shows NMF weights of cell types (columns) across NMF factors (rows), which correspond to tissue zones (normalized across factors per cell type by dividing by maximum values). d, Heat map showing mean probability of immune and stromal cell types matching between fetal and Crohn's disease scRNA-seq datasets. e, Heat map with expression of cytokines and chemokines in cells involved in tertiary lymphoid organ development of fetal (black) and functionally related cell types in paediatric Crohn's disease (red). f, Forest plot of top cell types across fetal, paediatric (healthy and IBD), and adult data enriched for expression of genes associated with either Crohn's disease or ulcerative colitis. All cell types in red have FDR < 10%. The number of cells for each sample (n = 159 samples in total with complete metadata) and coarse-grain cell type (9 different cell types in total) combination was modelled with a generalised linear mixed model with a Poisson outcome. Error bars show standard error for each factor as estimated using the numDeriv package.

Corresponding author(s): Kylie James and Sarah A. Teichmann
Last updated by author(s): Jun 1, 2021 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted aligner (Smart-seq2 data; version 2.5.1b). Visium spatial transcriptomics samples were aligned using 10x Genomics SpaceRanger (v1.2.1 ). 10X TCR sequences were aligned using 10x Genomics vdj (v.3.1.0 ) and Smartseq2 T cell receptor sequences were determined using TraCeR software (https://hub.docker.com/r/teichlab/tracer/).

Data analysis
Single cell data analysis was performed using Python (version 3), R (3.5. were used for BCR analysis. Pseudotime calculated using scVelo (0.21) and Scanpy (1.5.1). CellPhoneDB (v2.0) was used for ligand-receptor analysis. Cell type enrichment analysis performed using miloR (https://github.com/MarioniLab/miloR). Flow cytometry data was visualised using FlowJo software (Version 10.7.0, Tree Star Inc.).GraphPadPrism 7 software was used for analysing FACS data. Additional custom codes used in this manuscript are available at Github: https://github.com/Teichlab/SpaceTimeGut, https://github.com/vitkl/fetal_gut_mapping/, https://github.com/natsuhiko/PHM. For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.