Main

Fibroblasts populate all tissues, delineate the topography of organs by producing and remodelling extracellular matrix proteins (ECMs)2 and support other tissue-resident cell types1,3. Fibroblasts perform functions associated with their lineage and specialized programs suited to the needs of specific tissue contexts to maintain organ homeostasis. Macrophages achieve generalized function and specialization via a lineage-wide core transcriptomic signature and tissue-specific programming driven by microenvironmental cues4,5,6. It is unclear how fibroblasts execute functions both common to their lineage and required by their organ of residence.

Technologies such as single-cell RNA-sequencing (scRNA-seq) have revealed intra-tissue fibroblast heterogeneity1. Elucidating the inter-tissue population structure of fibroblasts has clinical relevance, as subtypes of fibroblasts drive disease in arthritis7,8,9, cancer10,11,12 and fibrotic indications such as idiopathic pulmonary fibrosis (IPF)2. Emerging paradigms7,10,13,14 suggest that discrete fibroblast subtypes within tissues govern distinct aspects of tissue homeostasis and disease. Understanding whether fibroblast phenotypes across indications are context-specific or more broadly conserved may inform therapeutic approaches. We hypothesized that fibroblast heterogeneity was promoted by tissue type in the steady-state and disease context during perturbation.

Fibroblasts in steady-state mouse tissues

To investigate this hypothesis, we first performed bulk RNA-seq and assay for transposase-accessible chromatin with sequencing (ATAC–seq) on fluorescence-activated cell sorting (FACS)-sorted fibroblasts from multiple mouse tissues15. These data identified regions of open chromatin and transcriptional networks driven by tissue type, similar to recent reports15,16 (Extended Data Figs. 1, 2, Supplementary Tables 1, 2). However, bulk sequencing cannot discriminate gene signatures that represent a single, homogeneous cell population from those that reflect the average of heterogeneous populations. To resolve this issue, we collected mouse scRNA-seq datasets enriched for non-haematopoietic cells from our laboratory and from public repositories. We removed non-fibroblast cells and corrected for cross-laboratory batch effects to produce a fibroblast-specific single-cell atlas composed of 28 datasets across 16 unperturbed tissues (n = 120,583 cells; Fig. 1a, b, Extended Data Fig. 3a–c, Supplementary Table 3). An interactive data browser for the atlases is publicly available (see ‘Data availability’). Notably, our bulk RNA-seq and single-cell data were highly concordant, which indicates that our single-cell analytical approach did not introduce technical bias (Extended Data Fig. 3d).

Fig. 1: Steady-state mouse fibroblast atlas.

a, Twenty-eight datasets for steady-state fibroblast atlas from 16 mouse tissues. b, Uniform manifold approximation and projection (UMAP) embedding of 120,583 single cells in the steady-state atlas. Ten clusters identified through graph-based clustering are indicated by colour. c, Heat map of the relative average expression of the most strongly enriched genes for each cluster (log(fold change) of one cluster versus all others, z-score by row) with key genes, functional annotation and tissue distributions listed. MSC, mesenchymal lineage cell; OLC, osteolineage cell. d, Pseudotime(s) visualized using principal curves representing trajectories of fibroblast differentiation across steady-state atlas with Pi16+ cluster set as root.

In the steady-state atlas, ten clusters were identified on the basis of differential gene expression (Fig. 1b, c). The exact number of clusters in analyses of this type is data-driven but still somewhat subjective; some clusters were well-separated from others, while in other instances it is likely that an expression continuum, with intermediate states, exists. We identified more than 200 differentially expressed genes (DEGs) for each cluster and annotated clusters according to the dominant cluster-specific gene: Pi16+, Col15a1+, Ccl19+, Coch+, Comp+, Cxcl12+, Fbln1+, Bmp4+, Npnt+ and Hhip+ (Supplementary Table 4, Extended Data Fig. 3e). Known fibroblast-associated genes showed distinct expression across the clusters, confirming heterogeneity within the fibroblast lineage (Extended Data Fig. 3f). We were able to use tissue-distribution patterns and hallmark genes to ascribe functional identities to most clusters, including Ccl19+ fibroblastic reticular cells (FRCs)3, Coch+ red pulp fibroblasts17, Cxcl12+ mesenchymal stromal cells and osteolineage cells18, Fbln1+ and Bmp4+ intestinal fibroblasts19, Comp+ fibroblasts20, Npnt+ alveolar fibroblasts and Hhip+ peribronchial fibroblasts21 (Fig. 1c, Extended Data Figs. 3g, 4a–p). The specialization of these clusters was reflected in differential enrichment of gene expression in core signalling pathways, including NFκB and TNF in the Ccl19+ cluster and WNT signalling in the Fbln1+ and Bmp4+ clusters19 (Extended Data Fig. 3h).
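The heat-map values used to annotate clusters (log(fold change) of one cluster versus all others, z-scored by row) can be illustrated with a minimal sketch. The base-2 logarithm, the pseudocount, the function names and the toy expression values are all assumptions made for illustration; they are not the pipeline used for the atlas.

```python
import math
from statistics import mean, pstdev


def log_fold_change(expr, labels, cluster, pseudocount=1.0):
    """log2 fold change of mean expression in `cluster` versus all other cells.

    The log base and pseudocount are illustrative assumptions.
    """
    inside = [x for x, lab in zip(expr, labels) if lab == cluster]
    outside = [x for x, lab in zip(expr, labels) if lab != cluster]
    return math.log2((mean(inside) + pseudocount) / (mean(outside) + pseudocount))


def zscore_row(values):
    """Centre and scale one heat-map row (one gene across all clusters)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def heatmap_row(expr, labels, clusters):
    """One-versus-rest log fold change per cluster, then z-score across clusters."""
    return zscore_row([log_fold_change(expr, labels, c) for c in clusters])


# Toy data (synthetic): expression of one gene in six cells from three clusters.
clusters = ["Pi16+", "Col15a1+", "Ccl19+"]
labels = ["Pi16+", "Pi16+", "Col15a1+", "Col15a1+", "Ccl19+", "Ccl19+"]
pi16_expr = [9.0, 7.0, 1.0, 1.0, 0.0, 0.0]

row = heatmap_row(pi16_expr, labels, clusters)
print(dict(zip(clusters, row)))
```

In this toy example the gene scores highest in the cluster whose cells express it most, which is the property used to name each cluster after its dominant gene.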

Notably, nearly all tissues contributed to the Pi16+ and Col15a1+ clusters, which suggests that these clusters are universal (Fig. 1c, Extended Data Fig. 3g). Genes that defined these two clusters differentiated fibroblasts from mesothelial cells in bulk RNA-seq data (Extended Data Fig. 4r–t). DEGs in the Pi16+ cluster (Pi16, Dpp4 and Ly6c1) suggested an identity similar to adventitial stromal cells22, which are found in vascular niches and can produce ECMs and acquire gene expression profiles consistent with specialized fibroblasts23,24. The Col15a1+ cluster exhibited an association with the basement membrane, evidenced by expression of Col4a1, Hspg2 and Col15a1 (Extended Data Fig. 3i, Supplementary Table 4). The ubiquity of the universal Pi16+ and Col15a1+ subtypes across tissues and the elevated level of stemness-associated genes (Cd34 and Ly6a (which encodes SCA1), Extended Data Fig. 3j), led us to investigate the potential for a developmental relationship among clusters. Slingshot lineage inference identified trajectories that emerged from the Pi16+ cluster, passed through the Col15a1+ cluster, and ended at specialized clusters (Fig. 1d).

Collectively, our analysis showed that in steady-state mouse tissues, universal (Pi16+ and Col15a1+) and specialized fibroblast subtypes exist, and that these may be developmentally linked. The roles of universal fibroblasts include ECM secretion, with Col15a1+ universal fibroblasts exhibiting the capacity to secrete basement membrane proteins and the Pi16+ subtype potentially serving as a resource cell that can develop into specialized fibroblasts.

Dpt IRESCreERT2 mouse validates scRNA-seq

To validate our scRNA-seq analysis, we used the surface markers SCA1 and LY6C (encoded by Ly6c1) to distinguish Pi16+ (LY6C+SCA1+) and Col15a1+ (LY6C−SCA1+) universal fibroblasts from specialized fibroblasts (LY6C−SCA1−; Extended Data Figs. 5a, 6a). Flow cytometry revealed that PDGFRα+ fibroblasts could be sorted into these three groups across 11 tissues (Extended Data Fig. 6b, c). This approach confirmed the existence of bona fide universal and specialized phenotypes: expression of Pi16 was enriched in LY6C+SCA1+ fibroblasts, whereas markers of more specialized fibroblasts such as Ccl19 (lymph node) and Npnt (lung) were enriched in LY6C− fibroblasts. Expression of dermatopontin (Dpt) was inversely correlated with specialization (Extended Data Fig. 6d, e). Dpt was diffusely expressed across the steady-state fibroblast atlas but was significantly enriched in Pi16+ and Col15a1+ universal fibroblasts, with highest expression in the Pi16+ cluster. The intestine-specific Fbln1+ cluster also showed Dpt expression at a level similar to the Col15a1+ cluster (Extended Data Fig. 6e, Supplementary Table 4). Expression of Dpt and Pi16 RNA in lung and small intestine was assayed histologically using RNAscope. Dpt+Pi16+ cells were enriched near vascular structures in both tissues, as expected22. Dpt+Pi16− cells were observed in the lung parenchyma, in alignment with our predicted distribution of Col15a1+ cells, and at the base of the villi in the small intestine, consistent with a lack of functional specialization19 (Extended Data Fig. 6f).
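The three-way surface-marker scheme can be written as a small decision rule, reading the groups as LY6C+SCA1+ (Pi16+ universal), LY6C−SCA1+ (Col15a1+ universal) and LY6C−SCA1− (specialized). This is a sketch only: the function name is hypothetical, marker positivity is assumed to have been decided upstream by gating, and the handling of LY6C+SCA1− events is an assumption because that combination is not assigned in the text.

```python
def classify_fibroblast(ly6c_positive: bool, sca1_positive: bool) -> str:
    """Assign a PDGFRα+ fibroblast to a subtype from LY6C and SCA1 positivity."""
    if sca1_positive and ly6c_positive:
        return "Pi16+ universal"      # LY6C+SCA1+
    if sca1_positive:
        return "Col15a1+ universal"   # LY6C−SCA1+
    if not ly6c_positive:
        return "specialized"          # LY6C−SCA1−
    return "unassigned"               # LY6C+SCA1−: not defined in the text (assumption)


for ly6c, sca1 in [(True, True), (False, True), (False, False), (True, False)]:
    print(ly6c, sca1, classify_fibroblast(ly6c, sca1))
```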

We generated a genetically modified mouse model for tracking Dpt-expressing cells to test whether Dpt preferentially marked universal fibroblasts. An IresCreERT2 cassette, which requires the oestrogen receptor modulator tamoxifen for activity, was inserted downstream of the stop codon in exon 4 of Dpt. We crossed this strain with the Rosa26LSLYFP mouse line. The resulting DptIresCreERT2Rosa26LSLYFP mouse irreversibly marks DptIresCreERT2-positive cells and their progeny with yellow fluorescent protein (YFP) after tamoxifen administration (Extended Data Fig. 6g).

DptIresCreERT2Rosa26LSLYFP mice were given tamoxifen in their chow for 14 days. In the 11 tissues examined, PDGFRα+ fibroblasts robustly expressed YFP (Fig. 2a, Extended Data Fig. 6h). Recombination efficiency varied across tissues: lymph nodes exhibited the lowest proportion of YFP+ fibroblasts (28.7 ± 5.5% (mean ± s.e.m.)) whereas the heart exhibited the highest (83.7 ± 3.1%; Fig. 2b). Other cell types expressed little or no YFP (Extended Data Fig. 6i, j). FACS-sorted YFP-positive fibroblasts (sorted 14 days after DptIresCreERT2 recombination) expressed higher levels of Pi16 and Col15a1 and lower levels of Ccl19 and Npnt in the lymph node and lung, respectively, than YFP-negative fibroblasts. This suggests that Dpt-expressing cells are transcriptionally distinct from lymph node FRCs and lung alveolar fibroblasts (Extended Data Fig. 6k). YFP+ and YFP− fibroblasts expressed equivalent levels of Csf1, suggesting that both Dpt+ and Dpt− fibroblasts are involved in macrophage homeostasis25 (Extended Data Fig. 6l).

Fig. 2: DptIRESCreERT2 marks universal fibroblasts.

a, Representative gating for YFP (Dpt) in pancreatic fibroblasts (live, EpCAM−CD45−CD31−PDPN+PDGFRα+). Red, DptIRESCreERT2ki/kiRosa26LSLYFPwt/loxP; grey, DptIRESCreERT2wt/wtRosa26LSLYFPwt/loxP mouse. Horizontal bar denotes the percentage of positive cells. b, Quantification of YFP+ (Dpt+) cells. Frequency (blue dots with mean, top x-axis) and cell number per milligram tissue (red boxplots, bottom x-axis). Percentage YFP+ determined by subtracting fluorescence in DptIRESCreERT2ki/kiRosa26LSLYFPwt/loxP mice from fluorescence in DptIRESCreERT2wt/wtRosa26LSLYFPwt/loxP mice. c, YFP expression in LY6C+SCA1+, LY6C−SCA1+ and SCA1− fibroblasts from the lymph node in Ccl19YFP mice. d, YFP expression in LY6C+SCA1+, LY6C−SCA1+ and SCA1− fibroblasts from lymph node in Grem1CreERT2wt/ki Rosa26LSLYFPwt/loxP mice treated with tamoxifen via intraperitoneal injection for 5 consecutive days, assayed on days 14–16 after first injection. Bars in c, d represent mean. e, YFP expression in LY6C+SCA1+, LY6C−SCA1+ and SCA1− fibroblasts from lymph node, omentum, lung, inguinal adipose, pancreas and skin (flank) in DptIRESCreERT2ki/kiRosa26LSLYFPwt/loxP animals. Mean ± s.e.m. n = 4 (b) or 3 (c–e) or representative of 4 (a) biologically independent experiments with 4 (b), 7 (c), 10 (d) or 3 (e) mice. In b, centre lines in boxplots show median; box spans interquartile range, whiskers denote 10th–90th percentile. Each dot represents one mouse; mice received tamoxifen chow for 14 days and were analysed on days 14–16 (bd). Subcut., subcutaneous adipose.

The distinction between universal and specialized fibroblasts was also observed at the protein level via YFP expression in Ccl19YFP (ref. 26) and Grem1CreERT2Rosa26LSLYFP mice27. Here, the fraction of YFP-expressing cells (denoting an FRC phenotype) increased from LY6C+SCA1+ (Pi16+) fibroblasts to LY6C−SCA1+ (Col15a1+) fibroblasts and further to SCA1− (specialized) fibroblasts (Fig. 2c, d, Extended Data Fig. 5b, c). YFP expression in DptIresCreERT2;Rosa26LSLYFP mice showed the opposite trend in lymph node, lung, and other tissues (Fig. 2e, Extended Data Fig. 5d). This evidence suggested that Dpt expression encompassed both of the universal fibroblast subtypes identified by scRNA-seq; therefore, we refer to these as Dpt+Pi16+ and Dpt+Col15a1+ universal fibroblasts.

Fibroblasts in perturbed mouse tissues

We next investigated how fibroblasts are affected by infection, injury, cancer, fibrosis, metabolic changes and arthritis (Fig. 3a). We integrated 17 publicly available scRNA-seq datasets across 13 tissues to generate a perturbed-state fibroblast atlas (n = 99,596 cells; Fig. 3b, c, Extended Data Fig. 7a–c, Supplementary Table 3). This approach showed ten clusters: Pi16+, Col15a1+, Ccl19+, Cxcl12+, Comp+, Npnt+, Hhip+, Adamdec1+, Cxcl5+ and Lrrc15+ (Fig. 3b, c, Extended Data Fig. 7d, Supplementary Table 5). Clusters in the perturbed-state atlas displayed heterogeneous expression of common fibroblast-associated genes (Extended Data Fig. 7e). In nearly all perturbed tissues and for all types of inflammation, some fibroblasts occupied the universal Pi16+ and Col15a1+ clusters, and these clusters expressed the highest levels of Dpt (Extended Data Figs. 7f, 8a–o, r). The Cxcl12+, Ccl19+, Comp+, Npnt+, and Hhip+ clusters were similar to the analogous steady-state clusters with respect to gene expression and tissue distribution (Fig. 3b, c, Extended Data Fig. 7g). Conversely, the Cxcl5+, Adamdec1+ and Lrrc15+ clusters appeared to represent perturbation-specific, activated fibroblast states that were not observed in the steady-state atlas.

Fig. 3: Perturbed-state mouse fibroblast atlas.

a, Seventeen datasets for perturbed atlas from 13 perturbed mouse tissues. b, UMAP embedding of 99,596 cells in the perturbed-state atlas. Ten clusters identified through graph-based clustering are indicated by colour. Coloured clusters are universal (bold) or perturbation-specific; grey clusters have steady-state analogues. c, Heat map of the relative average expression of the most strongly enriched genes for each cluster (log(fold change) of one cluster versus all others, z-score by row). d, Representative flow cytometry plots showing frequency of YFP+ cells in LRRC15+ fibroblasts from KPR3070 subcutaneous tumour at day 21 post-inoculation in DptIRESCreERT2ki/kiRosa26LSLYFPwt/loxP mouse (representative of two biologically independent experiments).

Fibroblasts from early muscle injury contributed the majority of cells to the Cxcl5+ cluster and expressed chemokine genes such as Ccl2 and Ccl7 (Fig. 3c, Extended Data Fig. 8j, p). Analysis of co-regulated genes suggested that this cluster was driven by PI3K, TNF and NFκB signalling (Extended Data Fig. 8v). Cells in the Adamdec1+ cluster were from colitis and showed upregulation of Il11 and Grem1 (Extended Data Figs. 7f, 8e, q). Gene expression in these cells was associated with MAPK signalling (Extended Data Fig. 8v). We subjected mice to DSS-induced colitis and found increased expression of Grem1 in lesions as compared to healthy areas of the colon by RNAscope, validating the colitis-specific expression of Grem1 in the mouse perturbed-state atlas (Extended Data Fig. 8w). The Lrrc15+ cluster was composed of cells from arthritis, skin wound, fibrosis and small and large pancreatic ductal adenocarcinoma (PDAC; Extended Data Figs. 7f, 8a–o). This cluster showed high expression of Cthrc1, Acta2, Postn and Adam12, and of collagens, which suggests that these cells represented myofibroblasts (Extended Data Fig. 8p–u). The increase in collagens and TGFβ signalling is consistent with previous reports10 (Extended Data Fig. 8v).

In perturbed tissues, universal Dpt+Pi16+ fibroblasts maintained the highest expression of stemness-associated genes (Extended Data Fig. 8x). Lineage inference identified trajectories from Dpt+Pi16+ through Dpt+Col15a1+ and then on to perturbation-specific, activated Cxcl5+ and Lrrc15+ clusters or the Adamdec1+ cluster (Extended Data Fig. 8y). We tested whether universal fibroblasts give rise to LRRC15+ myofibroblasts using a subcutaneous tumour model in the DptIresCreERT2;Rosa26LSLYFP mouse. We found that 52 ± 7% of LRRC15+ myofibroblasts were YFP+ in DptIresCreERT2ki/ki mice (Fig. 3d, Extended Data Fig. 8z–b′). This indicates that Dpt-expressing cells marked before tumour implantation can differentiate into LRRC15+ myofibroblasts.

Single-cell RNA-seq of human fibroblasts

We hypothesized that mice may exhibit some parity to humans in terms of steady- and perturbed-state fibroblast subtypes. We performed scRNA-seq on tumour and normal adjacent tissue (NAT) samples from three patients with pancreatic cancer and identified two subsets of fibroblasts (n = 21,262 cells; Extended Data Fig. 9a–c, Supplementary Tables 6, 7). Clusters c3 and c8 were annotated as cancer-associated fibroblasts (CAFs) and normal fibroblasts, respectively. Twelve of the 20 most upregulated genes in c8 were significantly upregulated in mouse steady-state Dpt+Pi16+ (DPT, IGFBP5, IGFBP6, C3, APOD) or Dpt+Col15a1+ (CXCL12, SMOC2, C7, FBLN5, MFAP4, LUM, FMO2) clusters (Supplementary Tables 4, 7). These 20 upregulated genes defined our human universal fibroblast expression module. To test whether c8 represented universal fibroblasts in humans, we inferred the abundance of these cells across tissues from the GTEx database (n = 5,961 samples, Extended Data Fig. 9d, e). We found strong (r > 0.5) co-expression of universal module genes in 12 human tissues, including normal pancreas (Extended Data Fig. 9f, g). At the single-cell level, the human universal module was observed in c8 pancreas and human adipose tissue (Extended Data Fig. 9h, i). We next scored samples from 122 patients with pancreatic cancer28 for the human universal module as well as a human fibroblast activation program (20 most enriched genes in c3 versus c8; Extended Data Fig. 10a). There was a strong negative correlation between the universal and activated gene programs (r = −0.54), which suggests that activation in human fibroblasts may be associated with loss of universal fibroblast gene expression (Extended Data Fig. 10b), reminiscent of the lineage relationship we observed in mice.
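The comparison of the universal and activated programs across patient samples can be sketched as a per-sample module score followed by a Pearson correlation. The scoring rule (mean of per-gene z-scores), the two-gene stand-in modules and every expression value below are assumptions for illustration; the actual modules contain 20 genes each, and the text does not specify the scoring method.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene across samples."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def module_score(expression, genes):
    """Per-sample score: mean z-score over the module's genes (assumed rule)."""
    per_gene = [zscores(expression[g]) for g in genes]
    return [mean(sample) for sample in zip(*per_gene)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Synthetic expression for five hypothetical samples (not data from the study).
expression = {
    "DPT": [10.0, 8.0, 6.0, 4.0, 2.0],
    "C3": [9.0, 9.0, 5.0, 3.0, 1.0],
    "LRRC15": [1.0, 2.0, 5.0, 7.0, 9.0],
    "CTHRC1": [2.0, 1.0, 4.0, 8.0, 8.0],
}
universal = module_score(expression, ["DPT", "C3"])         # stand-in universal module
activated = module_score(expression, ["LRRC15", "CTHRC1"])  # stand-in activation module
r = pearson_r(universal, activated)
print(f"r = {r:.2f}")
```

A negative r in such an analysis means that samples scoring high for one program tend to score low for the other, which is the pattern reported for the patient cohort.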

We investigated whether the transcriptional fibroblast subtypes we observed in humans had mouse orthologues and observed that the human c3 signature was enriched in the mouse Lrrc15+ myofibroblast cluster (Extended Data Fig. 10c). This was not restricted to pancreatic cancer14: fibroblast signatures from human rheumatoid arthritis (RA)7,9, interstitial lung diseases29, IPF30 and ulcerative colitis31 also showed localized gene expression in the mouse Lrrc15+ myofibroblast cluster (Extended Data Fig. 10c, Supplementary Table 8). We found evidence of universal fibroblasts in samples from patients with interstitial lung disease29 and ulcerative colitis31 (Extended Data Fig. 10d). Cells described as inflammatory fibroblasts across human samples and indications predominantly aligned with the universal clusters of the mouse perturbed-state atlas7,9,11 (Extended Data Fig. 10e). We also observed that mouse IL-1+ CAFs10 localized to universal clusters in the mouse perturbed-state atlas (Extended Data Fig. 10f), suggesting that cells described as inflammatory fibroblasts may represent universal fibroblasts that have not undergone sufficient transcriptional change to constitute a new cell state.

Last, we composed a human perturbed-state fibroblast atlas by integrating c3 PDAC CAFs, colon fibroblasts from patients with colitis and lung fibroblasts from individuals with non-small cell lung cancer (NSCLC), IPF or COVID-19 (Fig. 4a). This human perturbed-state atlas (n = 10,355 cells) exhibited six clusters (Fig. 4b, c, Supplementary Table 9). We observed expected clusters, including NPNT+ alveolar fibroblasts21 derived from individuals with lung disease, ADAMDEC1+ and CCL19+ (ref. 31) clusters primarily from colitis samples, and a PI16+ cluster, in which the human universal signature was most enriched (Fig. 4b, c, Extended Data Fig. 10g). We observed two myofibroblast clusters defined by LRRC15+ and COL3A1+ expression. LRRC15+ myofibroblasts were enriched in cells from individuals with pancreatic and lung cancer, whereas the COL3A1+ cluster was enriched in cells from patients with COVID-19 (Fig. 4d). Both myofibroblast subsets expressed high levels of collagens and ECM-modifying genes, including CTHRC1 (Extended Data Fig. 10h, i). The human perturbed-state atlas validated aspects of the mouse perturbed-state atlas, including LRRC15+ myofibroblasts and ADAMDEC1+ fibroblasts, but also suggested that indication-specific stimuli or signal duration may drive additional myofibroblast populations in humans that were not seen in our mouse atlas.

Fig. 4: Perturbed-state human fibroblast atlas.

a, Datasets for perturbation atlas from three perturbed human tissues. b, UMAP embedding of 10,355 cells in the perturbed-state human fibroblast atlas. Six clusters identified through graph-based clustering are indicated by colour. c, Heat map of relative average expression of the most strongly enriched genes for each cluster (log(fold change) of one cluster versus all others, z-score by row). d, Relative abundance of each disease in UMAP clusters.

Discussion

Fibroblasts have emerged as nexus cells that define the architecture of tissues, augment the function and positioning of other cell types, and have key roles in many diseases. We initially hypothesized that fibroblasts accomplish these diverse tasks through tissue-specific transcriptional programming. While our initial hypothesis was supported by bulk sequencing modalities, scRNA-seq across mouse tissues and perturbations led to a different conclusion—that the fibroblast lineage was compartmentalized into universal and specialized (steady-state) or activated (perturbed-state) subtypes. It is possible that fibroblast subsets may exhibit additional imprinting by their tissue of residence. Our scRNA-seq observations were supported experimentally by the DptIresCreERT2 knock-in mouse. Our data suggest a paradigm in which Dpt+ universal fibroblasts give rise to distinct subsets of fibroblasts across tissues, enabling this cell lineage to give rise to phenotypes that can execute the myriad functions ascribed to fibroblasts in the steady-state and in disease. In this paradigm, diverse tissues harbour Dpt+ universal fibroblasts that can differentiate into activated fibroblasts during inflammation and, we speculate, into specialized fibroblasts during development. In this way, Dpt+ universal fibroblasts operate as resource cells, providing functional plasticity to the fibroblast lineage.

Our data suggest concordance between fibroblast phenotypes in mice and humans. We confirmed that a subset of fibroblasts in human tissues has transcriptional similarities to mouse universal fibroblasts. Examination of human datasets representing an array of diseases identified mouse fibroblast orthologues, suggesting that our mouse perturbed-state atlas provides a basis for understanding fibroblast subtypes in human disease states. However, we observed a unique myofibroblast population that was enriched in patients with COVID-19, relative to the other human indications we analysed, which suggests that important inter-species or perturbation-specific differences may exist as well.

Our approach to understanding fibroblasts across tissues has identified both broad similarities and essential differences. Exposing the contours of fibroblast gene expression across tissues and activation states may help to clarify fibroblast subtyping and nomenclature. Open questions remain about the spatial dynamics among fibroblast subtypes, the existence of other subtypes not captured in these datasets, and the structural or immune cells that promote specialization in the steady-state or activation during inflammation. It remains unclear why two universal Dpt+ fibroblast subtypes exist, though we speculate that this may represent a necessary division of labour within the lineage. Overall, the organizing principles described here may enable a deeper understanding of the development, evolution and behaviour of universal, specialized and activated fibroblasts, which may in turn yield dividends for human medicine.

Methods

Mice

Wild-type mice were obtained from Jackson Laboratory (JAX; colony 00064) and maintained at Genentech. Ccl19YFP (ref. 32), Grem1CreERT2 (ref. 27) and Rosa26LSLYFP mice were bred at Genentech. DptIresCreERT2 mice were designed, generated and bred at Genentech. Male and female mice aged 6–12 weeks were used for all studies. For tamoxifen-induced cre expression, mice were injected with 2 mg tamoxifen (Sigma, cat. T5648) diluted in sunflower seed oil (Sigma, cat. 88921) for 5 consecutive days intraperitoneally or were fed chow containing tamoxifen (Envigo, cat. TD.130859). All experiments were performed under protocols approved by the Institutional Animal Care and Use Committee at Genentech. Other genotypes listed in Supplementary Table 3 were not bred in our facility but used in the integrated single-cell RNA-seq fibroblast atlases. No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Mouse tissue digestion and stromal cell isolation or identification by FACS

Tissues were collected, and fibroblasts and mesothelial cells were isolated as previously described15. In brief, tissues were obtained and minced, aside from the lymph node, omentum (neither minced) and bone (decapped, marrow removed and crushed). To isolate flank skin, hair was shaved, adipose tissue was removed and tissue was minced. Next, tissues were placed in a 15-ml conical tube with 5 ml digestion medium (RPMI + 2% FBS with 100 mg/ml Dispase (Life Tech., cat. 17105041), 100–200 mg/ml collagenase P (Roche, cat. 11249002001), and 50 mg/ml DNase I (Roche, cat. 10104159001)) and agitated. Tubes were placed in a 37 °C water bath for 15 min, and 5-ml fractions were removed and filtered (70 μm) into RPMI supplemented with 2% FCS (VWR) three times. For skin fibroblast scRNA-seq, skin was minced and digested in the enzymatic cocktail described above for 60 min. We used 200 mg/ml collagenase P to isolate cells from dense tissues such as the spleen, liver, inguinal adipose, brown adipose and pancreas. After a single-cell suspension was obtained, the cells were layered on top of a 26% OptiPrep (Sigma, cat. D1556; diluted in phosphate buffered saline (PBS)) gradient in 15-ml conical tubes and spun at 1,500g for 15 min with slow acceleration and the brake off. Cells in suspension were isolated with a transfer pipette. After digestion, the preparations were incubated with ACK lysis buffer for 2–5 min to remove red blood cells.

Before cell surface staining, cells were blocked with Fc block (2.4G2; 1:500–1:1,000, cat. 553142). Cells were then labelled with fluorescently labelled monoclonal antibodies purchased from eBioscience, BioLegend or BD Biosciences at 1:200 for 20–30 min, unless otherwise noted. Surface staining for experiments was performed with the following antibodies, unless otherwise noted: CD45 (30-F11, cat. 564279), EPCAM (G8.8, cat. 118216), CD31 (390, cat. 612802 or 102524), PDGFRα (AP5, cat. 135906), PDPN (8.1.1; 1:800, cat. 127410), LY6C (HK1.4; 1:200, cat. 45-5932-82), SCA1 (D7; 1:200, cat. 108131), CD24 (M1/69; 1:200, cat. 612832) or LRRC15 (ref. 10; 1:200, in-house). Live cells were identified by washing after Fc block and incubation with Fixable Viability Dye Violet (Invitrogen, cat. L34955, 1:1,000) before surface staining or incubation with calcein blue (Invitrogen, cat. C1429, 1:1,000) after surface staining. Data were acquired on a Fortessa, Symphony or LSRII (BD Biosciences) and analysed using FlowJo (Tree Star, v9.9.6) or cells were sorted on a Fusion or Aria (BD Biosciences). Data were presented using Prism (Graphpad, v9).

qPCR, RNA extraction and cDNA synthesis

For each qPCR experiment, organs from 3–5 mice were pooled, digested and sorted by FACS. Total RNA was extracted from FACS-sorted cells using a QIACube HT with the RNeasy Plus Mini Kit (QIAGEN, cat. 74134). cDNA were synthesized using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, cat. 4368814) with an RNase Inhibitor (Applied Biosystems, cat. N8080119). Each cDNA sample was diluted 1:200–1:300 in RNase-free water before use in qPCR. qPCR was conducted using TaqMan Gene Expression Assay Probes for the genes Pi16 (Mm00470084_m1), Col15a1 (Mm00456551_m1), Dpt (Mm01273496_m1), Npnt (Mm00473794_m1), Ccl19 (Mm00839967_g1), Csf1 (Mm00432686_m1) and Gapdh (Mm99999915_g1). Each TaqMan probe was diluted 1:10 in TaqMan Fast Advanced Master Mix (ThermoFisher, cat. 4444557) to create a TaqMan probe working solution. All qPCR reactions were carried out in a MicroAmp optical 384-well reaction plate. qPCR was performed using the QuantStudio 5 (Applied Biosystems) under the following cycling conditions: 1 cycle of 50 °C for 2 min and 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min. The relative expression values were determined by: 1) subtracting the average Ct value of the gene of interest (GOI) from Gapdh (Gapdh − GOI) for each sample and 2) raising 2 to the power of (Gapdh − GOI). Data were presented using Prism (Graphpad, v9). Data points were called non-detectable if Ct values were over 36 or if replicates had a standard deviation greater than 1.
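The relative-expression calculation and the non-detection rule described above can be sketched as follows. The function name is hypothetical, and several details are assumptions because the text leaves them open: the Ct cutoff is applied to each replicate of the gene of interest, the standard deviation is the sample standard deviation, and neither rule is applied to Gapdh.

```python
from statistics import mean, stdev

CT_CUTOFF = 36.0   # Ct values over 36 are called non-detectable
SD_CUTOFF = 1.0    # replicates with a standard deviation greater than 1 are called non-detectable


def relative_expression(goi_cts, gapdh_cts):
    """Return 2 ** (mean Gapdh Ct − mean GOI Ct), or None if non-detectable."""
    if any(ct > CT_CUTOFF for ct in goi_cts):
        return None
    if len(goi_cts) > 1 and stdev(goi_cts) > SD_CUTOFF:
        return None
    delta_ct = mean(gapdh_cts) - mean(goi_cts)  # step 1: Gapdh − GOI
    return 2 ** delta_ct                        # step 2: raise 2 to that power


# Synthetic Ct values for illustration only.
print(relative_expression([25.0, 25.0, 25.0], [20.0, 20.0, 20.0]))  # 2 ** -5
print(relative_expression([37.0, 36.5, 37.0], [20.0, 20.0, 20.0]))  # non-detectable
```

Because the exponent is Gapdh − GOI, a gene that crosses threshold five cycles after Gapdh is reported at 2^−5 of the Gapdh level.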

Mouse scRNA-seq and cell hashing

Cells hashtagged with unique barcoded antibodies (Biolegend) were processed using Chromium Single Cell Gene Expression 3′ v3 Library and Gel Bead Kit following the manufacturer’s instructions (10X Genomics, Cat. PN-1000075). Cells were counted and checked for viability using Vi-CELL XR cell counter (Beckman Coulter), and then injected into microfluidic chips to form Gel Beads-in-Emulsion (GEMs) in the 10X Chromium instrument. Reverse transcription (RT) was performed on the GEMs, and RT products were purified and amplified. DNA from antibody-derived tags (ADTs) was separated from cDNA based on size selection using SPRIselect beads (Beckman Coulter, cat. B23318). Expression libraries and ADT libraries were generated and profiled using the Bioanalyzer High Sensitivity DNA kit (Agilent Technologies, cat. 5067-4626) and quantified with Kapa Library Quantification Kit (Roche, cat. 07960255001). Illumina HiSeq and NovaSeq (Illumina) were used to sequence the libraries.

Tissue processing for histology

The small intestine or colon was removed, flushed with 10% neutral buffered formalin, and then cut into 1-inch-long strips from the duodenum, jejunum, ileum, and colon. The lung was removed and flushed with 10% neutral buffered formalin to inflate the tissue. A suture was then tied to the trachea to ensure the tissue stayed inflated while fixing in 10% neutral buffered formalin for 24 h. After fixation, the samples were transferred to 70% ethanol for 24 h and then processed for paraffin embedding. Once embedded, the blocks were trimmed on a Leica RM-2245 microtome until the centre of tissue was reached. Five-micrometre-thick sections were then collected for downstream analysis.

RNAscope in situ hybridization

In situ hybridization (ISH) was performed using reagents and protocols from Advanced Cell Diagnostics (ACD). After sectioning, the slides were allowed to dry for 24 h before baking in an oven at 60 °C for 1 h. Sections were then rehydrated in two washes of xylene for 5 min each followed by two washes in 100% ethanol for 1 min each. After rehydration, endogenous peroxidase activity was quenched with hydrogen peroxide before the sections were boiled in antigen retrieval buffer (ACD, cat. 322500) for 15 min. After antigen retrieval, the sections were digested with proteinase (ACD, cat. 322330) for 25 min at 40 °C. After digestion, slides were washed twice for 1 min with ISH wash buffer (ACD, cat. 310091) then hybridized with probes of interest (Grem1-C1, cat. 314741; Dpt-C1 probe, 451311; Pi16-C1 and C2 probe; 451311 (C1) and 451318 (C2)) for 2 h at 40 °C. After hybridization, amplification steps were completed according to the ACD protocol. After the final amplification incubation, signal was detected with HRP-conjugated DAB (ACD, cat. 322360 and 322500), counterstained with haematoxylin for 30 s, then baked in an oven at 60 °C for 15 min before mounting with non-aqueous mounting medium.

DSS-induced colitis

We added 3% (w/w) colitis-grade DSS (MP Biosciences, cat. 9011-18-1) to the drinking water for 7 days and then euthanized the animals and processed tissues for histology. The water was weighed daily to determine the average water consumption per cage. Mouse body weights were recorded daily, and mice that lost 20% or more of their body weight were euthanized following IACUC standards.

Generation of Dpt IresCreERT2 knock-in mouse

Homologous recombination and mouse embryonic stem (ES) cell technology33,34,35 were used to generate a genetically modified mouse strain with a DptIresCreERT2 knock-in. A gene-targeting vector was constructed with a 1,928-bp 5′ arm of homology corresponding to GRCm38/mm10 chr1: 164,821,309–164,823,236 and a 1,810-bp arm of 3′ homology corresponding to chr1: 164,823,237–164,825,046. IRES-CREERT2-SV40-FRT-pgk-neo-FRT was inserted immediately after the STOP codon (TAG) of exon 4. The final vector was confirmed by DNA sequencing, linearized and used to target C2 (C57BL/6N) ES cells using standard methods (G418 positive and ganciclovir negative selection)36.
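The stated arm lengths can be checked against the quoted coordinates with a short calculation. The following is a minimal sketch that assumes the coordinates are 1-based and inclusive at both ends (the convention is not stated in the text); the function names are illustrative only.

```python
# Genomic intervals (GRCm38/mm10, chr1) as quoted in the text.
# Assumption: coordinates are 1-based and include both endpoints.
FIVE_PRIME_ARM = (164_821_309, 164_823_236)
THREE_PRIME_ARM = (164_823_237, 164_825_046)


def interval_length(start: int, end: int) -> int:
    """Length in bp of an interval that includes both endpoints."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def are_adjacent(left: tuple, right: tuple) -> bool:
    """True when `right` begins on the base immediately after `left` ends."""
    return right[0] == left[1] + 1


five_len = interval_length(*FIVE_PRIME_ARM)
three_len = interval_length(*THREE_PRIME_ARM)
arms_adjacent = are_adjacent(FIVE_PRIME_ARM, THREE_PRIME_ARM)
```

Under this assumption the two arms are contiguous in the reference genome, which is consistent with a cassette inserted at a single position between them.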

C57BL/6N C2 ES cells37 were electroporated with 20 μg linearized targeting vector DNA and cultured under drug selection essentially as described38. Positive clones were identified using long-range PCR followed by sequence confirmation. Correctly targeted ES cells were subjected to karyotyping. Euploid gene-targeted ES cell clones were treated with Adeno-FLP to remove PGK neomycin. The clones were then tested to identify those with no copies of the PGK neomycin cassette, and the correct sequence of the targeted allele was verified. The presence of the Y chromosome was verified before microinjection into albino BALB/c embryos. Germline transmission was obtained after crossing the resulting chimaeras with C57BL/6N females. Genomic DNA from the resulting pups was screened by long-range PCR to verify the desired gene-targeted structure before mouse colony expansion. For genotyping, the following primers were used: 5′-CGCCAGTGGAAGTTCA-3′, 5′-TATAGGAACTTCGCTCGC-3′ and 5′-GTGCTGTGCAAGGAAG-3′, which amplified 329-bp wild-type and 278-bp knock-in DNA fragments.

Tumour inoculation

Age-matched 6–10-week-old DptIresCreERT2wt/wtRosa26LSLYFP and DptIresCreERT2ki/kiRosa26LSLYFP mice received tamoxifen injections for 5 days, rested for 13–14 days and were inoculated subcutaneously in the right flank with 1 × 10^5 KPR3070 tumour cells suspended in Hanks’ buffered saline solution and phenol-red-free Matrigel (Corning, cat. 356237). Tumours were removed 21 days after implantation.

Human patient information

Pancreatic cancer sample collection was approved by the Ethics Committee of Beijing Cancer Hospital. All patients in this study provided written informed consent.

Human tissue digestion and stromal cell isolation

Samples were obtained and sequenced by Analytical Biosciences. Single cells were dissociated from tumour and adjacent non-cancer tissues as described previously39. In brief, tumours and adjacent non-cancer tissues were cut into pieces of approximately 1–2 mm^3 in RPMI-1640 medium (Gibco) and enzymatically digested with gentleMACS (Miltenyi) for 60 min on a rotor at 37 °C, according to the manufacturer’s instructions. The dissociated cells were subsequently passed through a 100-μm SmartStrainer and centrifuged at 400g for 5 min. After the supernatant was removed, the pelleted cells were suspended in red blood cell lysis buffer (TIANDZ) and incubated on ice for 1–2 min to lyse red blood cells. After being washed twice with 1 × PBS (Gibco), the cell pellets were re-suspended in sorting buffer (PBS supplemented with 1% fetal bovine serum (FBS, Gibco)).

Single-cell suspensions were stained with an anti-CD45 antibody and 7-AAD for FACS sorting, performed on a BD Aria SORP instrument. Based on FACS analysis, single cells were sorted into 1.5-ml tubes (Eppendorf) and counted manually under the microscope. The concentration of single-cell suspensions was adjusted to 500–1,200 cells per μl. Cells were loaded at between 7,000 and 15,000 cells per chip position using the 10X Chromium Single Cell 5′ Library, Gel Bead & Multiplex Kit and Chip Kit (10X Genomics, V1.0 barcoding chemistry) according to the manufacturer’s instructions. All subsequent steps were performed following the manufacturer’s standard protocols. Purified libraries were sequenced on an Illumina HiSeq X Ten sequencer with 150-bp paired-end reads.

Mouse bulk RNA-seq analysis

For ex vivo bulk RNA-seq, cells were isolated and stained as described above. Each tissue was represented by 2–3 individual replicates that were each derived by pooling tissues from 3–5 mice and FACS sorting cells directly into Trizol (Invitrogen, cat. 15596026). In some cases, lysed cells from at least three independent experiments were pooled for one replicate. In total, RNA was generated from an average of 35,195 ± 7,357 (mean ± s.e.m.) fibroblasts and 17,318 ± 7,618 mesothelial cells. RNA was isolated as described40 or at Expression Analysis, Inc.

Paired-end RNA-seq libraries were constructed from at least 747 pg of RNA using the SMART-Seq v4 ULTRA Low Input RNA Kit for Sequencing (Takara, cat. 634891) and Nextera XT kits (Illumina, cats. FC-131-1096 and FC-131-2001). Libraries were then sequenced on an Illumina HiSeq, yielding on average 35 million read pairs (2 × 50 bp) per sample. Reads were aligned to the GENCODE basic mouse transcriptome index (M14) and transcript levels were quantified using salmon with parameters --type quasi -k 25. Subsequently, transcript-level estimates were summarized into gene-level counts in R using the tximport package (https://bioconductor.org/packages/release/bioc/html/tximport.html). Differential expression analysis taking batches into account was carried out on the gene-by-sample count matrix with DESeq241, using a design of ~0 + condition + batch, which has a coefficient for each level of condition. For principal component analysis (PCA), log-transformed normalized counts (lengthScaledTPM) were batch corrected using ComBat42 and PCA was performed in the space of variable genes (coefficient of variation >0.3). Gene set enrichment analysis (GSEA) using the fgsea method43 was performed on genes ranked by their principal component 1 loadings using the top 20 marker genes for Pi16+ and Col15a1+ clusters from the steady-state fibroblast atlas.
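The variable-gene filter used before PCA can be illustrated with a small sketch. It assumes the coefficient of variation is the standard deviation divided by the mean across samples; whether the population or sample standard deviation was used is not stated, so the population form here is an assumption, and the gene names are placeholders.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Returns 0.0 when the mean is 0, so all-zero genes are never selected.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


def variable_genes(expr, cutoff=0.3):
    """Return the genes whose CV across samples exceeds `cutoff`.

    `expr` maps a gene name to a list of expression values, one per sample.
    """
    return [gene for gene, values in expr.items()
            if coefficient_of_variation(values) > cutoff]


# Toy input with hypothetical gene names.
toy_expr = {
    "GeneA": [10.0, 10.0, 10.0, 10.0],  # CV = 0
    "GeneB": [2.0, 8.0, 2.0, 8.0],      # mean 5, sd 3, CV = 0.6
    "GeneC": [9.0, 11.0, 9.0, 11.0],    # mean 10, sd 1, CV = 0.1
}
selected = variable_genes(toy_expr)
```

Only genes passing the cutoff would then enter the PCA, so that components reflect genes that differ between samples rather than uniformly expressed ones.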

Mouse bulk ATAC–seq analysis

For ex vivo bulk ATAC–seq, cells were isolated and stained as described above. Each tissue was represented by 2–4 individual replicates that were each derived by pooling tissues from 3–5 mice and FACS-sorting fibroblasts. On average, 28,455 cells (± 5,325 (s.e.m.)) were sorted per tissue. These cells were then frozen in Gibco Recovery Cell Culture Freezing Medium (ThermoFisher, cat. 12648010). The cells were then thawed in a 37 °C water bath, pelleted, washed with cold PBS, and tagmented as previously described44, with some modifications45. In brief, cell pellets were resuspended in lysis buffer, pelleted, and tagmented using the enzyme and buffer provided in the Nextera Library Prep Kit (Illumina, cat. FC-121-1031). Tagmented DNA was then purified using the MinElute PCR purification kit (Qiagen, cat. 28004), amplified with 10 cycles of PCR, and purified using Agencourt AMPure SPRI beads (Beckman Coulter, cat. A63882). The resulting material was quantified using the KAPA Library Quantification Kit for Illumina platforms (Roche, cat. 07960255001) and sequenced on the NextSeq 500 sequencer (Illumina) with 42-bp paired-end reads (PE42). Library preparation and sequencing were performed by ActiveMotif, Inc.

Reads were aligned to the GRCm38/mm10 build of the mouse genome using GSNAP46 with parameters -M 2 -n 10 -B 2 -i 1 --pairmax-dna=1000 --terminal-threshold=1000 --gmap-mode=none --clip-overlap. Read pairs that aligned concordantly and uniquely to a single genomic location were retained for downstream analysis. PCR duplicates were removed using Picard MarkDuplicates (http://broadinstitute.github.io/picard/). Library depth-corrected coverage bigwig files were obtained to visualize the regions of interest.

OCR identification

Open chromatin regions (OCRs) were identified as peaks on individual replicates and on pooled samples combining the replicates of a given tissue using MACS247, with parameters macs2 callpeak -f BAM --call-summits --nomodel --shift -95 --extsize 199 --keep-dup all -p 0.1 (these choices of the shift and extsize parameters correct for the +5/−4 bp transposase insertion offset). The irreproducible discovery rate (IDR) pipeline48 was used to assess peak concordance between the individual replicates of a given tissue, and these IDR estimates were subsequently appended to the associated pooled peaks. Robust peaks per tissue were defined as pooled peaks that overlapped at least 50% of a peak from at least two individual replicates and that passed an IDR threshold of 0.1. All robust peaks across all tissues were first centred on their summits (summit ± 199 bp) and then those that overlapped mitochondrial and noncanonical chromosomes were removed. Finally, all remaining peaks were merged to obtain the final set of all accessible regions (n = 207,803). Per sample, reads that overlapped each region in the atlas were counted using the bedtools command multiBamCov49. To find tissue-specific OCRs, differential accessibility analysis was conducted on the count matrix using DESeq241, where the accessibility (that is, overlapping read count) of a given region in each tissue was compared against the count for that region in all other tissues. In this setting, the tissue-specific OCRs were defined according to the following criteria: log2(fold change) ≥ 2, adjusted P value ≤ 0.01. In addition, for each tissue a nondifferential/insignificant OCR set was defined according to: −0.585 ≤ log2(fold change) ≤ 0.585, q value > 0.05.
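The two threshold rules at the end of this paragraph can be written as a small classifier. This is a sketch only: it treats the "adjusted P value" and the "q value" as the same multiple-testing-corrected quantity, which is an assumption, and the label strings are illustrative.

```python
import math

# Thresholds quoted in the text.
SPECIFIC_MIN_LOG2FC = 2.0
SPECIFIC_MAX_ADJ_P = 0.01
NONDIFF_MAX_ABS_LOG2FC = 0.585  # approximately log2(1.5), a 1.5-fold change
NONDIFF_MIN_ADJ_P = 0.05


def classify_ocr(log2_fc: float, adj_p: float) -> str:
    """Assign one region in one tissue to a category.

    Assumption: `adj_p` stands for both the adjusted P value and the q value.
    """
    if log2_fc >= SPECIFIC_MIN_LOG2FC and adj_p <= SPECIFIC_MAX_ADJ_P:
        return "tissue-specific"
    if abs(log2_fc) <= NONDIFF_MAX_ABS_LOG2FC and adj_p > NONDIFF_MIN_ADJ_P:
        return "nondifferential"
    return "unassigned"


# The 0.585 bound corresponds to a fold change of about 1.5 in either direction.
bound_matches_1_5_fold = abs(math.log2(1.5) - NONDIFF_MAX_ABS_LOG2FC) < 1e-3
```

Regions that fall between the two rules (for example, a significant but modest fold change) belong to neither set, which keeps the foreground and background sets well separated for the motif analysis.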

Motif enrichment analysis

For motif enrichment analysis, for each tissue-specific OCR set, an equally sized matched background set was selected on the basis of region length and GC content from among the nondifferential/insignificant OCRs, using MatchIt50. AME51 from the MEME suite was used with default settings to assess the enrichment of a set of 321 position weight matrices (PWMs) from Homer (http://homer.ucsd.edu/homer/) in the tissue-specific OCR sets versus the background sets. Specifically, Fisher’s exact test was used to compare the number of matches to a given PWM in the specific set versus the background set, and to assess statistical significance. Enriched PWMs were reported based on an adjusted P value threshold of 0.05.
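The comparison of motif matches between a tissue-specific set and its matched background can be illustrated with a one-sided Fisher's exact test. This is a minimal sketch rather than the AME implementation: it assumes each region is scored as containing or not containing a match to the PWM, and that enrichment is tested in one direction only.

```python
from math import comb


def fisher_enrichment_p(hits_fg: int, n_fg: int, hits_bg: int, n_bg: int) -> float:
    """One-sided Fisher's exact P value for enrichment in the foreground.

    hits_fg / n_fg: regions with a motif match / total regions, foreground set.
    hits_bg / n_bg: the same for the background set.
    Returns P(X >= hits_fg) under the hypergeometric null distribution.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denominator = comb(total, n_fg)
    upper = min(n_fg, total_hits)
    numerator = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return numerator / denominator


# Toy 2 x 2 table: 3 of 4 foreground regions and 1 of 4 background regions match.
p_toy = fisher_enrichment_p(3, 4, 1, 4)
```

In a full analysis one P value would be computed per PWM and tissue, and the resulting values would then be adjusted for multiple testing before applying the 0.05 threshold.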

ATAC–seq and RNA-seq concordance

To compute the correlation of log2(fold changes) inferred from the ATAC–seq and RNA-seq differential analyses, the ATAC–seq final atlas peaks were assigned to the gene with the closest transcription start site (TSS), using Gencode mouse M14 annotations and a distance threshold of 50 kb. Following the assignment, genes and atlas peaks with absolute log2(fold change) ≥ 1 and q value ≤ 0.05 in a given tissue were used in the correlation calculation.
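The peak-to-gene assignment can be sketched as a nearest-neighbour lookup with a distance cap. The example below is a simplification under stated assumptions: it handles a single chromosome, represents each peak by its midpoint, ignores strand, and uses hypothetical gene names and positions.

```python
from bisect import bisect_left


def assign_peaks_to_genes(peak_midpoints, tss_by_gene, max_distance=50_000):
    """Map each peak midpoint to the gene with the closest TSS.

    peak_midpoints: iterable of integer positions on one chromosome.
    tss_by_gene: dict of gene name -> TSS position on the same chromosome.
    Peaks whose nearest TSS is farther than `max_distance` map to None.
    """
    tss_sorted = sorted((pos, gene) for gene, pos in tss_by_gene.items())
    positions = [pos for pos, _ in tss_sorted]
    assignments = {}
    for mid in peak_midpoints:
        i = bisect_left(positions, mid)
        # Only the TSSs immediately left and right of the peak can be closest.
        candidates = tss_sorted[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda c: abs(c[0] - mid), default=None)
        if best is not None and abs(best[0] - mid) <= max_distance:
            assignments[mid] = best[1]
        else:
            assignments[mid] = None
    return assignments


# Hypothetical TSS positions and peak midpoints.
toy_tss = {"GeneA": 100_000, "GeneB": 300_000}
toy_assignments = assign_peaks_to_genes([120_000, 260_000, 190_000, 500_000], toy_tss)
```

Peaks left unassigned by the distance cap would simply not contribute to the correlation between accessibility and expression fold changes.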

Concordance between the ATAC–seq and RNA-seq datasets was additionally assessed with BETA52, which takes a set of peaks (tissue-specific OCRs from ATAC–seq) and differential gene expression results from RNA-seq. In short, BETA calculates a regulatory potential score based on the number of peaks in a fixed window (100 kb by default) around each gene TSS and ranks the genes on the basis of this score. For each top gene set based on that rank, it calculates the percentage of the total up- and downregulated genes, as well as unregulated background genes, to provide P values for the overall up- or downregulation potential of the whole peak set. BETA was used with parameters -k BSF -g mm10 -n basic --df 0.1 for all pairwise tissue combinations, that is, for both matching and non-matching tissues.

Mouse scRNA-seq meta-analysis

The steady-state fibroblast atlas was composed of 28 datasets10,18,19,20,21,31,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70 and the perturbed-state fibroblast atlas was composed of 17 datasets10,18,21,31,53,54,55,56,57,58,61,69,70,71. Integrated fibroblast atlases at steady and perturbed states were generated and analysed using the following steps: (1) processing and filtering individual scRNA-seq datasets from healthy and diseased tissues; (2) integrating healthy and diseased datasets separately to generate steady- and perturbed-state atlases; (3) clustering and annotation; and (4) trajectory inference. The aforementioned steps are described in detail in the following sections.

(1) Processing and filtering individual scRNA-seq datasets

Single-cell transcriptomics datasets, enriched in non-haematopoietic cells, generated using 10X Genomics and available as processed CellRanger files, were collected from public repositories and in-house lab datasets (Supplementary Table 3). For public datasets where processed files were not made available, we analysed raw data using cellranger count (CellRanger 2.1.0, 10X Genomics) with a custom reference package based on mouse reference genome GRCm38. A total of 32 scRNA-seq datasets representing multiple tissues and perturbations were analysed individually. To ensure comparability, for every individual dataset we retained genes found in the Ensembl mouse (GRCm38) gene model and then applied the Seurat single-cell analysis pipeline (version 3.9.9.9010)72,73 in R (version 4.0.0). Specifically, for each dataset we removed low-quality cells with <500 measured genes or a high percentage of mitochondrial contamination (>~5–20%, depending on the dataset). After filtering, data in each cell were normalized to log(CPM/100 + 1), the 2,000 most variable genes were identified, and the expression levels of these genes were scaled before performing PCA in variable gene space. Next, 20 principal components were used for graph-based clustering (resolution = 0.1) and UMAP dimensionality reduction was computed. All steps were performed using functions implemented in the Seurat package (NormalizeData, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, FindClusters, RunUMAP) with default parameters, except where mentioned. Cell clusters marked by the canonical marker gene for immune cells, Ptprc (Cd45), were discarded. All individual datasets devoid of Cd45+ cells were then used for integration to create two main atlases: (1) a steady-state fibroblast atlas comprising data from healthy tissues; and (2) a perturbed-state fibroblast atlas comprising data from diseased and inflamed tissues.
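The per-cell filtering and normalization described above can be sketched in a few lines. This is an illustration rather than the Seurat code: the mitochondrial cutoff of 10% is a placeholder within the stated 5–20% range, cells are represented as plain dictionaries, and the function names are hypothetical. Note that log(CPM/100 + 1) is the same as the natural log of counts per 10,000 plus 1.

```python
import math


def passes_qc(counts, mito_genes, min_genes=500, max_mito_fraction=0.10):
    """Keep a cell with enough detected genes and a low mitochondrial fraction.

    counts: dict of gene -> UMI count for one cell.
    mito_genes: set of gene names treated as mitochondrial.
    max_mito_fraction: placeholder; the text uses a dataset-dependent cutoff.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    n_detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene in mito_genes)
    return n_detected >= min_genes and mito / total <= max_mito_fraction


def log_normalize(counts, scale_factor=10_000):
    """Return log(count / total * scale_factor + 1) for each gene in one cell."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}


# Toy cell: counts per 10,000 equal CPM / 100, so both formulas agree.
toy_norm = log_normalize({"a": 1, "b": 3})
cpm_a = 1 / 4 * 1_000_000
```

Applying the filter before normalization matters, because removing cells does not change the per-cell totals used for scaling, whereas removing genes would.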

(2) Dataset integration for steady- and perturbed-state atlases

Before dataset integration, we imported the aforementioned filtered, non-processed Seurat objects (not scaled) of healthy and diseased datasets, and determined a common gene space by retaining only those genes that were measured across all datasets (21,087 genes). Next, individual healthy and diseased Seurat objects were merged separately into steady- and perturbed-state objects, respectively. Each of these merged objects was normalized (function NormalizeData, method = ‘LogNormalize’, scale.factor = 10,000), and scaled to regress out the stress gene signature (computed using Seurat’s AddModuleScore) of subpopulations affected by tissue dissociation methods74 before we performed PCA for the most variable genes. These processed, merged objects were next used for batch effect correction and integration using Harmony75 (version 1.0). We adjusted the diversity clustering penalty parameter, theta, to 1. We then provided the top 20 harmony dimensions as an input for UMAP and visualized the first two UMAP dimensions at a clustering resolution of 0.1 for the Cd45− atlases. Next, we identified distinct cell types using canonical marker genes such as Sparc, Col3a1, Dcn (fibroblasts), Epcam (epithelial cells), Alb (hepatocytes), Pecam1 (endothelial cells), Msln (mesothelial cells), Rgs5 (mural cells: pericytes), Myh11 (mural cells: smooth muscle cells), Top2a and Mki67 (proliferating cells), and Cd24a (remnant immune cell populations) (Extended Data Figs. 3a–c, 7a–c). The computational pipeline for integration was iterated twice to generate the fibroblast atlases.
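Determining the common gene space amounts to intersecting the gene lists of all datasets before merging. A minimal sketch, with hypothetical dataset names and a handful of gene symbols standing in for full gene lists:

```python
def common_gene_space(datasets):
    """Return the sorted list of genes measured in every dataset.

    datasets: dict of dataset name -> iterable of gene identifiers.
    """
    gene_sets = [set(genes) for genes in datasets.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


def restrict_to_genes(counts, genes):
    """Subset one cell's count dictionary to the shared genes, in shared order."""
    return {gene: counts.get(gene, 0) for gene in genes}


# Hypothetical datasets; real inputs would carry thousands of genes each.
toy_datasets = {
    "dataset_1": ["Dpt", "Pdgfra", "Pi16"],
    "dataset_2": ["Dpt", "Pi16", "Col15a1"],
    "dataset_3": ["Pi16", "Dpt"],
}
shared = common_gene_space(toy_datasets)
```

Restricting every dataset to the same ordered gene list ensures that the merged matrix has identical rows across datasets, which the downstream normalization and batch correction rely on.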

Specifically, at each of the following steps, non-relevant or unwanted cell types were filtered out, followed by recalculation of the variable gene space, batch-effect correction and clustering, to generate fibroblast-specific atlases for the steady and perturbed states: (1) Cd45− steady- and perturbed-state atlases comprising 180,922 and 182,609 cells, respectively, and (2) fibroblast-specific steady- and perturbed-state atlases comprising 120,583 and 99,596 cells, respectively, obtained after selecting Dpt+Pdgfra+ fibroblast clusters from the Cd45− atlases and discarding other stromal cells, including endothelial and mesothelial cells, pericytes and smooth muscle cells, as well as clusters with <1,000 cells.

(3) Clustering and annotation of steady- and perturbed-state fibroblast atlases

Each fibroblast atlas was constructed from its Cd45− counterpart, in which the top 40 harmony dimensions were provided as an input for UMAP and visualized at a clustering resolution of 0.2. Differential gene expression was computed using the Seurat function FindAllMarkers with a Wilcoxon rank sum test and corrected for multiple testing using the Bonferroni method.

Gene expression scores were computed using Seurat’s AddModuleScore function, visualized using VlnPlot or DotPlot. To determine markers for specialized or activated clusters relative to universal fibroblasts we used the Seurat function FindMarkers with default parameters. Next, we scored bulk tissue-specific signatures in the steady-state atlas, computed average scores per tissue signature across tissues represented in the steady-state atlas and visualized them using the ComplexHeatmap function Heatmap. We implemented the ClusterMap76 algorithm using default parameters to identify matching steady- and perturbed-state clusters. ClusterMap compares clusters among two or more datasets via hierarchical clustering based on the binary expression patterns of marker genes. Next, similarity scores of matched clusters are determined from the clustering results.

To infer the activity of signalling pathways that govern different fibroblastic cells at steady state and after perturbation, we implemented the Bioconductor package PROGENy (1.10.0)77. For both fibroblast atlases, we implemented the same strategy. First, we down-sampled each atlas using the Seurat function subset with parameters ‘WhichCells(object, downsample, seed = 1)’ followed by implementing the function progeny with default parameters ‘scale = TRUE, organism = “Mouse”, top = 100, perm = 1, return.assay = TRUE’. We then summarized the progeny scores by cell population and visualized them as a heatmap using the function pheatmap.

(4) Pseudotime reconstruction and trajectory inference

Single-cell pseudotime trajectories for both steady- and perturbed-state maps were computed using the algorithm slingshot (version 1.6.1), which enables computation of lineage structures in a low-dimensional space78. Specifically, slingshot was implemented in the analysis pipeline after dimensionality reduction and clustering of the integrated object. Each fibroblast object was first downsampled before trajectory inference. Pre-computed cell embeddings and clusters from the Seurat pipeline served as an input to the function slingshot (reducedDim = ‘UMAP’, clusterLabels = object$RNA_snn_res.0.2, start.clus = ‘Pi16’, extend = ‘n’, stretch = 0). The start cluster was chosen using prior biological knowledge and the expression of genes such as Cd34 and Ly6a, known markers of progenitor-like cells. The wrapper function slingshot then performed lineage inference by treating clusters as nodes and constructing a minimum spanning tree (MST) between the nodes. Next, lineages or trajectories were defined by ordering clusters via tracing paths through the MST. Finally, individual pseudotimes were visualized using principal curves on the full fibroblast object.
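The cluster-level part of this procedure (clusters as nodes, an MST between them, and lineages traced from the start cluster) can be sketched as follows. This is a simplified illustration, not the slingshot implementation: it uses plain Euclidean distances between cluster centroids in two dimensions, whereas slingshot may use a different inter-cluster distance, and it stops before the principal-curve fitting that yields per-cell pseudotime. Cluster names other than Pi16 and Col15a1 are hypothetical.

```python
import math
from collections import defaultdict


def centroids(embedding, labels):
    """Mean 2D coordinate of each cluster."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in zip(embedding, labels):
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances; returns an adjacency dict."""
    names = sorted(points)
    in_tree = {names[0]}
    adjacency = {name: set() for name in names}
    while len(in_tree) < len(names):
        u, v = min(
            ((a, b) for a in in_tree for b in names if b not in in_tree),
            key=lambda edge: math.dist(points[edge[0]], points[edge[1]]),
        )
        adjacency[u].add(v)
        adjacency[v].add(u)
        in_tree.add(v)
    return adjacency


def lineages(adjacency, start):
    """All paths from `start` to a leaf of the tree, as ordered cluster lists."""
    paths = []

    def walk(node, path):
        next_nodes = [n for n in sorted(adjacency[node]) if n not in path]
        if not next_nodes:
            paths.append(path)
        for n in next_nodes:
            walk(n, path + [n])

    walk(start, [start])
    return paths


# Toy centroids: Pi16 at the root, branching after Col15a1 into two clusters.
toy_points = {"Pi16": (0.0, 0.0), "Col15a1": (1.0, 0.0),
              "ClusterA": (2.0, 1.0), "ClusterB": (2.0, -1.0)}
toy_lineages = lineages(minimum_spanning_tree(toy_points), "Pi16")
```

Because the tree is undirected, the choice of start cluster alone determines the direction of every lineage, which is why the text justifies it with prior biological knowledge.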

Human scRNA-seq meta-analysis

A human perturbed-state fibroblast atlas was constructed using scRNA-seq datasets generated with the 10X technology from disease indications such as PDAC, idiopathic pulmonary fibrosis (IPF)29, COVID-1979 and ulcerative colitis31, or with SMART-seq2 technology in non-small cell lung carcinoma (NSCLC)80. Individual datasets were processed according to the description provided in ‘(1) Processing and filtering individual scRNA-seq datasets’, with the exception that LUM+ fibroblasts were retained per dataset before final integration. Specifically, for COVID-19 we used data from individuals who had died from severe COVID-19-associated pneumonia. Next, we down-sampled the COVID-19 dataset to 2,500 cells to mitigate the over-representation of a single dataset due to its high cell number. We then aggregated all the datasets, chose a common gene space (~17,000 genes) and implemented the same pipeline as described in the section ‘(2) Dataset integration for steady- and perturbed-state atlases’. We adjusted the diversity clustering penalty parameter, theta, to 1, provided the top 30 harmony dimensions as an input for UMAP, and visualized the first two UMAP dimensions at a clustering resolution of 0.2. Differential gene expression analyses and visualization of markers were performed using Seurat functions described above. Note that in this meta-analysis we combined datasets generated using 10X and SMART-seq2 technologies across a common cell type, fibroblasts. We did so because a benchmarking study demonstrated the superior performance of Harmony batch-effect correction using single-cell data generated on different platforms for an identical cell type81.

Bioinformatics data processing of human data

For the human pancreatic cancer single-cell data generated in our laboratory, raw sequencing data were transformed into FASTQ format with CellRanger’s (v2.1) mkfastq command, mapped to the human genome (GRCh38), and quantified with CellRanger count using default parameters. Quantified UMI count matrices from each patient were merged in R and analysed with the Seurat package (v3.1.4). First, cells with <500 measured genes, or <2,700 UMIs, or >10% mitochondrial counts were removed from the dataset. In the resulting filtered dataset, data in each cell were normalized to log(CPM/100 + 1), the 2,000 most variable genes were identified, and the expression levels of these genes were scaled before PCA in the space of the most variable genes. Subsequently, 30 principal components were used for graph-based clustering (resolution = 0.1) and UMAP dimensionality reduction. All steps were performed with the methods implemented in the Seurat package (NormalizeData, FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, FindClusters, RunUMAP) and default parameters, except for parameters mentioned above. Markers for each cluster were identified using the FindAllMarkers function, limiting the maximum number of cells per cluster to 1,000 for runtime improvement. Genes that were differentially expressed between clusters 3 and 8 were detected using the FindMarkers function and default parameters. To map human expression signatures onto the mouse perturbed-state map, human gene symbols were translated to their mouse orthologues and an enrichment score for the gene signature was calculated using Seurat’s AddModuleScore function. Gene sets were identified within referenced papers (Supplementary Table 8).

Pseudo-bulk samples for co-expression analysis were generated from the human single-cell dataset using the following strategy: we randomly sampled 10% of cells from the pancreatic cancer single-cell dataset and pooled their reads into a bulk profile, which was subsequently normalized to log2(CPM). Using this strategy, we generated 100 bulk RNA-seq profiles with known proportions of cells from individual single-cell clusters. This allowed us to compare the expression of individual cluster 8 marker genes across pseudo-bulk samples, both pairwise between genes and against the known proportion of cluster 8 cells in the pseudo-bulks. Next, we generated similar bulk samples, but this time excluded cells from cluster 8 in the sampling process. On these samples we again calculated gene-by-gene correlation coefficients for cluster 8 marker genes and compared the distributions of pairwise correlation coefficients to the distributions in the pseudo-bulks containing cells from cluster 8.
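The pseudo-bulk construction can be sketched as random sampling followed by pooling and normalization. The example below rests on a few assumptions that are not in the text: a pseudocount of 1 is added before the log2 transform to avoid taking the log of zero, cells are stored as (cluster label, count dictionary) pairs, and the gene names and counts are invented.

```python
import math
import random


def make_pseudobulk(cells, fraction=0.10, rng=None, exclude_cluster=None):
    """Pool a random subset of cells into one bulk-like profile.

    cells: list of (cluster_label, {gene: count}) tuples.
    Returns (log2 CPM profile, proportion of sampled cells per cluster).
    Assumption: log2(CPM + 1) is used so that zero counts stay finite.
    """
    rng = rng or random.Random()
    pool = [cell for cell in cells if cell[0] != exclude_cluster]
    n = max(1, round(len(pool) * fraction))
    sampled = rng.sample(pool, n)

    totals = {}
    for _, counts in sampled:
        for gene, count in counts.items():
            totals[gene] = totals.get(gene, 0) + count
    depth = sum(totals.values())
    profile = {
        gene: math.log2(count / depth * 1e6 + 1) if depth else 0.0
        for gene, count in totals.items()
    }

    proportions = {}
    for label, _ in sampled:
        proportions[label] = proportions.get(label, 0) + 1 / n
    return profile, proportions


# Toy dataset: 100 cells, 20 of them from cluster "8" (invented counts).
toy_cells = [("8" if i < 20 else "other", {"GeneX": 5 if i < 20 else 1, "GeneY": 2})
             for i in range(100)]
toy_rng = random.Random(0)
toy_bulks = [make_pseudobulk(toy_cells, rng=toy_rng) for _ in range(100)]
toy_without_8 = make_pseudobulk(toy_cells, rng=toy_rng, exclude_cluster="8")
```

Because the cluster proportions of every pseudo-bulk are known, marker expression can be compared directly against the fraction of cluster 8 cells, and the `exclude_cluster` variant provides the negative control described in the text.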

GTEx bulk RNA-seq data for normal tissues were obtained as batch-corrected, log-normalized counts from the UCSC Xenabrowser82. Pairwise correlations were visualized with the corrplot (https://cran.r-project.org/web/packages/corrplot/) package. For cross-tissue correlation analyses, only tissues with a median DPT expression >7.5 were considered. In this analysis, the top 20 marker genes for cluster 8 of the single-cell dataset ordered by log(fold change), which were found in fewer than 15% of other cells, were used. For deconvolution of microdissected PDAC stromal samples, raw expression counts per sample (n = 122) were downloaded from GEO (GSE93326). Data were normalized to log2(CPM + 1). Scores for cluster 8- and cluster 3-derived expression signatures (described above) in these bulk samples were calculated based on the average expression of the 20 most upregulated genes from the respective single-cell cluster (ordered by log(fold change), only genes expressed in at most 30% of other cells were considered).
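The signature construction and scoring described here can be sketched in two steps: select the top marker genes subject to a specificity filter, then average their expression in each bulk sample. The field names (`log_fc`, `pct_other`), gene names and values below are hypothetical, and the cutoff is exposed as a parameter because the text quotes different percentages for the two analyses.

```python
def select_signature(markers, n_top=20, max_pct_other=0.30):
    """Pick the `n_top` markers by log fold change among sufficiently specific genes.

    markers: list of dicts with keys 'gene', 'log_fc' and 'pct_other', where
    'pct_other' is the fraction of cells outside the cluster expressing the gene.
    """
    eligible = [m for m in markers if m["pct_other"] <= max_pct_other]
    eligible.sort(key=lambda m: m["log_fc"], reverse=True)
    return [m["gene"] for m in eligible[:n_top]]


def signature_score(sample_expr, signature):
    """Average normalized expression of the signature genes in one bulk sample."""
    present = [sample_expr[gene] for gene in signature if gene in sample_expr]
    if not present:
        return float("nan")
    return sum(present) / len(present)


# Invented marker table and bulk sample.
toy_markers = [
    {"gene": "G1", "log_fc": 3.0, "pct_other": 0.05},
    {"gene": "G2", "log_fc": 2.5, "pct_other": 0.50},  # fails the specificity filter
    {"gene": "G3", "log_fc": 2.0, "pct_other": 0.10},
    {"gene": "G4", "log_fc": 1.0, "pct_other": 0.20},
]
toy_signature = select_signature(toy_markers, n_top=2)
toy_score = signature_score({"G1": 4.0, "G2": 9.0, "G3": 2.0}, toy_signature)
```

Filtering on expression in other cells before ranking keeps broadly expressed genes out of the signature, so the average reflects the target population rather than overall stromal content.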

Pseudo-bulk analytical strategy

We first generated 100 pseudo-bulk RNA-seq profiles from our single-cell dataset with varying numbers of cells from individual single-cell clusters (Extended Data Fig. 9d, top). We observed that the expression of marker genes for fibroblast cluster 8 co-varied depending on the number of cells from cluster 8 in the bulks. As a consequence, their expression profiles were strongly correlated, but only if cells from cluster 8 were added to the pseudo-bulk. Leaving cells from cluster 8 out resulted in an extensive drop in gene-wise correlations to close to 0. Therefore, co-expression of a single-cell-derived marker gene set can be used to infer the presence or absence of a particular cell population in bulk RNA-seq.

Projection of human gene sets onto mouse perturbed-state atlas

Gene expression signatures from human scRNA-seq datasets (Supplementary Table 7) corresponding to different fibroblast types were scored on the perturbed-state atlas using the Seurat function AddModuleScore. The density of cells with the highest activation scores (top 25th percentile for all clusters except the Comp cluster, for which we visualized the top 5th percentile) was visualized using the function LSD::heatscatter.
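Selecting the highest-scoring cells before plotting can be sketched as a rank-based cut. This is an illustration with invented cell names and scores; how ties and non-integer cell counts were handled in the original analysis is not stated, so rounding up is an assumption.

```python
import math


def top_fraction_cells(scores, fraction):
    """Return the set of cells whose score lies in the top `fraction`.

    scores: dict of cell identifier -> module score.
    fraction: 0.25 for the top 25th percentile, 0.05 for the top 5th.
    Assumption: the number of cells kept is rounded up, with at least one cell.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


# Invented scores for 20 cells: cell i has score i.
toy_scores = {f"cell{i}": float(i) for i in range(20)}
top_quarter = top_fraction_cells(toy_scores, 0.25)
top_five_percent = top_fraction_cells(toy_scores, 0.05)
```

Only the selected cells would then be passed to the density plot, so that the visualization highlights where each signature is most strongly expressed.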

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.