A spatially resolved atlas of the human lung characterizes a gland-associated immune niche

Single-cell transcriptomics has allowed unprecedented resolution of cell types/states in the human lung, but their spatial context is less well defined. To (re)define tissue architecture of lung and airways, we profiled five proximal-to-distal locations of healthy human lungs in depth using multi-omic single cell/nuclei and spatial transcriptomics (queryable at lungcellatlas.org). Using computational data integration and analysis, we extend beyond the suspension cell paradigm and discover macro and micro-anatomical tissue compartments including previously unannotated cell types in the epithelial, vascular, stromal and nerve bundle micro-environments. We identify and implicate peribronchial fibroblasts in lung disease. Importantly, we discover and validate a survival niche for IgA plasma cells in the airway submucosal glands (SMG). We show that gland epithelial cells recruit B cells and IgA plasma cells, and promote longevity and antibody secretion locally through expression of CCL28, APRIL and IL-6. This new ‘gland-associated immune niche’ has implications for respiratory health.

Single-cell transcriptomics has allowed unprecedented resolution of cell types/states in the human lung, but their spatial context is less well defined. To (re)define tissue architecture of lung and airways, we profiled five proximal-to-distal locations of healthy human lungs in depth using multi-omic single cell/nuclei and spatial transcriptomics (queryable at lungcellatlas.org). Using computational data integration and analysis, we extend beyond the suspension cell paradigm and discover macro and micro-anatomical tissue compartments including previously unannotated cell types in the epithelial, vascular, stromal and nerve bundle micro-environments. We identify and implicate peribronchial fibroblasts in lung disease. Importantly, we discover and validate a survival niche for IgA plasma cells in the airway submucosal glands (SMG). We show that gland epithelial cells recruit B cells and IgA plasma cells, and promote longevity and antibody secretion locally through expression of CCL28, APRIL and IL-6. This new 'gland-associated immune niche' has implications for respiratory health.
A comprehensive understanding of cells and micro-environments that define lung function is important for reducing the impact of lung diseases, which currently rank third for mortality causes worldwide 1 . In addition to its main role in gas exchange, the lung has an important barrier function. While other mucosal barrier tissues orchestrate adaptive immunity through well-defined mucosa-associated lymphoid tissue (MALT), such secondary lymphoid structures have not been reported in the healthy human lung 2 . The LungMAP and human lung cell atlas (HLCA) consortia 3,4 have harnessed recent advances in single-cell and single-nucleus RNA sequencing (scRNA-seq and snRNA-seq) 5 and Four distinct cell types in airway peripheral nerves. Finally, we identified the following four new clusters relating to airway peripheral nerves: myelinating Schwann cells (mSchwann) (NFASC, NCMAP, MBP and PRX), nonmyelinating Schwann cells (nmSchwann) (NGFR, SCN7A, CHD2, L1CAM and NCAM1) [25][26][27][28] , endoneurial nerve-associated fibroblasts (NAF) (SOX9 and OSR2) 25 and perineurial NAF (SLC2A1 and ITGA6) 25,29 (Extended Data Fig. 5a). nmSchwann and mSchwann cell marker genes were enriched in cell adhesion and myelination gene sets, respectively (Extended Data Fig. 5b,c), with EVX1, a key gene in spinal cord development, identified as a potential regulator of mSchwann cells in the airways. Both mSchwann and nmSchwann expressed peripheral nervous system disease genes (Extended Data Fig. 5d). Localization of these populations in peripheral nerves was validated with bulk RNA-seq across tissues (Extended Data Fig. 5e), Visium ST (Extended Data Fig. 5f), protein staining (Extended Data Fig. 5g-i) and smFISH ( Fig. 2i and Extended Data Fig. 5j). We show perineurial NAFs generated a number of atlases characterizing lung cell types across species, health and disease [6][7][8][9][10]11 .
Current atlases have prioritized parenchyma tissue, with few studies examining the full depth of the airways. Here we carried out deep tissue profiling from deceased organ donors' healthy lungs and airways, allowing characterization of cell types along the proximal to distal axis of the respiratory tree. We use unbiased spatial transcriptomics (ST) approach to contextualize cell types and states within tissue micro-environments in the healthy human lung and airways, adding a key dimension to the HLCA. In total, we sequenced 129,340 single cells and 63,768 single nuclei and performed Visium ST on 20 tissue sections from human trachea, bronchi, and upper and lower parenchyma. These data and CellTypist automated annotation models are available at lungcellatlas.org as a resource for data download, suspension and spatial gene expression analysis, as well as automated annotation of new datasets. Overall, we distinguished 80 cell types and states, including 11 populations not annotated in previous lung atlas studies. Many of these populations express disease-associated genes highlighted by functional genome-wide association studies (fGWAS) analysis. Our in-depth tissue profiling coupled with spatial genomics reconstructs known tissue micro-environments in the lungs and airways at full molecular breadth. Going beyond known units of cellular organization, we identify a previously undefined immune niche for IgA plasma cells at the airway submucosal glands (SMG).

A spatial, multi-omics atlas of human lung and airways
We applied scRNA-seq and snRNA-seq, VDJ-seq and ST to deep tissue samples from five locations across the human lung and airway ( Fig. 1a and Supplementary Tables 1-3 and 9) to capture structures such as cartilage, muscle and the SMG (Extended Data Fig. 1a; 'Methods').
In total, 193,108 cells and nuclei were annotated into broad cell type groups as follows: epithelial, immune, erythroid, endothelial and stromal cells. Cells were annotated according to consensus marker genes and naming from other lung studies including the integrated HLCA 10 and LungMAP 12 (Fig. 1b, Extended Data Fig. 1b and Supplementary Table 4). Using Visium ST on 20 tissue sections from five locations and the cell2location 13 algorithm ( Supplementary Fig. 1), we assessed the spatial distribution of cells in distinct tissue micro-environments (Supplementary Table 5). Essentially, the tool determines which cell types from suspension data in which abundance could explain the mRNA counts observed in the Visium data. As expected, well-described cell types mapped to their known locations such as ciliated epithelial cells to the lumen of the airway surrounded by basal cells and alveolar type 1 (AT1) and 2 (AT2) cells to lung parenchyma (Fig. 1c,d). To examine differential cell composition related to the sampling locations, donors and protocols used, we used a Poisson linear mixed model to identify the contribution of each technical variable (Methods; Fig. 1e and Supplementary Fig. 2). Different dissociation protocols enriched for specific cell type groups but had little effect (less than 1% of the total variance) on gene expression (Extended Data Fig. 1c).
Highlighting our comprehensive approach, we transcriptionally defined chondrocytes in human lungs (ACAN, CHAD, COL9A3, HAPLN1 and CYTL1; Extended Data Fig. 1d,e) and mapped them to airway cartilage (Fig. 1c,d). Chondrocytes were mostly released using single nuclei sequencing from trachea (Fig. 1e, Extended Data Fig. 1e,f and Supplementary Table 7) and were not present at all in the integrated HLCA 10 , demonstrating the utility of our multi-omics, multilocation human lung atlas.
Rare fibroblasts with immune recruiting properties. The sequential clustering of fibroblasts identified 11 distinct fibroblast clusters (Fig. 2a,b and Extended Data Fig. 2a,b). We annotated previously described myofibroblasts, mesothelial, adventitial and alveolar fibroblasts 11  surrounding and endoneurial NAFs alongside Schwann cells within the nerve bundle in human airway samples. In conclusion, we have detected and mapped rare stromal cells of airway peripheral nerves.
In summary, we distinguish cells of the systemic and pulmonary circulation, describe new IR-Ven-Peri cells and further define the relationship between the endothelial and perivascular cells ( Fig. 3h and Supplementary Fig. 3).

Identification of duct cells in airway SMG
In the epithelial compartment, we identified known and rare cell types and transcriptionally define human SMG duct cells ( Fig. 4a and Extended Data Fig. 7a,b,c), enriched in the trachea 32,33 and previously only characterized in mice at the single cell level [34][35][36] . smFISH staining for ALDH1A3, MIA and RARRES1 validated localization at the SMG and distinguished these cells from other epithelial cells ( Fig. 4b and Extended Data Fig. 7e,f). Cell2location distinguished distinct locations of SMG duct cells compared to SMG mucous and serous cells, providing orthogonal evidence of the identification of a new, distinct cell type (Extended Data Fig. 7f). In addition, Velocyto analysis suggested that these cells may lie on a trajectory toward surface epithelial populations (Extended Data Fig. 7g), consistent with the regenerative role of SMG duct cells in mice 37,38 .
We identified markers for cell-cell adhesion (FHOD3 and LAMA1) and nerve synapse signaling (NTRK2 and PLD5) and validated marker genes by smFISH (Extended Data Fig. 7i) and in the HPA (Extended Data Fig. 7j). Interestingly, mouse myoepithelial cells have also been shown to regenerate the surface airway epithelium 37 . However, in humans, myoepithelial cells are not well defined, potentially due to difficulties in dissociating this cell type. Spatially, lung and airway epithelial cells were enriched in their expected manually annotated locations (Fig. 1c,d). Unbiased analysis with cell2location nonnegative matrix factorization (NMF) was able to further distinguish hidden epithelial factors. We found that basal and suprabasal cells colocate separately from apical surface epithelial cells, consistent with their positions at the base of the surface epithelium (Extended Data Fig. 7k and Supplementary Fig. 4). AT1 cells colocalized with capillaries, alveolar macrophages and fibroblasts and were separate from AT2 cells (Extended Data Fig. 7k and Supplementary  Fig. 4). This analysis revealed further spatial heterogeneity beyond manual annotations, enhancing the spatial resolution and highlighting colocating cell types.
Taking advantage of our multilocation data, we compared cells across the five locations (Extended Data Fig. 7c) and observed the expected enrichment of SMG epithelial cells and depletion of club cells in trachea (Extended Data Fig. 7c). Using our pooled snRNA-seq data, to avoid batch effects from location-specific ambient RNA contamination (Fig. 1a, pooling scheme), we also analyzed gene expression signatures.  Using a linear mixed model 39 (Methods), we detected 80 differentially expressed genes in tracheal ciliated cells, including nasopharyngeal carcinoma genes FBXL7, TSHZ2 and RAET1E (Extended Data Fig. 7l) [40][41][42] .
As previously reported, we found reduced ACE2 expression in distal lung ciliated cells, where expression of ACE2 is more relevant in AT2 cells (Extended Data Fig. 7m) 43 .
Overall, we uncover the full complement of SMG epithelial cells along with their spatial contexts in the human SMG.

Immune cells in the lung and airways
Myeloid cells show previously undescribed heterogeneity. We identified all major immune populations ( Fig. 4c and Extended Data Fig. 8a) which were analyzed separately to reveal previously undescribed heterogeneity, especially in myeloid cells. We found known macrophage subsets, including intravascular (expressing LYVE1 and MAF) 44,45 , CXC3CR + airway 44,46-48 , CHIT1 +11,49 and interstitial macrophages 45 . We identified a previously undefined cluster expressing monocyte (CD14) and macrophage markers, termed macro-intermediate (Extended Data  Fig. 8b). Among alveolar macrophages, the following two more clusters appeared: dividing cells (Macro-alv-dividing) and a cluster expressing metallothioneins (Macro-alv-MT), including MT1G, MT1X and MT1F. Metallothioneins have a role in binding and metabolizing metal ions 50 , and in immunity and stress responses 51,52 . Finally, we identified a rare undescribed population of macrophages expressing chemokines, including CXCL8, CCL4 and CCL20, which we named Macro-CCL 45 . The expression of CXCL8 and CCL20 distinguishes this subset from interstitial macrophages which express CCL4. CXCL8 is associated with lung infection, asthma, IPF and COPD 53 and was identified in psoriatic skin macrophages 54 . Fig. 4c and Extended Data Fig. 8c). In the CD4 compartment, we distinguished naive/central memory (CD4-naive/CM), effector memory/ effector (CD4-EM/Effector), regulatory T cell (Treg) and tissue-resident memory (CD4-TRM) cells. Within CD8 cells, we found gamma-delta T cells (γδT), TRMs (CD8-TRM) 55 and two distinct clusters analogous to populations found in the lung in cross tissue analysis 56 : CD8-EM/EMRA and CD8-TRM/EM. The CD8-TRM cells specifically localized to airway epithelium in our spatial data (Extended Data Fig. 8d) 57,58 . NK subsets included NK-CD11d, NK-CD16hi and NK-CD56 bright 59,60 . CD11d + NK cells are activated in response to infection in both mice and humans [61][62][63] , were previously shown in human blood 64 and here for the first time in healthy human lungs.

T and NK cell subsets in the lung and airways. T lymphocytes and natural killer (NK) cells included CD4 T, CD8 T, mucosal-associated invariant T (MAIT), NK, NKT, innate lymphoid cells and their subsets (
T cell receptor (TCR) VDJ-seq data confirmed MAIT cell type annotation (with preferential use of TRAJ33 and TRAV1-2) 65 and showed low clonal expansion in naive and Treg populations compared to memory and effector subsets (Extended Data Fig. 8e). As expected, there was no clonal sharing between individuals, but expanded clones were found in multiple locations of the lung within a given donor (Extended Data Fig. 8f). The T and NK cell proportions displayed distinct donor-to-donor variability compared to myeloid cells (Extended Data Fig. 8g-i), consistent with higher interindividual variability in lymphocytes.
Overall, we define immune cells of the human lung and airway with unprecedented resolution.
Colocalization of IgA plasma cells with the SMG. B cells included naive and memory B cells, IgA and IgG plasma cells, and plasmablasts ( Fig. 4c and Extended Data Fig. 9a,b). These annotations were supported by VDJ-seq B cell receptor (BCR) isotype analysis. IgA, which is important for mucosal immunity 66,67 , was most frequent in the airway samples, while only the third most abundant in the parenchyma ( Fig. 4d and Extended Data Fig. 9c). Distinguishing markers for IgA plasma cells included CCR10 and B cell maturation antigen BCMA (TNFRSF17; Extended Data Fig. 9b), which are important for plasma cell localization and survival, respectively [67][68][69] .
In Visium ST data, IgA plasma cells mapped to the airway SMG, colocalizing with duct, mucous and serous cells, while IgG mapped to immune infiltrates (Fig. 4e). Enrichment of plasma cells (MZB1 + ) at the SMG was confirmed in the HPA (Extended Data Fig. 9d,e), building on a study in the 1970s that first showed IgA plasma cells in human airway SMG 70 . We further distinguished enrichment of IgA plasma cells in the serous glands with cell2location NMF, showing two distinct gland factors, one with SMG serous cells colocalizing more with IgA plasma cells than a second distinct factor with other SMG epithelial cells (Fig. 4f, Extended Data Fig. 9f and Supplementary Fig. 4). This preferential localization of IgA plasma cells was confirmed by manual annotation of gland areas in ST on formalin-fixed paraffin-embedded (FFPE) preserved tissue samples, which allowed better distinction of serous and mucous glands (Extended Data Fig. 9g).
To dissect this niche at single-cell resolution, we used multiplex IHC to confirm the presence of IgA2 but the absence of IgG cells in the SMG ( Fig. 4g and Extended Data Fig. 9h), consistent with Visium ST (Fig. 4e,f). We also detected IgD + naive B cells and CD3 + CD4 + T helper cells in the human SMG (Fig. 4g). We hypothesize that together these different cell types constitute an immune niche with relevance in disease, which we term the gland-associated immune niche (GAIN). Mucosal IgA is important for protection against respiratory infections 2 , and we found that proportions of IgA plasma cells were increased in coronavirus disease 2019 (COVID-19) patients versus healthy controls in single-cell data from published nasal, tracheal and bronchial brush samples (Methods) (Fig. 4h) 71 . In addition, increased plasma cell numbers have been shown in smokers 70 , patients with cystic fibrosis 72 , COPD 73 and Kawasaki disease 74 , warranting further study of the GAIN in these conditions. In C57/BL6 mouse tracheal sections, we did not identify IgA + cells in the SMG of two independent cohorts of mice despite IgA + staining in the colon as expected 75 (Extended Data Fig. 9i), suggesting that the GAIN should be studied in humans.  Cell-cell interactions and the SMG immune cell niche. To understand colocalization of B cells, IgA plasma cells and T cells in the SMG ( Fig. 4e-g), we explored the molecular mechanisms underpinning the GAIN. Expression of PIGR, which transcytoses polymeric Ig across the surface epithelium, was high across SMG epithelial cells, as was CCL28, known to recruit IgA plasma cells through CCR10 (Fig. 5a-d  Extended Data Fig. 10a) 68,76 . We confirmed expression of CCL28 in SMG by smFISH, and at the protein level by IHC (Fig. 5e and Extended Data Fig. 10b), and observed a gradient of expression along the proximal to distal axis, where CCL28 is highest in SMG duct and serous cells of the trachea (Extended Data Fig. 10b,c). This gradient in serous cells was statistically significant (P < 0.05) with Spearman's two-tailed rank    (Fig. 5b-d).
In addition to immune cell recruitment, we explored signals supporting B and plasma cell function in the GAIN. A proliferation-inducing ligand (APRIL), a factor important for B cell survival, differentiation and class switching, was expressed by SMG duct/serous cells interacting with the receptors TACI and BCMA on B cells ( Fig. 5f and Extended Data Fig. 10a). smFISH for APRIL in tissue confirmed expression in glands, especially in serous cells (LPO + RARRES1 − APRIL high ), confirming the specific B and IgA colocalization from our ST analysis in the APRIL high serous glands (Fig. 5g and Extended Data Fig. 10d,e). Interestingly, APRIL expression can be induced on intestinal epithelial cells leading to IgA2 class-switch recombination (CSR) in the local tissue environment 81 . We found activation-induced cytidine deaminase (AICDA) expression in a few B memory cells, suggesting the possibility of local CSR at the SMG (Extended Data Fig. 10f).
In combination with APRIL, IL-6 induces and supports long-lived plasma cells, potently induces IgA secretion 82,83 and increases IgA secretion in COPD 84 . In our data, SMG duct/SMG serous cells expressed IL-6 ( Fig. 5g), which was predicted to interact with IL-6R/IL-6ST on B plasma and CD4-naive/CD4-CM T cells ( Fig. 5f and Extended Data Fig. 10a). IL-6 has been shown as a required factor for CD4 T cell memory formation and for overcoming Treg mediated suppression 85 . Salivary gland epithelial cells are known to induce B cell responses via IL-6 both directly (T cell-independent) and via T cell-dependent mechanisms 86 . IL-6 is also upregulated in serum and bronchoalveolar lavage fluid in asthma and COPD patients, suggesting the importance of GAIN in disease 87-89 . CellChat also predicted interactions between HLA genes expressed by SMG epithelial cells, and CD4 T cells (Fig. 5h,i) that indicate antigen presentation directly by the SMG epithelial cells. The expression of HLA-DRA and HLA-DRB1 in SMG duct/SMG serous cells was comparable to ciliated cells at the RNA level (Fig. 5i), but much higher at the protein level (Fig. 5j). Similarly, the costimulatory gene CD40 was expressed in SMG epithelial cells (Fig. 5i). CD4 T cells also localized to HLA-DR high nonmucous regions of glands ( Fig. 5k and Extended Data Fig. 10g,h). CD4 T cells in the glands were CD45RO + (memory) cells, and could be seen closely interacting with HLA-DR high SMG epithelial cells, suggesting direct cell-cell contact ( Fig. 5k and Extended Data Fig. 10i-k). Overall, our data suggest that SMG Serous/duct cells can present antigen to CD4 T cells, similar to airway and nasal epithelial cells, which can promote T cell proliferation in vitro [90][91][92][93][94] . Antigen presentation by SGP low MHCII high epithelial cells in the parenchyma of mice has been shown to regulate CD4-TRM responses, contributing to immune homeostasis 95 . MHCII high SMG epithelial cells may have a similar function in the airways.
In conclusion, we identified the colocalization of IgA plasma cells, naive/memory B cells and T cells at the serous glands and described molecular signaling pathways for the recruitment and maintenance of immune cells at the SMG. The described pathways are functional in secondary lymphoid structures such as MALT, and we now suggest they can establish the GAIN (Fig. 6) of the human airways.

Discussion
By integrating scRNA-seq/snRNA-seq with ST, we provide fine-grained resolution of 80 cell types/states in the human lung and airways. Eleven previously unannotated cell types/states were identified and mapped to distinct micro-anatomical tissue environments. Our data have contributed to the HLCA 10 , which provides means for assessing intraindividual variation and effects of clinical covariates. Our in-depth study reveals airway tissue niches encompassing previously unresolved cell types and their interactions, as well as unexpected properties of cell signaling relationships. We provide transcriptomic profiles of human airway

Fig. 6 | Schematic of the human airway GAIN. Schematic of the GAIN showing immune cell recruitment and extravasation facilitated by venous endothelial cells and IR-Ven-Peri (immune recruiting venous perivascular cells) and signaling patterns between SMG epithelial cells, CD4 T cells, B naive/B memory cells and B plasma cells to attract immune cells and promote antigen-specific T celldependent and T cell-independent pathways, leading to IgA secretion at the SMG.
Article https://doi.org/10.1038/s41588-022-01243-4 chondrocytes, cells of the peripheral nerve bundles, SMG duct cells, enhanced resolution in fibroblast, macrophage and lymphocyte subsets and distinguish between pulmonary versus systemic vasculature and pericytes. We highlight potential disease associations for some of these new populations, such as PB-fibro for COPD and IPF. Finally, we discover the GAIN with likely relevance in inflammatory and infectious diseases. We present these data as a resource to the community as open-access downloadable files and through our interactive web portal (lungcellatlas.org).
The GAIN defines an immunomodulatory role for SMG epithelial cells, which are central to signaling circuits for local IgA responses. Specifically, we highlight that IgA plasma cells, B cells and CD4 T cells are recruited via chemokines secreted by IR-pericytes and SMG duct/ serous cells. The survival, maturation and, potentially, class switching of B lineage cells are supported by APRIL and IL-6, providing T cell independent factors. Additionally, SMG duct/serous cells have the potential to induce and/or modulate antigen-specific responses through the expression of MHC-II and CD40. Many of these pathways have been observed in other tissues, particularly in the salivary glands and within secondary lymphoid tissues such as Peyer's patches. No such secondary lymphoid structures have been observed within healthy airways. We hypothesize that GAIN is an important site for local induction of immune responses and homeostasis.
This newly defined IgA immune niche is likely to play an important role in common lung diseases as well as respiratory infections-IgA plasma cells are increased in the airways of COPD 84 and patients with cystic fibrosis 72 , and pIgR-mediated IgA transport is dysregulated in asthma 96 and pulmonary fibrosis 97 . In patients with COVID-19, early severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) neutralization was more closely correlated with IgA than IgM or IgG 98 and we report higher proportions of IgA plasma cells in the airways of patients with COVID-19. Nasal vaccines can induce a strong local sIgA response and prevent viral shedding 99 in the respiratory tract 100 . A better understanding of the GAIN is therefore highly relevant for maintaining lung health and providing immunity to respiratory infections such as COVID-19.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-022-01243-4.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.
Tissue dissociation and single-cell sequencing. Tissue was collected from 13 donors from five lung locations including trachea, bronchi at the second/third generation, bronchi at the fourth generation, upper left lobe parenchyma and lower left lobe parenchyma (Fig. 1a and Supplementary Tables 1-3). In total, 11 donors were profiled for fresh and frozen transcriptomic analysis with two additional donors used for FFPE ST and smFISH/IHC validation (summarized in Supplementary Table 9). Following collection at the clinic, samples (range: 1-4 cm 3 ) were immediately placed into cold Hypothermasol FRS 32 . Within 12 h after circulation ceased, samples were dissociated (seven donors; Supplementary Tables 1 and 9) and/or preserved in optimal cutting temperature (OCT) compound and frozen in isopentane at −60 °C for later spatial analysis (six donors; Supplementary Table 3) and nuclei isolation (seven donors; Supplementary Tables 2 and 9). Most samples (n = 5) were digested using liberase and trypsin, and CD45 positive cells loaded on 10X as a separate fraction (protocols.io.39ygr7w). One donor was digested with collagenase D for comparison (protocols. io.34kgquw; Supplementary Tables 1 and 2). Briefly, tissue dissociation used (for five donors) 1 g of lung tissue washed with PBS, minced finely with scalpels, before treatment with 13 U ml −1 liberase TL and 0.1 mg ml −1 DNase I for 30 min at 37 °C with rocking. Cells were filtered through a 70 µm strainer, washed with neutralization media (RPMI + 20% FBS) and pelleted (sample P1). Tissue remaining in the cell strainer was digested with 0.25% trypsin-EDTA with DNase I for 30 min at 37 °C with rocking, filtered and washed with neutralization media. Meanwhile, sample P1 was treated with red blood cell lysis buffer before being separated into CD45 positive and negative fractions using MACS (Miltenyi, as according to the manufacturer's protocol). The CD45 negative fraction was pooled with cells from trypsin treatment, resulting in the following two samples for loading on 10X: CD45 positive cells from liberase TL digestion (to enrich for immune cells) and pooled CD45 negative liberase-treated cells with trypsin-treated cells (nonimmune fraction). Both fractions were resuspended in 0.04% BSA/PBS, counted and loaded on the 10X Genomics Chromium Controller, aiming to capture 5,000 cells, according to the manufacturer's protocol. The 10X Genomics chemistry is included in Supplementary Table 1.
Single-nucleus sequencing. Our single nuclei isolation method 101 from frozen tissue (Supplementary Table 2 RNAse inhibitors RNasin (Promega)-0.4 U µl −1 and SUPERase-In (Invitrogen) 0.2 U µl −1 ). Tissue was homogenized using ~15 strokes with pestle A (clearance 0.0028-0.0047 in.) and then pestle B (clearance 0.0008-0.0022 in.). Isolated nuclei were filtered through a 40 µM filter, collected at 2,000g and resuspended in 0.5 ml of storage buffer (PBS containing 4% BSA and RNasin (Promega)-0.2 U µl −1 ). Nuclei were incubated with NucBlue (ThermoFisher) and purified from debris by FACs sorting, stained with Trypan blue and counted. Five thousand nuclei from five different samples were pooled and all 25,000 nuclei were loaded onto the 10X chromium controller using the 3' v3.1 kit as per the Chromium Single Cell 3' Reagent Kits v3 User Guide, targeting to recover ~3,000 nuclei per sample. Post-GEM-RT cleanup, cDNA amplification and 3′ gene expression library construction were performed according to the user guide and libraries were sequenced on the Novaseq platform. Spatial transcriptomics. Samples ≤0.5 cm 2 were cut from the five lung and airway locations outlined above. Most of the parenchyma tissue was removed from bronchi samples, which were embedded in OCT and flash frozen in −60 °C isopentane (for six donors; Supplementary Table 9) or fixed for 24 h in 10% neutral buffered formalin and processed into wax (FFPE, for one donor; Supplementary Table 9). H&E staining was used to determine the morphology of tissue blocks before proceeding with ST. Sections of 10 µm (fresh frozen samples) or 5 µm (FFPE) were then cut from the blocks onto Visium slides (10X Genomics) and processed according to the manufacturer's protocol. Further details on samples are in Supplementary Table 3 and Supplementary Fig. 1. H&E images generated during the Visium protocol were captured at ×20 magnification on a Hamamatsu Nanozoomer S60.
Dual-indexed libraries were prepared as in the 10X Genomics protocol, pooled at 2.25 nM and sequenced (four samples/Illumina Novaseq SP flow cell) with read lengths 28 bp R1, 10 bp i7 index, 10 bp i5 index, 90 bp R2 for fresh frozen samples or 50 bp R2 for FFPE. Table 9). Tissue blocks for smFISH (RNAScope) in situ hybridization were chosen based on H&E staining. Ten micron-thick cryosections cut onto superfrost plus slides were processed using the RNAScope 2.5 LS multiplex fluorescent assay (ACD, Bio-Techne) on the Leica BOND RX system (Leica). Fresh frozen lung sections were fixed for 90 min with chilled 4% paraformaldehyde, washed twice with PBS and dehydrated through an ethanol series (50%, 70% and 100% ethanol) before processing according to the manufacturer's protocol with protease IV treatment. Samples were first tested with RNAScope positive and negative control probes before proceeding to run probes of interest. Slides were stained for DAPI (nuclei) and three to four probes of interest, with fluorophores Opal 520, Opal 570, Opal 650 and ATTO 425 at between 1:500 and 1:1,000 concentration. These were then imaged on a Perkin Elmer Opera Phenix high-content screening system with water immersion at ×20 magnification. Imaging data were processed using Omero (Open Microscopy Environment).

Mouse samples.
Wild-type C57/BL6 mouse samples were obtained from Kindai University, Japan (courtesy of T. Nakayama) and Charles River, USA (AMSbio). Male (colon samples) and female (all other samples) mice were maintained in specific pathogen-free conditions and used at 8-to10-week old. All animal experiments for mice obtained from Kindai University were approved by the Centre of Animal Experiments at Kindai University. Mouse tissue from Charles River was purchased from a certified animal supplier through AMSbio, with an internal ethical approval process for broadly defined research use.
Tissue preservation and antibody staining. For IBEX staining, the fresh airway tissue was received in cold Hypothermasol, fixed with 1% PFA solution for 24 h at 4 ˚C and transferred to cold 10% and 30% sucrose gradient for ~8 and ~12 h, respectively, before freezing in OCT. The fixed tissue was sectioned at 10-30 µm thickness. Iterative staining of human trachea sections was performed as described by Radtke et al. (ref. 102 ). Sections were permeabilized and blocked in 0.1 M Tris, containing 0.1% Triton (Sigma), 1% normal mouse serum, 1% normal goat serum and 1% BSA (R&D). Primary antibodies were incubated for 2 h at room temperature and secondary for 1 h at room temperature in a wet chamber, washed three times in PBS and mounted in Fluoromount-G (Southern Biotech). Images were acquired using a TCS SP8 (Leica) inverted confocal microscope. The coverslip was removed, slides were washed three times in PBS and fluorochromes were then bleached using a 1 mg ml −1 solution of lithium borohydride in water (Acros Organics) for Article https://doi.org/10.1038/s41588-022-01243-4 15 min at room temperature. Slides were washed in PBS (three times) before repeating staining, up to a total of five rounds of staining. Raw imaging data were processed using Imaris (Bitplane) using Hoechst as fiducial for the alignment of subsequent images. The staining setup and antibody information are in Supplementary Table 6.
For human airway samples and mouse trachea/colon samples, costained for CCL28 and IgA2/IgA (Supplementary Table 6), 10 µm thick OCT embedded fresh frozen sections were fixed with cold acetone (human) or room temperature acetone:ethanol (1:1) (mouse) for 20 min, followed by blocking in the buffer above (human) or 2% BSA/ PBS with 1:800 rat serum (Abcam) (mouse) for 1 h at room temperature. Primary antibodies were incubated in blocking buffer for 1 h at room temperature and washed three times in PBS. Secondary antibodies were incubated for 1 h at room temperature followed by another three times PBS washes and 5 µg ml −1 DAPI (Invitrogen) for 5 min before coverslipping with ProLong Gold Antifade Mountant (LifeTechnologies). Slides were imaged using Hamamatsu S60 slide scanner at ×40 magnification. Imaging data were processed using Omero (Open Microscopy Environment). scRNA-seq and snRNA-seq analysis. The CellRanger unfiltered matrices were used as an input for the SoupX v1.0.0 algorithm 103 to remove ambient RNA contamination, according to the tutorial (https://github. com/constantAmateur/SoupX). For each snRNA-seq library, CellRanger filtered nuclei subjected to QC filters, pass QC nuclei were processed using standard scanpy pipeline and were clustered to form five to ten clusters. Nuclei that did not pass QC were assigned to those clusters by logistic regression. This clustering was then passed to SoupX, to derive a set of cluster-specific genes for automatic estimation of contamination rate. Default values were used for SoupX's functions, except for 'autoEstCont()' where 'soupQuantile' was set to 0.8. The single-cell and nuclei libraries with SoupX correction were analyzed using the standard scanpy 1.7.1 workflow 104 . The cells with >4,000 counts in nuclei and 20,000 counts in the cells were removed. In the cells, droplets with >10,000 features were removed. Lower threshold of 1,000 features was applied to donor A37 due to difficulty to remove ambient RNA contamination. Master cell types were annotated and extracted for reanalysis with scanpy workflow, including new highly variable genes (HVGs) detection. Between 1,000 and 3,000 HVGs were used to define 40 principal components for calculating the UMAP. Data integration via Harmony v1.0 (ref. 105 ), BBKNN v1.4.1 (ref. 106 ) or scVI-tools v0.9.0 (ref. 107 ) was used with either 'material' (cells versus nuclei), or 'material and donor'. Doublet clusters, identified by observing markers from multiple cell types and higher counts in snRNA-seq, were removed. An iterative clustering approach was used to derive clusters for less abundant cell types. In addition to known marker genes, new ones were derived using the scanpy rank_genes_groups function. Cell types and master clusters were annotated according to known and newly derived markers as in Supplementary Note 1 and in consensus with other studies in Supplementary Table 4. The UMAP of the airway epithelial cell types was achieved by integrating the data with published human airway epithelial cells 9 , including previously unannotated serous and mucous cells identified from the unprocessed data of their study. Altogether we detected all cell types from at least two different locations and five different donors. The distribution of cell types with respect to location, material and donor variables are shown in Supplementary Table 7 and visualized by the contribution of donor in Supplementary Fig. 2.

Spatial mapping of cell types using Visium ST and cell2location.
Visium ST data was analyzed by integrating scRNA-seq/snRNA-seq and spatial transcriptomes with the cell2location method (v0.1) 13 . Cell2location estimates reference gene expression signatures of cell types from scRNA-seq using Negative Binomial regression that accounts for batch effects. In Extended Data Fig. 3b, we extended the cell type reference to germinal center cell types from a published human gut dataset 18 . Cell2location uses the reference signatures to estimate absolute spatial abundance of cell types, integrating and normalizing data across 11 fresh frozen Visium sections (five were excluded based on quality control metrics). 10X Visium data were processed to untransformed and unnormalized mRNA counts, filtered to genes shared with scRNA-seq, with hyperparameters in cell2location based on the tissue and experiment quality as follows: (1) Expected cell abundance per location N = 20 (2) Regularization of within-experiment variation in RNA detection sensitivity of α y = 20 The model was trained until convergence (40,000 iterations). Loss function (ELBO) scaling by locations × genes was used. Pearson correlation between log 10 (x + 1)-transformed observed mRNA counts and expected mRNA amount from the cell2location model assessed the model quality (Pearson R = 0.745).
Micro-anatomical tissue environments were labeled from the H&E images. Only Visium spots aligned and annotated as 'tissue' were used for analysis (manual annotation). Specific tissue environments are listed in Supplementary Table 5 and at lungcellatlas.org loupe browser. Visium FFPE (four sections) allowed better conservation of morphology and therefore had more detailed manual annotations including the separation of mucous glands from the seromucous/ other glands. The annotation of mucous-only glands was conservative to distinctly separate mucous-only glands, as the transcripts from high-count serous cells contaminated neighboring spots. The manual annotations were used to compute cell abundance of each cell type across micro-environments.
We further used NMF of cell abundance estimated by cell2location for unbiased microenvironment identification. Scikit-learn NMF implementation from cell2location package was used. NMF was trained with a range of factor numbers (8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24). NMF factor loadings for cell types are reported in the paper as dot plot normalized per cell type by the sum of NMF loadings, which can be interpreted as a proportion of cells of each cell type present in each tissue zone. NMF factor loadings across locations are reported as the total cell abundance of constituent cell types.
The code used for marker gene dot plots with mean group expressions and expression of TCR regions was previously published with code available at https://doi.org/10.5281/zenodo.3711134.
Pseudotime analysis for selected cell populations was performed with Monocle3 (ref. 109 ), its functionality to infer a pseudotime based on UMAP coordinates. The root was identified as the cell with the highest combined expression of canonical progenitor markers (VCAN for chondrocytes; TGM2, HMCN2 and SULF1 for smooth muscle).
BCR and TCR analysis from VDJ-data. VDJ analysis was done with Scirpy 0.6.0 (https://icbi-lab.github.io/scirpy/) 111 . For TCR data, clonotypes were defined based on CDR3 nucleotide sequence identity. For BCR data, clonotypes were defined based on the Hamming distance between CDR3 amino acid sequences with a cutoff of two and orphan VJ chains removed. In both cases, V gene identity was required and the CDR3 sequence similarity was evaluated across all of a cell pair's V(D)J chains.
Statistics and reproducibility. Spearman rank correlation test (two tailed) was performed for zonation analysis of the SMG serous cells across trachea, bronchi 2-3 and bronchi 4 locations for the three donors (A37, A41 and A42) with at least 20 cells in at least two locations. Correlation coefficients and P values were calculated per every donor separately. A Poisson linear mixed model was used for cell type composition analysis. Poisson regression with various metadata as covariates was applied to adjust confounding effects on the cell type count data as previously described 112,113 . We used location as a biological factor, and protocol, material (scRNA-seq versus snRNA-seq) and donor as technical factors in the model as random effects to overcome the collinearity (see Supplementary Notes in ref. 113 for more details).
Gene set enrichment analysis was done with g:GOSt method and g:SCS threshold with flat list in the gProfiler webpage https://biit.cs.ut. ee/gprofiler/gost. GOSt uses Fisher's one-tailed test, also known as cumulative hypergeometric probability.
Donors used for smFISH, IHC and Visium experiments are shown in Supplementary Table 9, smFISH and Visium experiments were performed once for each donor often across multiple sections, IHC staining was repeated at least twice depending on the markers used. Full staining figures, reproducibility and antibodies for protein staining from HPA can be queried online at https://www.proteinatlas.org/.
Variance in gene expression. To determine the effects of the metadata features on the expression data, a linear mixed model was used 39 . Genes expressed in less than 5% of the samples were filtered out. The count matrix was then normalized and log transformed. The percentages of variance in gene expression data explained by each metadata feature were obtained by fitting the linear mixed model. The Bayes factor was then computed to determine the gene-specific effects of some metadata features in the expression data, assigning an effect size and a local true sign rate (LTSR) for all genes analyzed. Genes presenting an LTSR value greater than 0.9 were considered substantially affected by the metadata feature analyzed. See Supplementary Notes in ref. 39 for more details. fGWAS analysis. The fGWAS approach to determine disease-relevant cell types is described elsewhere 18 . Summary statistics for the selected GWAS study of Lung function (FEV1/FVC) 114 were obtained via Open Targets Genetics (https://genetics.opentargets.io/study/GCST007431; https://www.ebi.ac.uk/gwas/studies/GCST007431). The code used for fGWAS plots and for cell type proportion analysis is available here: https://github.com/natsuhiko/PHM. HLCA, COPD and IPF data analysis. To assess PB-fibros in lung disease, we used both the HLCA extended 10 and Adams et al. 8 datasets. To look for differentially expressed genes between PB-fibros in COPD, fibroblast labels were transferred from the HLCA (which annotates PB-fibro based on our work) to Adams et al. 8 dataset. Differential genes were identified by Wilcoxon rank-sum test (P < 0.05). PB-fibros were rare, 74 in COPD and 20 in controls, but represented across 21 individuals (12 patients with COPD and 9 healthy controls). Genes displayed are a portion of the top-upregulated genes, which are implicated in Lung function (FEV1/FVC). The abundance of PB-fibros in disease was assessed in the HLCA and Adams et al. 8 datasets using PLMM as described above and MiloPy, which tests differential abundances on the KNN graph 115 . For PLMM analysis, covariates from the HLCA accounted for were 'sample', 'study', 'subject ID', 'sex', 'ethnicity', 'smoking status', 'condition' (disease, manually harmonized), 'subject type', 'sample type', 'sequencing platform', 'cells or nuclei' and 'anatomical region detailed unharmonized'. MiloPy analysis was performed using the standard workflow, the KNN graph was generated from the latent space already available in the extended HLCA (X_scanvi_emb) and 'study' was used as a covariate.

COVID-19 data analysis.
To assess the impact of infection on IgA abundance, we used a dataset 71 , with tracheal, bronchial and nasal epithelium brushings from children and adults. Although these samples differed from the deep airway biopsies taken in this study, they still contained some SMG epithelial cells and IgA plasma cells, showing a level of compatibility with the healthy samples used in our study. A total of 470 plasma cells from five donors (three healthy and two COVID + ), each with at least 20 plasma cells, were pooled, clustered and classified into IgA, IgD, IgH and IgM isotypes based on expression levels of IGHA1, IGHA2, IGHD, IGHG1, IGHG2, IGHG3, IGHG4 and IGHM. Proportions were calculated across cells for healthy and COVID + donors separately.

Cell-cell interaction analysis.
Cell-cell interaction from scRNA-seq data was predicted using CellChat 77 . B plasma subsets were combined (B-plasma) and cell types of interest (B-naive, B-memory, SMG duct, SMG mucous, SMG serous, CD4-naive/CD4-CM, CD4-EM/CD4-effector and CD4-TRM) were downsampled to 200 cells per cell type per donor. Analyses were performed both with individual CD4 T cell subsets and all CD4 subsets combined. Normalized count matrix along with cell annotation metadata was processed through the standard CellChat pipeline, except that the communication probability was calculated with a truncated mean of 10%.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
All transcriptomic datasets generated as part of the study are publicly available. The processed scRNA-seq, snRNA-seq and Visium ST data are available for browsing and download via our website www.lungcellatlas. org. The dataset (raw data and metadata) is available on the Human Cell Atlas Data Portal and on the European Nucleotide Archive (ENA) under accession number PRJEB52292 and BioStudies accession S-SUBS17. The Visium data are publicly available on ArrayExpress with the accession number E-MTAB-11640. Imaging data can be downloaded from European Bioinformatics Institute (EBI) BioImage Archive under accession number S-BIAD570. Additional data were accessed to support analysis and conclusions, which can be accessed through National Centre for Biotechnology Information Gene Expression Omnibus GSE136831, and GSE134174 and the HLCA integration, which can be accessed through github https://github.com/LungCellAtlas/HLCA.