Although single-cell RNA sequencing studies have begun to provide compendia of cell expression profiles1,2,3,4,5,6,7,8,9, it has been difficult to systematically identify and localize all molecular cell types in individual organs to create a full molecular cell atlas. Here, using droplet- and plate-based single-cell RNA sequencing of approximately 75,000 human cells across all lung tissue compartments and circulating blood, combined with a multi-pronged cell annotation approach, we create an extensive cell atlas of the human lung. We define the gene expression profiles and anatomical locations of 58 cell populations in the human lung, including 41 out of 45 previously known cell types and 14 previously unknown ones. This comprehensive molecular atlas identifies the biochemical functions of lung cells and the transcription factors and markers for making and monitoring them; defines the cell targets of circulating hormones and predicts local signalling interactions and immune cell homing; and identifies cell types that are directly affected by lung disease genes and respiratory viruses. By comparing human and mouse data, we identified 17 molecular cell types that have been gained or lost during lung evolution and others with substantially altered expression profiles, revealing extensive plasticity of cell types and cell-type-specific gene expression during organ evolution including expression switches between cell types. This atlas provides the molecular foundation for investigating how lung cell identities, functions and interactions are achieved in development and tissue engineering and altered in disease and evolution.
Since Malpighi10, dozens of lung cell types have been discovered by microscopy11, creating histological atlases that are the cellular foundation for pulmonary medicine. More recently, cell-type-specific markers12,13 have been identified that provide molecular definitions and functions of the cell types14, reaching its apex in genome-wide expression profiling by single-cell RNA sequencing (scRNA-seq)15,16,17,18,19. We sought to create a comprehensive molecular cell atlas of adult human lung using scRNA-seq analysis, a substantial challenge because the 45 histological cell types have diverse structures, locations, and abundances that vary over five orders of magnitude (Supplementary Table 1).
Fifty-eight molecular cell types of the human lung
We acquired histologically normal lung tissue intraoperatively from bronchial (proximal), bronchiolar (medial) and alveolar (distal) regions, along with peripheral blood (Extended Data Fig. 1a, d). Lung samples were dissociated into cell suspensions, and each suspension was sorted into epithelial (EPCAM+), endothelial/immune (CD31+ or CD45+) and stromal (EPCAM−CD31−CD45−) populations (Supplementary Fig. 1a). This allowed us to balance tissue compartment representation for sequencing. We also sorted blood cells to balance immune lineages (Supplementary Fig. 1b). Sequencing libraries were prepared using 10x Chromium (10x) or SmartSeq2 (SS2)20. The higher throughput of 10x enabled discovery of rare cell types, whereas SS2 gave deeper transcriptomic information; each platform also had idiosyncrasies in cell capture. We sequenced thousands of cells from each compartment for each subject (Supplementary Table 2) so that cell types could be compared directly without batch correction, and did so for three subjects to address individual differences. High-quality transcriptomes were obtained from approximately 75,000 cells (65,662 10x; 9,404 SS2).
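The compartment sort amounts to a per-cell gating rule. The sketch below is a hypothetical illustration, not the authors' pipeline: it assumes normalized surface-marker intensities and an arbitrary positivity threshold of 1.0, neither of which is given in the text, and it applies the CD31/CD45 gate before the EPCAM gate to mirror the order of the magnetic separation described in the Methods. It also checks that the two platform totals sum to the approximately 75,000 cells reported.

```python
from collections import Counter

# Hypothetical positivity threshold on normalized marker intensity (assumption).
THRESHOLD = 1.0


def assign_compartment(epcam: float, cd31: float, cd45: float,
                       threshold: float = THRESHOLD) -> str:
    """Assign one cell to a sort compartment from three surface markers."""
    # CD31+ or CD45+ cells are removed first, as in the magnetic separation.
    if cd31 > threshold or cd45 > threshold:
        return "endothelial/immune"
    if epcam > threshold:
        return "epithelial"
    return "stromal"  # EPCAM-CD31-CD45-


# Toy (EPCAM, CD31, CD45) intensities, not measured data.
toy_cells = [(3.2, 0.1, 0.0), (0.2, 2.5, 0.3), (0.0, 0.1, 4.0), (0.3, 0.2, 0.1)]
compartment_counts = Counter(assign_compartment(*c) for c in toy_cells)

# Platform totals reported in the text: 10x plus SmartSeq2.
total_cells = 65_662 + 9_404
print(compartment_counts, total_cells)
```

The combined count is 75,066 cells, which the text rounds to approximately 75,000.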
We grouped cells based on the expression of compartment-specific markers (Extended Data Fig. 1b), then iteratively clustered21 them for each subject to identify transcriptionally distinct cell populations. Populations between subjects were merged using cluster-specific marker genes for downstream analyses. Our approach identified 58 transcriptionally distinct cell populations (mean 51 per subject) (Extended Data Fig. 1c, Supplementary Table 2), 37 more than a recent state-of-the-art study19.
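The text says only that populations were merged between subjects using cluster-specific marker genes. One simple way to do this, shown here as a hypothetical sketch rather than the procedure actually used, is to score every pair of clusters by the Jaccard overlap of their marker sets and keep the best match above an assumed cutoff of 0.3. The marker sets are illustrative, not taken from the atlas.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index of two marker-gene sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(subject_a: dict, subject_b: dict,
                   min_overlap: float = 0.3) -> dict:
    """Map each cluster in subject_a to its best-matching cluster in subject_b.

    Clusters whose best overlap is below `min_overlap` (an assumed cutoff)
    map to None and become candidates for subject-specific populations.
    """
    matches = {}
    for name_a, markers_a in subject_a.items():
        best_name, best_score = None, 0.0
        for name_b, markers_b in subject_b.items():
            score = jaccard(markers_a, markers_b)
            if score > best_score:
                best_name, best_score = name_b, score
        matches[name_a] = best_name if best_score >= min_overlap else None
    return matches


# Illustrative marker sets, not the atlas's marker lists.
subject1 = {
    "c0": {"SFTPC", "SFTPA1", "ETV5", "LAMP3"},
    "c1": {"KRT5", "TP63", "KRT15"},
    "c2": {"FOXJ1", "TPPP3"},
}
subject2 = {
    "k0": {"SFTPC", "SFTPA1", "LAMP3", "NAPSA"},
    "k1": {"KRT5", "TP63", "KRT17"},
}
print(match_clusters(subject1, subject2))
```

An unmatched cluster such as `c2` would then be examined to decide whether it is specific to one subject.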
Transcriptomes of canonical cell types
The 58 molecular types included 15 epithelial, 9 endothelial, 9 stromal and 25 immune populations, greater than the number of classical cell types in each compartment (Supplementary Table 2). Using extant and newly identified (bronchial vessel) markers (Supplementary Table 1) and single-molecule fluorescence in situ hybridization (smFISH), we found clusters that represent all but one classical lung cell type in epithelial, endothelial and stromal compartments (Fig. 1a, b).
Immune cells were the most heterogeneous and included circulating, egressed and lung-resident cells. To aid identity assignment, we defined transcriptional profiles of circulating immune cells by bulk RNA sequencing of 21 sorted, functionally characterized classes of human blood cells (Extended Data Fig. 2a, Supplementary Table 3). We also obtained scRNA-seq profiles of around 5,000 blood cells from two patients whose lung cells we analysed. Canonical immune markers and the ascertained panels of differentially expressed genes were used to assign identities to 25 immune clusters from our lung and blood scRNA-seq analysis; these clusters represent all but one previously known lung immune cell type (Fig. 2a, Extended Data Fig. 2b).
Our approach defined genome-wide expression profiles for nearly all classical lung cell types (41 out of 45, 91%), from the most abundant (capillaries, approximately 23% of lung cells) to exceedingly rare (ionocytes, 0.01%) (Supplementary Table 1). One-quarter (11 out of 45) previously lacked high-quality single-cell transcriptomes. The only classical types not captured are extremely rare (neurons, glia), primarily found in disease (tuft cells)22, or require special isolation methods (eosinophils).
New lung cell types, subtypes and states
Many canonical types were represented by more than one cluster, so the specific identities of 25 clusters remained uncertain. All but one were found in samples from several participants, so they were unlikely to be subject-specific (Supplementary Table 2). This suggested that the distinct expression profiles represented either discrete molecular states or novel cell types or subtypes. To distinguish these possibilities, we analysed the differentially expressed genes and examined cell structure and location.
We first identified clusters representing common cell states. Three clusters (proliferating basal cells, proliferating natural killer/T cells, and proliferating macrophages) were enriched in the expression of cell cycle genes, which indicates that they represent the proliferative states of basal cells, natural killer and T cells, and macrophages, respectively, and are the most proliferative lung cell types (Extended Data Fig. 3a). Another cluster (differentiating basal cells) had reduced expression of KRT5 and increased expression of HES1, KRT7 and SCGB3A2, indicating active differentiation to other epithelial fates23,24, consistent with their transitional morphology (Extended Data Fig. 3b, c). Proliferating and differentiating basal cells derived mostly from proximal lung samples (Extended Data Fig. 3d, e), suggesting that one-third of proximal basal cells are active.
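The text reports that proliferating clusters were enriched in cell cycle genes but does not state the scoring rule. A minimal sketch, assuming each cell is represented as the set of genes detected in it, uses a small illustrative subset of well-known proliferation genes (not the study's gene set) and assumed cutoffs: a cell counts as cycling if at least two of these genes are detected, and a cluster is flagged if at least half its cells are cycling.

```python
# Small, illustrative subset of proliferation genes (assumption, not the
# gene set used in the study).
CELL_CYCLE_GENES = {"MKI67", "TOP2A", "CDK1", "CCNB1", "BIRC5"}


def cycling_fraction(cells: list, min_genes: int = 2) -> float:
    """Fraction of cells in which at least `min_genes` cell-cycle genes are detected.

    Each cell is represented as the set of genes detected in it.
    """
    if not cells:
        return 0.0
    n_cycling = sum(1 for genes in cells if len(genes & CELL_CYCLE_GENES) >= min_genes)
    return n_cycling / len(cells)


def proliferative_clusters(clusters: dict, min_fraction: float = 0.5) -> list:
    """Names of clusters whose cycling fraction reaches `min_fraction` (assumed cutoff)."""
    return sorted(name for name, cells in clusters.items()
                  if cycling_fraction(cells) >= min_fraction)


# Toy clusters: each inner set lists the genes detected in one cell.
toy_clusters = {
    "basal": [{"KRT5", "TP63"}, {"KRT5"}, {"KRT5", "MKI67"}],
    "proliferating basal": [{"KRT5", "MKI67", "TOP2A"}, {"KRT5", "CDK1", "CCNB1"}, {"KRT5"}],
}
print(proliferative_clusters(toy_clusters))
```

Requiring two genes rather than one keeps a single stray proliferation transcript (as in the third toy basal cell) from marking a quiescent cell as cycling.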
The other basal cell clusters were quiescent and localized to proximal (large, pseudostratified) airways, or both proximal and distal (small, simple) airways (Extended Data Fig. 3e, f). The basal cell clusters are distinguished by hundreds of genes, which suggests that they are molecularly distinct cell types that differ in hormone production (ALOX15, ADH7, SNCA) and adhesion (POSTN, ISLR, PCDH7) (Extended Data Fig. 3b). There were also distinct clusters of ciliated cells along the proximal–distal axis (Extended Data Fig. 3g, h).
We uncovered two clusters of alveolar type 2 (AT2) cells (Fig. 1c), which produce surfactant that prevents alveolar collapse. These are intermingled throughout the alveolar epithelium (Fig. 1d). One cluster (WIF1+HHIP+CA2+) expressed higher levels of some canonical AT2 markers (SFTPA1, SFTPC and ETV5) and selectively expressed inhibitors of Wnt (WIF1) and Hedgehog (HHIP) signalling and the cell cycle (CDKN1A), indicating that they are quiescent (Extended Data Fig. 3i, left). The other, tenfold less-abundant cluster (AT2-signalling cells) selectively expressed genes involved in Wnt signalling (WNT5A, LRP5, CTNNBIP1 and TCF7L2 (also known as TCF4)) and detoxification (CP, GSTA1 and CYP4B1) (Extended Data Fig. 3i, right). AT2-signalling cells could be alveolar stem cells, homologous to the rare, Wnt-active subpopulation of mouse AT2 cells (AT2stem)25,26. However, homology between human AT2-signalling and mouse AT2stem cells is provisional: although both show increased Wnt signalling or expression of Wnt pathway components, the many other expression differences between human AT2-signalling and ‘bulk’ AT2 cells are not shared by mouse AT2stem cells.
We found unexpected molecular diversity in the endothelial compartment (Extended Data Fig. 3j). Two populations were identified as bronchial by their localization around bronchi (Extended Data Fig. 3k). Thus, bronchial endothelial cells are distinct from their counterparts in the pulmonary circulation, distinguished by matrix (VWA1 and HSPG2), fenestrated morphology27 (PLVAP) and cell cycle-associated (MYC and HBEGF) genes. Four clusters of endothelial cells in the pulmonary circulation expressed capillary markers. Two (capillary aerocytes and general capillary cells) are intermingled alveolar capillary cell types28; the others are rare capillary types showing features of both (capillary ‘intermediates’ 1 and 2).
We identified new types in the stroma, the least characterized compartment. Two clusters expressed classical fibroblast markers (BSG and COL1A2) (Fig. 1e), but one (SPINT2+FGFR4+GPC3+) localized to alveoli (‘alveolar fibroblasts’) and the other (SFRP2+PI16+SERPINF1+) to vascular adventitia and nearby airways (‘adventitial fibroblasts’) (Fig. 1f, Extended Data Fig. 4a–d). Both expressed genes involved in canonical fibroblast functions (matrix biosynthesis, adhesion and signalling regulators), but the specific genes often differed (Extended Data Fig. 4e). Each cluster also has distinct functions: expression of the voltage-gated sodium channel SCN7A and the glutamate receptor GRIA1 suggests that alveolar fibroblasts are excitable cells with glutamatergic input (Supplementary Table 4). Their profiles also suggest novel, shared functions including the recruitment of immune cells (IL1RL1, IL32, CXCL2 and genes in the class II major histocompatibility complex locus) and the complement system (C2, C3, C7, CFI, CFD, CFH and CFB).
Two stromal clusters were enriched for ACTA2, a canonical marker of myofibroblasts (Fig. 1e), which help form and stabilize alveoli. One cluster (WIF1+FGF18+ASPN+) corresponds to classical myofibroblasts and localizes to alveolar ducts (Extended Data Fig. 4f). The other (‘fibromyocytes’) showed higher expression of contractile genes (MYH11, CNN1 and TAGLN), was preferentially isolated from samples of proximal lungs, and was found both intermingled with airway smooth muscle and in alveoli (Extended Data Figs. 3e, 4g). Both populations shared expression of genes for canonical fibroblast functions, although the specific genes differed from those of alveolar and adventitial fibroblasts (Supplementary Table 4).
Lung immune cell residency signatures
To distinguish between lung-resident, egressed and circulating immune cells, we compared the relative abundance of each immune population in lung and peripheral blood samples from the same subject (Fig. 2a). Eleven clusters (including alveolar macrophages, as expected29) consisted of cells only from lung samples, with no or rare exception, which indicates that they are lung-resident or greatly enriched. This included three novel lung dendritic populations: IGSF21+ and rare EREG+ dendritic cells express asthma genes (CCL2, CCL13 and IGSF21) and developmental signals (EREG, VEGFA, AREG), respectively, and both localize to proximal vessels; TREM2+ dendritic cells localize to vessels and alveoli and express lipid machinery (APOC1, APOE and CYP27A1) (Fig. 2b, Extended Data Fig. 4k–n).
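The lung-versus-blood comparison can be expressed as a per-cluster score. In this hypothetical sketch, each cluster's counts are normalized by the total number of cells profiled from each tissue, and a cluster is called lung-resident when at least 95% of its normalized abundance comes from lung, which tolerates the rare blood-derived exceptions mentioned above. Both the normalization and the 95% cutoff are assumptions, and the counts are toy numbers.

```python
def lung_share(lung_cells: int, blood_cells: int,
               lung_total: int, blood_total: int) -> float:
    """Share of a cluster attributable to lung after correcting for sampling depth.

    Counts are divided by the total number of cells profiled from each tissue
    so that the more heavily sampled tissue does not dominate.
    """
    lung_rate = lung_cells / lung_total
    blood_rate = blood_cells / blood_total
    if lung_rate + blood_rate == 0:
        raise ValueError("cluster contains no cells")
    return lung_rate / (lung_rate + blood_rate)


def residency_call(lung_cells: int, blood_cells: int, lung_total: int,
                   blood_total: int, resident_min: float = 0.95) -> str:
    """Label a cluster; `resident_min` is an assumed cutoff."""
    share = lung_share(lung_cells, blood_cells, lung_total, blood_total)
    if share >= resident_min:
        return "lung-resident or greatly lung-enriched"
    return "present in lung and blood"


# Toy counts: 30,000 lung cells and 5,000 blood cells profiled in total.
print(residency_call(3000, 1, 30000, 5000))
print(residency_call(500, 800, 30000, 5000))
```

Normalizing by sampling depth matters because far more lung than blood cells were profiled; without it, any cluster would look lung-enriched simply because lung was sampled more heavily.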
The other immune cell types were found in both lung and blood samples. For some types, every cell—whether from lung or blood—clustered together. However, for other types, cells from lung formed a separate cluster (Extended Data Fig. 4o). Some of the differentially expressed genes may be due to technical differences (for example, collagenase treatment of lung30, circulating RNA in blood31), but others such as upregulation in lung cells of lymphocyte-residence gene CD69 probably represent genes induced after egression32. We identified a core transcriptional signature for all human lung-resident lymphocytes (Fig. 2c), which overlaps a residence signature found by bulk RNA sequencing of CD8+ T cells in mouse spleen, gut and liver33. We also found a residency signature for lung myeloid cells that overlaps the lymphocyte signature, supporting a core residency program for immune cells plus specific subprograms for myeloid cells and lymphocytes.
Cell markers, regulators and interactions
We identified optimal markers for each previously known and newly identified lung cell type (Extended Data Fig. 5a, Supplementary Table 4). Approximately 200 markers can distinguish virtually all types (Extended Data Fig. 5b), so this panel could be used with multiplexed smFISH34,35,36 to detect, simultaneously and in clinical specimens, alterations in the numbers and spatial relationships of these cell types. A similar compendium of membrane protein markers (Supplementary Table 4) could be used to purify or therapeutically target specific lung cell types. We also identified around 400 cell type-selective transcription factors (Extended Data Fig. 5e, Supplementary Table 4), putative ‘master regulators’ that could help create all lung cell types by cellular reprogramming. These include what may be long-sought master regulators of AT1 cells (for example, MYRF), which comprise the gas-exchange surface, and of pericytes (TBX5) (Extended Data Fig. 5c, d).
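The text reports that about 200 markers suffice but does not describe here how they were chosen. As a hedged illustration of the idea, the sketch below treats marker selection as a set-cover problem and applies the standard greedy heuristic, which is a swapped-in technique rather than the authors' method: repeatedly add the gene that separates the most still-confusable pairs of cell types. Binary on/off expression calls and the three toy profiles are assumptions.

```python
from itertools import combinations


def _pairs_split(gene: str, pairs: set, profiles: dict) -> int:
    """Count cell-type pairs in which `gene` is on in one type and off in the other."""
    return sum((gene in profiles[a]) != (gene in profiles[b]) for a, b in pairs)


def greedy_marker_panel(profiles: dict) -> list:
    """Greedily add genes until every pair of cell types differs in at least one.

    `profiles` maps each cell type to the set of genes called 'on' in that type.
    """
    genes = sorted(set().union(*profiles.values()))
    unresolved = set(combinations(sorted(profiles), 2))
    panel = []
    while unresolved:
        # Ties go to the alphabetically first gene, since max() keeps the first maximum.
        best = max(genes, key=lambda g: _pairs_split(g, unresolved, profiles), default=None)
        if best is None or _pairs_split(best, unresolved, profiles) == 0:
            raise ValueError(f"indistinguishable cell types: {sorted(unresolved)}")
        panel.append(best)
        # Keep only the pairs that the new marker fails to separate.
        unresolved = {(a, b) for a, b in unresolved
                      if (best in profiles[a]) == (best in profiles[b])}
    return panel


# Toy on/off calls; HOPX is 'on' in both AT1 and AT2, as described for human.
toy_profiles = {
    "AT1": {"AGER", "HOPX"},
    "AT2": {"SFTPC", "HOPX"},
    "Ciliated": {"FOXJ1"},
}
panel = greedy_marker_panel(toy_profiles)
print(panel)
```

Greedy set cover is not guaranteed to find the smallest panel (it carries only a logarithmic approximation guarantee), but it is simple and scales to many cell types. With real data, most of the practical difficulty lies in turning continuous expression into reliable on/off calls.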
The atlas allowed us to map the cell targets of circulating hormones, based on expression of their cognate receptors. Receptors for some hormones are broadly expressed, suggesting direct action throughout the lung (Extended Data Fig. 6a). Other hormones have specific and unexpected targets, such as somatostatin (SSTR1, arteries), melanocortin (MC1R, ionocytes) and oxytocin (OXTR, ciliated cells). Pericytes are predicted targets of several hormones, which could affect their contractile machinery to regulate alveolar perfusion (Extended Data Fig. 6b). Receptors for half the hormones were not detectably expressed, so these hormones may not directly influence lung physiology. We also mapped local signalling interactions by examining expression of ligands and receptors, an analysis that predicts up to hundreds of interactions among neighbouring cell types (Extended Data Fig. 6c, Supplementary Table 5).
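Both analyses in this paragraph reduce to matching a hormone or ligand with the cell types that express its cognate receptor. The sketch below assumes binary expression calls per cell type and a hand-written ligand–receptor list; a real analysis would need a curated pair database, expression thresholds and a neighbourhood restriction, none of which is specified here. The toy calls loosely echo examples from the text and are not measured data.

```python
def hormone_targets(expressed: dict, receptor: str) -> list:
    """Cell types predicted to respond to a circulating hormone via its receptor."""
    return sorted(cell for cell, genes in expressed.items() if receptor in genes)


def predict_interactions(expressed: dict, pairs: list) -> list:
    """(source, target, ligand, receptor) tuples for local signalling.

    Every cell type expressing the ligand is paired with every cell type
    expressing the receptor; autocrine pairs (source == target) are kept.
    Spatial adjacency is not modelled.
    """
    hits = []
    for ligand, receptor in pairs:
        sources = [c for c, genes in expressed.items() if ligand in genes]
        targets = [c for c, genes in expressed.items() if receptor in genes]
        hits.extend((s, t, ligand, receptor) for s in sources for t in targets)
    return sorted(hits)


# Toy on/off calls per cell type (illustrative only).
toy_expressed = {
    "Artery": {"SSTR1"},
    "Ionocyte": {"MC1R", "CXCR4"},
    "Ciliated": {"OXTR", "CX3CL1"},
    "Nonclassical monocyte": {"CX3CR1"},
}
print(hormone_targets(toy_expressed, "SSTR1"))
print(predict_interactions(toy_expressed, [("CX3CL1", "CX3CR1")]))
```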
The expression of chemokine receptors illuminated immune cell homing (Fig. 3). Our data confirm canonical homing interactions, such as that of CD4+ T cells to lymphatic vessels, and add specificity to others, such as plasma cell homing to epithelial mucosa through CCL28 from serous cells. They also predict new interactions, such as CX3CR1-mediated homing of nonclassical monocytes to CX3CL1-expressing endothelial and airway epithelial cells. All three new dendritic populations express CCR1, which could mediate their attraction to veins (CCL23), bronchial vessels (CCL14), ciliated cells (CCL15) and lymphocytes (CCL5). Ionocytes are the only non-immune cell type to express appreciable levels of any chemokine receptor (CXCR4).
Mapping cellular focus of lung diseases
We determined the expression of 233 extant lung disease genes (Extended Data Fig. 7). Disease genes with cell-type-specific expression (Extended Data Fig. 8a) and cell types expressing many genes associated with a specific disease (Extended Data Fig. 8b) are of special interest because they can pinpoint the cellular origin of disease. This analysis supported known or suspected ‘culprit’ cells for 27 genes involved in 12 diseases, and identified potential culprits for 21 genes implicated in 15 diseases, including pericytes in pulmonary hypertension, capillaries in alveolar capillary dysplasia, and AT2 cells in chronic obstructive pulmonary disease (COPD). We confirmed pericyte, capillary and AT2 expression of disease genes by smFISH (Extended Data Fig. 8c–e).
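This section does not say how the cell-type specificity of disease genes was scored. One widely used measure that fits the purpose is the tau (τ) specificity index, τ = Σᵢ(1 − xᵢ/max x)/(n − 1) over n cell types, substituted here as an assumption. It is 0 for uniform expression and 1 for expression in a single cell type. The 0.8 cutoff and the expression values are hypothetical.

```python
def tau(expression: list) -> float:
    """Tau specificity index: 0 for uniform expression, 1 for a single cell type."""
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two cell types")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # gene not detected anywhere; treated as non-specific here
    return sum(1 - x / peak for x in expression) / (n - 1)


def candidate_culprit(gene_expr: dict, min_tau: float = 0.8):
    """Top-expressing cell type if the gene is specific enough (assumed cutoff), else None."""
    if tau(list(gene_expr.values())) < min_tau:
        return None
    return max(gene_expr, key=gene_expr.get)


# Hypothetical mean expression of one disease gene across four cell types.
toy_gene = {"Pericyte": 9.0, "AT2": 0.5, "Capillary": 0.0, "Fibroblast": 0.3}
print(round(tau(list(toy_gene.values())), 3), candidate_culprit(toy_gene))
```

A specific gene points to a single candidate culprit cell type, whereas a broadly expressed gene returns None and would need the second criterion in the text (cell types expressing many genes for the same disease) instead.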
We mapped expression of 80 genes encoding virus receptors, including 26 used by respiratory viruses (Extended Data Figs. 9a, 10). NECTIN4 (measles virus receptor) was enriched in club, ciliated, differentiating basal and goblet cells, and CDHR3 (‘common cold’ rhinovirus C) was enriched in ciliated and neuroendocrine cells, suggesting that these infections initiate in those bronchial cell types. By contrast, ACE2 (SARS-CoV and SARS-CoV-2 coronaviruses) and DPP4 (MERS coronavirus) were both detected in AT2 cells (Extended Data Fig. 9b), consistent with severe alveolar pathology37.
Evolution of cell types and expression
Construction of a mouse lung atlas2 plus additional cells annotated as above for human (Supplementary Table 6) allowed analysis of evolutionary conservation of lung cell types and their transcriptomes. Homologous cell types were assigned by conserved expression of cell-type markers (Fig. 4a). Notably, mice seem to lack 17 (29%) of the 58 human lung cell types including 12 of the 14 (86%) newly identified types. Some missing mouse populations might be rare, transient, unstable, or too diverged to relate transcriptionally so may be uncovered by further studies. By contrast, just five mouse cell populations, all immune, were not found in human. This suggests substantial diversification of lung cell types during mammalian evolution.
We compared expression levels of all active genes in each human cell type with those of the orthologous genes in the corresponding mouse type (Extended Data Fig. 11a, Supplementary Table 7). Most cell types correlated best with their counterparts across species, but surprisingly one human type (goblet) showed greater correlation with another mouse type (club, R = 0.68 versus 0.63) (Extended Data Fig. 11b), despite conserved expression of canonical markers and the master regulator SPDEF (Extended Data Fig. 11c). Corresponding cell types in human and mouse diverged in expression (a greater than 20-fold change, P < 0.05) of hundreds of genes, such as SERPINA1, PGC, WIF1 and LYZ in AT2 cells (Fig. 4b). Lung as a whole had fewer diverged genes than any cell type, which suggests that expression lost in one type is gained in another (Extended Data Fig. 11d). The magnitude of this divergence exceeded age-related expression changes in mice (Extended Data Fig. 11e), and diverged genes included canonical cell-type markers, transcription factors, signalling molecules and disease genes.
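The two comparisons in this paragraph, finding the mouse type that correlates best with a human type and flagging orthologues that differ by more than 20-fold at P < 0.05, can be sketched as follows. The pseudocount, the use of raw rather than log-scaled expression, and all values are assumptions; p-values are taken as given rather than computed, because the statistical test is not described in this section. Gene and cell-type names are placeholders.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def best_counterpart(human_profile: list, mouse_profiles: dict) -> str:
    """Mouse cell type whose orthologous-gene profile correlates best with the human one."""
    return max(mouse_profiles, key=lambda t: pearson(human_profile, mouse_profiles[t]))


def diverged_genes(human: dict, mouse: dict, pvalues: dict,
                   min_fold: float = 20.0, alpha: float = 0.05,
                   pseudocount: float = 1.0) -> list:
    """Orthologous genes with a > min_fold change in either direction and p < alpha."""
    hits = []
    for gene in human.keys() & mouse.keys():
        ratio = (human[gene] + pseudocount) / (mouse[gene] + pseudocount)
        if max(ratio, 1 / ratio) > min_fold and pvalues.get(gene, 1.0) < alpha:
            hits.append(gene)
    return sorted(hits)


# Placeholder values, not measured data.
toy_human = {"GENE_A": 400.0, "GENE_B": 900.0, "GENE_C": 0.0}
toy_mouse = {"GENE_A": 3.0, "GENE_B": 800.0, "GENE_C": 60.0}
toy_p = {"GENE_A": 0.001, "GENE_B": 0.5, "GENE_C": 0.01}
print(diverged_genes(toy_human, toy_mouse, toy_p))
```

The pseudocount keeps the fold change finite when a gene is undetected in one species, as for GENE_C here.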
Evolutionary changes in expression grouped into four types (Supplementary Table 7). Type 0 (‘conserved’) genes are expressed in the same cell types in mouse and human (Fig. 4e, Extended Data Fig. 12a). Type 1 (‘expression gain/loss’) genes show simple gain (or loss) of expression between species, which involved a single cell type (type 1a, PGC) (Fig. 4e), several types (type 1b, RNASE1) (Extended Data Fig. 12b), or entire lung (type 1c, TRIM38) (Extended Data Fig. 12b). Type 2 (‘expression expansion/contraction’) changes involved gain (or loss) of expression in additional lung cell types, expanding (or contracting) expression of the gene during evolution. For example, HOPX, the canonical AT1 transcription factor in mouse, is expressed in both AT1 and AT2 cells in human (Fig. 4c, e), which indicates the existence of other AT1 transcription factors such as MYRF, which is AT1-selective in both species (Extended Data Fig. 12c). Expanded expression of RAMP3, co-receptor for vasodilators CGRP and adrenomedullin, presumably alters pulmonary vascular response to these hormones (Extended Data Fig. 12d).
Type 3 (‘expression switch’) changes involve a switch in expression from one cell type to another. Two medically important examples are COPD/emphysema genes SERPINA1 and HHIP, both selectively expressed in AT2 cells in human but alveolar stromal cells in mice (Fig. 4d, e, Extended Data Fig. 12e); other hedgehog pathway components were mostly conserved (Extended Data Fig. 12f). Extreme examples occurred during evolution of species-specific cell types, such as consolidation in the expression of anti-bacterial enzymes (LTF, LYZ and BPIFB1) from several mouse airway cells into human-specific serous cells, and consolidation of broadly expressed lipid-handling genes (PLIN2 and APOE) from mouse alveolar fibroblasts (which can contain lipid droplets) and myofibroblasts to human-specific lipofibroblasts (Extended Data Fig. 12g).
Despite the general conservation of cell-type expression profiles noted above, only 6% of expressed genes showed fully conserved patterns (type 0), most of them either extremely cell-type-specific or broadly expressed (Extended Data Fig. 12h, Supplementary Table 8). Thus, the expression patterns of nearly all genes are evolutionarily labile: most undergo expansion or contraction (55%, type 2) or simple gain or loss (29%, type 1), and fewer switch between cell types (10%, type 3) (Supplementary Table 9).
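The four categories of evolutionary change can be written as a small decision procedure on the sets of homologous cell types that express a gene in each species. This is a hypothetical sketch: it assumes binary expression calls over cell types present in both species, and it assigns type 3 to any remaining difference, including partial overlaps that the text does not explicitly categorize. SERPINA1's mouse stromal expression is simplified to a single 'Fibroblast' type, and GENE_X and GENE_Y are placeholders.

```python
def classify_change(human: set, mouse: set, lung_types: set):
    """Evolutionary expression-change type from the cell types expressing a
    gene in each species; None if the gene is expressed in neither."""
    if not human and not mouse:
        return None
    if human == mouse:
        return "0"                      # conserved
    if not human or not mouse:
        expressing = human or mouse     # simple gain or loss
        if len(expressing) == 1:
            return "1a"                 # a single cell type
        if expressing == lung_types:
            return "1c"                 # entire lung
        return "1b"                     # several cell types
    if human < mouse or mouse < human:
        return "2"                      # expansion or contraction
    return "3"                          # switch (any other difference; assumption)


LUNG_TYPES = {"AT1", "AT2", "Ciliated", "Fibroblast"}
examples = {
    "MYRF": ({"AT1"}, {"AT1"}),
    "HOPX": ({"AT1", "AT2"}, {"AT1"}),
    "SERPINA1": ({"AT2"}, {"Fibroblast"}),
    "GENE_X": ({"AT2"}, set()),
    "GENE_Y": (set(LUNG_TYPES), set()),
}
calls = {gene: classify_change(h, m, LUNG_TYPES) for gene, (h, m) in examples.items()}
print(calls)
```

Tallying such calls over all expressed genes gives the category shares reported above (6%, 29%, 55% and 10%, which sum to 100%).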
We constructed a comprehensive expression atlas of human lung comprising 58 molecular types and their locations (Fig. 1b) including 41 out of 45 previously known cell types, all but the exceedingly rare. We identified 14 novel populations across all four compartments that are as distinct molecularly as the canonical cell types; each must be thoroughly characterized, as done for new capillary types28. If there are other lung cell types, they must be exceedingly rare, fragile, region- or stage-specific, or so similar to the 58 that they are not resolved by current methods.
The atlas has broad implications for physiology and medicine, providing insight into the functions, regulation and interactions of the known and new cell types. It identifies those directly affected by hormones, viruses and extant lung disease genes, and distinguishes lung-resident and homing immune cell types and infers their expression changes after egression from circulation and the cellular sources of homing signals. The atlas defines type-selective transcription factors for creating cells to engineer a lung, and provides optimal markers and a benchmark for monitoring all types and how they change during development, ageing, disease and evolution.
Mice appear to lack 17 out of the 58 human molecular lung cell types, including most (12 out of 14) of the newly discovered ones. This suggests a considerable expansion of cell types in the human lineage, perhaps for new functions, durability, or regenerative capacity of our 6,000-fold larger lungs and 30-times longer lifespan38,39. Even homologous cell types diverged in expression of hundreds of genes. Indeed, just 6% of expressed genes had fully conserved expression patterns across the lung, indicating widespread gain, loss or conversion of cell-type-specific transcriptional enhancers during mammalian evolution. It will be important to unravel the genetic mechanisms and functional consequences of these changes, and to determine the selective forces operative for genes with fully conserved expression. The evolutionary cell type and expression changes predict where mouse will fail to model human lung physiology and disease.
The success of our atlas relied on several elements: procuring fresh tissue across the organ plus matched blood; balancing tissue compartments to ensure broad cell representation; extensive profiling of each subject using broad cell capture and deep gene coverage scRNA-seq strategies; clustering subject and compartment data separately and iteratively; and assigning cell identities using extant markers, functions of selectively expressed genes, and tissue localization. Applying the approach to other organs could create a comprehensive human molecular cell atlas.
Human lung tissue and peripheral blood
Freshly resected lung tissue was procured intraoperatively from patients undergoing lobectomy for focal lung tumours. Normal lung tissues (approximately 5 cm³) were obtained from uninvolved regions and annotated for the specific lung lobe and location along the airway or periphery. Pathological evaluation (by G.B.) confirmed normal histology of the profiled regions, except for areas of very mild emphysema in patient 1. Patient 1 was a 75-year-old male with a remote history of smoking, diagnosed with early-stage adenocarcinoma, who underwent left upper lobe (LUL) lobectomy; two blocks of normal tissue were obtained from the lung periphery (distal 1a and 1b). Patient 2 was a 46-year-old male non-smoker with a right middle lobe (RML) endobronchial carcinoid, who underwent surgical resection of the right upper and middle lobes; two blocks of tissue were selected from the mid-bronchial region (medial 2) and periphery (distal 2) of the right upper lobe (RUL). Patient 3 was a 51-year-old female non-smoker with mild adult-onset asthma and a left lower lobe (LLL) endobronchial typical carcinoid, who underwent LLL lobectomy; three tissue blocks were resected from the bronchus (proximal 3), mid-bronchial region (medial 3) and periphery (distal 3) of the LLL. All tissues were received and immediately placed in cold PBS and transported on ice directly to the research lab for single-cell dissociation procedures. Peripheral blood was collected from patients 1 and 3 in EDTA tubes. For bulk RNA-seq of canonical immune populations, whole blood from healthy human donors was obtained commercially (AllCells) in EDTA tubes. Patient tissues were obtained under a protocol approved by Stanford University’s Human Subjects Research Compliance Office (IRB 15166) and informed consent was obtained from each patient before surgery. All experiments followed applicable regulations and guidelines.
Mouse lung tissue
Lung tissue for Tabula Muris Senis40 was obtained as previously described. We obtained additional tissue from two mice expressing Cre recombinase and two expressing oestrogen-inducible Cre recombinase (Cre-ERT2) for conditional cell-specific labelling in vivo with the gene-targeted alleles FVB-Tbx4-LME-cre41,42 (lung stroma) and B6.129-Axin2-cre-ERT241, respectively. Cre-dependent reporter alleles Rosa26ZsGreen1, which expresses cytosolic ZsGreen1 following Cre-mediated recombination, and Rosa26mTmG, which expresses membrane-targeted green fluorescent protein (mGFP) after recombination and membrane-targeted tdTomato (mTomato) in all other tissues, were used to label cells expressing Tbx4 and Axin2, respectively43,44. Induction of the Axin2-cre-ERT2 allele was done by intraperitoneal injection of tamoxifen (3 mg) once a day for three days as described25. All mouse experiments followed applicable regulations and guidelines and were approved by the Institutional Animal Care and Use Committee at Stanford University (Protocol 9780).
Isolation of lung and blood cells
Individual human lung samples were dissected, minced, and placed in digestion media (400 μg ml−1 liberase DL (Sigma 5466202001) and 100 μg ml−1 elastase (Worthington LS006365) in RPMI (Gibco 72400120)) in a gentleMACS c-tube (Miltenyi 130-096-334). Samples were partially dissociated by running ‘m_lung_01’ on a gentleMACS Dissociator (Miltenyi 130-093-235), incubated on a Nutator at 37 °C for 30 min, and then dispersed to a single-cell suspension by running ‘m_lung_02’. Processing buffer (5% fetal bovine serum in PBS) and DNase I (100 μg ml−1, Worthington LS006344) were then added and the samples rocked at 37 °C for 5 min. Samples were then placed at 4 °C for the remainder of the protocol. Cells were filtered through a 100-μm filter, pelleted (300g, 5 min, 4 °C), and resuspended in ACK red blood cell lysis buffer (Gibco A1049201) for 3 min, after which the buffer was inactivated by adding excess processing buffer. Cells were then filtered through a 70-μm strainer (Fisherbrand 22363548), pelleted again (300g, 5 min, 4 °C), and resuspended in magnetic activated cell sorting (MACS) buffer (0.5% BSA, 2 mM EDTA in PBS) with Human FcR Blocking Reagent (Miltenyi 130-059-901) to block non-specific binding of antibodies (see below).
Immune cells, including granulocytes, were isolated from peripheral blood using a high-density Ficoll gradient45. In brief, peripheral blood was diluted tenfold with FACS buffer (2% FBS in PBS), carefully layered on a room-temperature Ficoll gradient (Sigma HISTOPAQUE-1119), and centrifuged at 400g for 30 min at room temperature. The buffy coat was carefully removed, diluted fivefold with FACS buffer, pelleted (300g, 5 min, 4 °C), and incubated in ice-cold FACS buffer containing DNase I (Worthington LS006344) for 10 min at 4 °C. Clumps were separated by gentle pipetting to create a single-cell suspension.
Mouse lung samples were processed into single cell suspensions as previously described2. In brief, each lung was dissected, minced, and placed in gentleMACS c-tubes (Miltenyi 130-096-334) with digestion buffer (400 μg ml−1 liberase DL (Sigma 5466202001) in RPMI (Gibco 72400120)). The minced tissue was partially dissociated by running ‘m_lung_01’ on a gentleMACS Dissociator (Miltenyi 130-093-235), incubated at 37 °C on a nutator for 30 min, completely dissociated on a gentleMACS by running ‘m_lung_02’, and kept at 4 °C or on ice for the remainder of the protocol. Cells were washed with 5% FBS in PBS, centrifuged at 300g for 5 min, resuspended in 5% FBS in PBS, filtered through a 70-μm strainer (Fisherbrand 22363548), and centrifuged again and resuspended in FACS buffer (2% FBS in PBS).
Magnetic separation of lung tissue compartments
Immune and endothelial cells were overrepresented in our previous mouse single-cell suspensions. To partially deplete these populations in our human samples, we stained cells isolated from lung with MACS microbeads conjugated to CD31 and CD45 (Miltenyi 130-045-801, 130-091-935) and then passed them through an LS MACS column (Miltenyi, 130-042-401) on a MidiMACS Separator magnet (Miltenyi, 130-042-302). Cells retained on the column were designated ‘immune and endothelial enriched’. The flowthrough cells were then split, with 80% immunostained for FACS (see below) and the remaining 20% stained with EPCAM microbeads (Miltenyi 130-061-101). EPCAM-stained cells were passed through another LS column. Cells retained on the column were labelled ‘epithelial enriched’, and cells that flowed through were designated ‘stromal’.
Flow cytometry and cell sorting
Lysis plates for single-cell mRNA sequencing were prepared as previously described2. 96-well lysis plates were used for cells from the blood and mouse samples and contained 4 μl of lysis buffer instead of 0.4 μl.
After negative selection against immune and endothelial cells by MACS, the remaining human lung cells were incubated with FcR Block (Becton Dickinson 564219) for 5 min and stained with directly conjugated anti-human CD45 (Biolegend 304006) and EPCAM (eBioscience 25-9326-42) antibodies on a Nutator for 30 min at the manufacturer’s recommended concentration. Cells were then pelleted (300g, 5 min, 4 °C), washed with FACS buffer three times, then incubated with cell viability marker Sytox blue (1:3,000, ThermoFisher S34857) and loaded onto a Sony SH800S cell sorter. Living single cells (Sytox blue-negative) were sorted into lysis plates based on three gates: EPCAM+CD45− (designated epithelial), EPCAM−CD45+ (designated immune), and EPCAM−CD45− (designated endothelial or stromal).
Immune cells from subject-matched blood were incubated with FcR Block and Brilliant Violet buffer (BD 563794) for 20 min and then stained with directly conjugated anti-human CD3 (BD 563548), CD4 (BD 340443), CD8 (BD 340692), CD14 (BD 557831), CD19 (Biolegend 302234), CD47 (BD 563761), CD56 (BD 555516), and CD235a (BD 559944) antibodies for 30 min at the manufacturer’s recommended concentration. Cells were pelleted (300g, 5 min, 4 °C), washed with FACS buffer twice, and then incubated with the viability marker propidium iodide and loaded onto a BD FACSAria II cell sorter. Living (propidium iodide-negative), single, non-red blood cells (CD235a−) were sorted into lysis plates along with specific immune populations: B cells (CD19+CD3−), CD8+ T cells (CD8+), CD4+ T cells (CD4+), natural killer cells (CD19−CD3−CD56+CD14−), and classical monocytes (CD19−CD3−CD56−CD14+). After sorting, plates were quickly sealed, vortexed, spun down for 1 min at 1,000g, snap-frozen on dry ice, and stored at −80 °C until cDNA synthesis.
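The blood sort gates listed above can be written as boolean rules. In this hypothetical sketch each marker is reduced to positive or negative (real gating works on continuous fluorescence with instrument-specific thresholds), markers not supplied are treated as negative, and the gates are transcribed as stated, so the CD4 and CD8 gates do not themselves require CD3.

```python
# Sort gates transcribed from the text; True = positive, False = negative.
GATES = {
    "B cell": {"CD19": True, "CD3": False},
    "CD8+ T cell": {"CD8": True},
    "CD4+ T cell": {"CD4": True},
    "natural killer cell": {"CD19": False, "CD3": False, "CD56": True, "CD14": False},
    "classical monocyte": {"CD19": False, "CD3": False, "CD56": False, "CD14": True},
}


def matching_gates(markers: dict) -> list:
    """Populations whose gate a live single cell satisfies.

    Markers missing from `markers` are treated as negative (an assumption),
    and red blood cells (CD235a+) are excluded before any gate is applied.
    """
    if markers.get("CD235a", False):
        return []
    return [name for name, rule in GATES.items()
            if all(markers.get(m, False) == wanted for m, wanted in rule.items())]


print(matching_gates({"CD19": False, "CD3": False, "CD56": True, "CD14": False}))
print(matching_gates({"CD3": True, "CD4": True}))
```

Returning a list rather than a single label makes any overlap between gates visible instead of silently picking one population.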
Mouse cells were incubated with the viability marker DAPI and loaded onto a BD Influx cell sorter. Living (DAPI-negative) single cells were sorted into lysis plates based on presence or absence of the fluorescent lineage label (mEGFP for Axin2-cre-ERT2, ZsGreen1 for Tbx4-LME-cre).
Immune cells for bulk mRNA sequencing were incubated with FcR Block for 20 min and then stained with one of six panels of directly conjugated antibodies for 30 min at the manufacturer’s recommended concentration: anti-human CD16 (BD 558122), CD123 (BD 560826), CCR3 (R&D FAB155F), ITGB7 (BD 551082), CD3 (BD 555341), CD14 (Invitrogen MHCD1406), CD19 (BD 555414), and CD56 (BD 555517) (basophils, neutrophils and eosinophils); anti-human CD16 (BD 558122), CD14 (BD 347497), CD4 (BD 340443), CD3 (BD 555341), CD8 (BD 555368), CD19 (BD 555414), and CD56 (BD 555517) (classical and nonclassical monocytes); anti-human CD16 (BD 558122), CD1c (Miltenyi Biotec 130-098-007), CD11c (BD 340544), CCR3 (R&D FAB155F), CD123 (BD 560826), HLA-DR (BD 335796), CD3 (BD 555341), CD4 (BD 555348), CD8 (BD 555368), CD14 (Invitrogen MHCD1406), CD19 (BD 555414), and CD56 (BD 555517) (pDCs, mDCs, CD16+ dendritic cells); anti-human IgM/IgD (BD 555778), CD19 (BD 557835), CD27 (BD 558664), CD20 (BD 335794), CD3 (BD 555341), CD4 (BD 555348), CD14 (Invitrogen MHCD1406), and CD56 (BD 555517) (B cells); anti-human CD16 (BD 558122), CD57 (BD 347393), CD56 (BD 557747), CD3 (BD 555341), CD4 (BD 555348), CD14 (Invitrogen MHCD1406), and CD19 (BD 555414) (natural killer cells); and anti-human CD45RA (Biolegend 304118), CCR7 (R&D FAB197F), CD62L (BD 555544), CD45RO (BD Pharmingen 560608), CD4 (BD 340443), CD8 (BD 340584), CD11b (BD 555389), CD14 (Invitrogen MHCD1406), CD19 (BD 555414), and CD56 (BD 555517) (T cells). Cells were washed with FACS buffer twice, incubated with the viability marker propidium iodide and loaded onto a BD FACSAria II cell sorter. Approximately 40,000 cells from 21 canonical immune populations (Supplementary Table 3) were sorted in duplicate into Trizol LS (Invitrogen 10296010).
After sorting, all plates and samples were quickly sealed, vortexed, spun down for 1 min at 1,000g and then snap frozen on dry ice and stored at −80 °C until cDNA synthesis.
Single-cell mRNA sequencing
mRNA from single cells sorted from human and mouse lungs and human blood into lysis plates was reverse transcribed to cDNA and amplified as previously described2. Illumina sequencing libraries for cDNA from single cells were prepared as previously described2. In brief, cDNA libraries were prepared using the Nextera XT Library Sample Preparation kit (Illumina, FC-131-1096). Nextera tagmentation DNA buffer (Illumina) and Tn5 enzyme (Illumina) were added, and the sample was incubated at 55 °C for 10 min. The reaction was neutralized by adding Neutralize Tagment Buffer (Illumina) and centrifuging at room temperature at 3,220g for 5 min. Mouse samples were then indexed via PCR by adding i5 indexing primer, i7 indexing primer, and Nextera NPM mix (Illumina). Human samples were similarly indexed via PCR using custom, dual-unique indexing primers (IDT)2.
Following library preparation, wells of each library plate were pooled using a Mosquito liquid handler (TTP Labtech) and then purified twice using 0.7× AMPure beads (Fisher A63881). Library pool quality was assessed by capillary electrophoresis, either on a TapeStation system (Agilent) with a High Sensitivity or standard D5000 ScreenTape assay kit (Agilent) or on a Fragment Analyzer (AATI). Library cDNA concentrations were quantified by qPCR (Kapa Biosystems KK4923) on a CFX96 Touch Real-Time PCR Detection System (Bio-Rad). Plate pools were normalized and combined in equal amounts to make each sequencing sample pool. A PhiX control library was spiked in at 1% before sequencing. Human libraries were sequenced on a NovaSeq 6000 (Illumina) and mouse libraries on a NextSeq 500 (Illumina).
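Once qPCR concentrations are known, normalizing and combining plate pools equally is simple arithmetic. The sketch below is hypothetical and rests on several assumptions. Concentrations are in nM, so that 1 nM equals 1 fmol/μl. The target amount per pool is arbitrary. "Spiked in at 1%" is read as 1% of the final molar mix. The function names are illustrative only.

```python
from typing import Dict


def pooling_volumes(conc_nm: Dict[str, float], fmol_each: float) -> Dict[str, float]:
    """Volume (ul) of each plate pool needed to contribute `fmol_each` femtomoles.

    Because 1 nM = 1 fmol/ul, volume (ul) = amount (fmol) / concentration (nM).
    """
    return {name: fmol_each / conc for name, conc in conc_nm.items()}


def phix_fmol(library_fmol: float, fraction: float = 0.01) -> float:
    """PhiX amount so that PhiX makes up `fraction` of the final molar mix.

    Solves phix = fraction * (library + phix) for phix.
    """
    return library_fmol * fraction / (1.0 - fraction)
```

With this approach, more dilute pools contribute proportionally larger volumes, so each plate is represented equally in the sequencing pool.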
Cells isolated from each compartment (immune and endothelial enriched, epithelial enriched, stromal) and subject blood were captured in droplet emulsions using a Chromium Single-Cell instrument (10x Genomics) and libraries were prepared using the 10x Genomics 3′ Single Cell V2 protocol as previously described2. All 10x libraries were pooled and sequenced on a NovaSeq 6000 (Illumina).
Immune cell bulk mRNA sequencing
Total RNA from bulk-sorted canonical immune populations was reverse transcribed to cDNA, amplified, and prepared as sequencing libraries as previously described45. Libraries were sequenced on a NextSeq 500 (Illumina).
Mouse and human lungs were collected as previously described25,46. After inflation, lungs were removed en bloc, fixed in 4% paraformaldehyde (PFA) overnight at 4 °C with gentle rocking, then cryo-embedded in Optimal Cutting Temperature compound (OCT, Sakura) and sectioned using a cryostat (Leica) onto Superfrost Plus Microscope Slides (Fisherbrand). Immunohistochemistry was performed using primary antibodies raised against the following antigens and used at the indicated dilutions to stain slides overnight at 4 °C: anti-proSP-C (rabbit, Chemicon AB3786, 1:250 dilution), HES1 (rabbit, Cell Signaling 11988S clone D6P2U, 1:100), MUC-1 (hamster, Thermo Scientific HM1630, clone MH1, 1:250), Ki67 (rat, DAKO M7249 clone MIB-1, 1:100), and keratin-5 (chicken, Biolegend 905901, 1:100). Primary antibodies were detected with Alexa Fluor-conjugated secondary antibodies (Jackson ImmunoResearch) unless otherwise noted, and slides were then mounted in Vectashield containing DAPI (5 μg ml⁻¹, Vector labs). Images were acquired with a laser scanning confocal fluorescence microscope (Zeiss LSM780) and processed with Fiji (v.2.0) and Imaris (v.9.2.0, Oxford Instruments). Immunostaining experiments were performed on at least two human or mouse subjects distinct from the donors used for sequencing, and quantifications were based on at least 10 fields of view in each.
Single molecule in situ hybridization
Samples were either fixed in 10% neutral buffered formalin, dehydrated with ethanol and embedded in paraffin wax, or fixed in 4% paraformaldehyde and embedded in OCT compound. Sections from paraffin (5 μm) and OCT (20 μm) blocks were processed using the standard pre-treatment conditions for each, per the RNAscope multiplex fluorescent reagent kit version 2 (Advanced Cell Diagnostics) assay protocol. TSA-plus fluorescein, Cy3 and Cy5 fluorophores were used at 1:500 dilution. Micrographs were acquired with a laser scanning confocal fluorescence microscope (Zeiss LSM780) and processed with ImageJ and Imaris (version 9.2.0, Oxford Instruments). smFISH experiments were performed on at least two human or mouse subjects distinct from the donors used for sequencing, and quantifications were based on at least 10 fields of view in each. For smFISH, fields of view were scored manually, and a cell was called positive for each gene probed if its nucleus had at least three associated expression puncta. Proprietary (Advanced Cell Diagnostics) probes used were: KRT5 (547901-C2), SERPINB3 (828601-C3), SFTPC (452561-C2), WIF1 (429391), CLDN5 (517141-C2, 517141-C3), MYC (311761-C3), ACKR1 (525131, 525131-C2), COL1A2 (432721), GPC3 (418091-C2), SERPINF1 (564391-C3), C2orf85 (560841-C3), DHRS9 (467261), GJA5 (471431), CCL21 (474371-C2), COX4I2 (570351-C3), APOE (433091-C2), ACTG2 (828611-C2), ASPN (404481), IGSF21 (572181-C3), GPR34 (521021), EREG (313081), GPR183 (458801-C2), TREM2 (420491-C3), CHI3L1 (408121), MYRF (499261), AGER (470121-C3), TBX5 (564041), KCNK3 (536851), ACVRL1 (559221), SERPINA1 (435441), HHIP (464811), SLC7A10 (497081-C2), FGFR4 (443511), PI16 (451311-C2), SERPINF1 (310731), HHIP (448441-C3), SFTPC (314101-C2), NKX2-1 (434721-C3), and MYRF (524061).
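The manual smFISH scoring rule can be restated as a threshold on puncta counts per nucleus. In this minimal sketch, puncta have already been counted per cell and per probe; in the study that counting was done by hand. The gene names and counts in the usage note are toy values. `score_cells` and `fraction_positive` are hypothetical helpers.

```python
from typing import Dict, List


def score_cells(puncta: List[Dict[str, int]], min_puncta: int = 3) -> List[Dict[str, bool]]:
    """Call each cell positive for a probe if its nucleus has >= min_puncta puncta."""
    return [{gene: n >= min_puncta for gene, n in cell.items()} for cell in puncta]


def fraction_positive(calls: List[Dict[str, bool]], gene: str) -> float:
    """Fraction of scored cells called positive for `gene`."""
    return sum(cell[gene] for cell in calls) / len(calls)
```

For example, a cell with two SFTPC puncta is called SFTPC-negative, and a cell with three is called positive.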
Sequencing read alignments and quality control
Reads from single cells isolated using 10x Chromium were demultiplexed and then aligned to the GRCh38.p12 human reference (from 10x Genomics) using Cell Ranger (version 2.0, 10x Genomics). Cells with fewer than 500 genes detected or fewer than 1,000 UMIs were excluded from further analyses.
Reads from single cells isolated by flow cytometry were demultiplexed using bcl2fastq (v.220.127.116.116, Illumina) and trimmed of low-quality bases and adaptor sequences using skewer (v.0.2.2). Depending on the organism, reads were then aligned either to the GRCh38.p12 human reference genome with both the gencode-vH29 and NCBI-108 annotations, or to the GRCm38.p6 mouse reference genome with the NCBI-106 annotation (supplemented with the fluorescent genes mEGFP, tdTomato and ZsGreen1). Alignment used STAR (v.2.6.1d) in two-pass mapping mode: the first pass identifies novel splice junctions, and the second pass aligns reads after the genome index has been rebuilt with those junctions. The number of reads mapping to each annotated gene was calculated by STAR during the second-pass alignment, and cells with fewer than 500 genes detected or fewer than 50,000 mapped reads were excluded from later analyses. Reads from bulk mRNA sequencing of canonical immune populations were demultiplexed, aligned and quantified using the same pipeline.
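The two technology-specific quality filters stated above can be expressed as one check. This is a sketch that assumes per-cell gene and count totals have already been computed, as UMIs for 10x and mapped reads for SS2. `CellQC`, `THRESHOLDS` and `passes_qc` are illustrative names and are not part of Cell Ranger or STAR.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    genes_detected: int
    counts: int  # UMIs for 10x cells; mapped reads for SS2 cells


# (minimum genes, minimum counts) per technology, as stated in the Methods.
THRESHOLDS = {"10x": (500, 1_000), "SS2": (500, 50_000)}


def passes_qc(cell: CellQC, technology: str) -> bool:
    """Keep a cell only if it meets both the gene and the count minimum."""
    min_genes, min_counts = THRESHOLDS[technology]
    return cell.genes_detected >= min_genes and cell.counts >= min_counts
```

A cell is excluded if it fails either criterion, which matches the "fewer than X genes or Y counts" wording.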
Cell clustering, doublet calling, and annotation
Expression profiles of cells from different subjects and different capture approaches (10x and SS2) were clustered separately using the R package Seurat (v.2.3)47. In brief, counts (SS2) and UMIs (10x) were normalized across cells, scaled per million (SS2) or per 10,000 (10x), and log-transformed using the ‘NormalizeData’ function. These values were converted to z-scores using the ‘ScaleData’ command, and highly variable genes were selected with the ‘FindVariableGenes’ function using a dispersion cutoff of 0.5. Principal components were calculated for these selected genes and then projected onto all other genes with the ‘RunPCA’ and ‘ProjectPCA’ commands. Clusters of similar cells were detected with the Louvain method for community detection, as implemented in the ‘FindClusters’ function, using only biologically meaningful principal components (see below) to construct the shared nearest-neighbour graph and an empirically set resolution.
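The normalization and scaling steps can be sketched in plain Python. This sketch assumes Seurat's LogNormalize convention of a natural log of 1 plus the size-scaled count, and a z-score per gene using the sample standard deviation. It omits the other ScaleData options, such as regressing out covariates. The function names are illustrative and are not Seurat calls.

```python
import math
import statistics
from typing import List

# Scale factors from the Methods: per million for SS2 counts, per 10,000 for 10x UMIs.
SCALE_FACTOR = {"SS2": 1e6, "10x": 1e4}


def log_normalize(counts: List[float], scale_factor: float) -> List[float]:
    """LogNormalize-style transform of one cell: ln(1 + count / total * scale_factor)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


def scale_gene(values: List[float]) -> List[float]:
    """Convert one gene's normalized values across cells to z-scores."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0] * len(values)  # a constant gene carries no signal
    return [(v - mu) / sd for v in values]
```

Normalization runs per cell, which removes sequencing-depth differences. Scaling runs per gene, so that highly expressed genes do not dominate the principal components.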
When clustering all cells from a single subject at once, we found that the first principal components defining heterogeneity represented differences in tissue compartment, but some cell types within a compartment (for example, basal, goblet, club, neuroendocrine and ionocyte) tended to co-cluster. Clusters were therefore grouped by expression of tissue compartment markers (for example, EPCAM, CLDN5, COL1A2 and PTPRC) using the ‘SubsetData’ command, and the same procedure (from ‘ScaleData’ onwards) was applied iteratively to each tissue compartment. Iteration stopped when the markers enriched in the resulting clusters, identified using the ‘MAST’ statistical framework48 implemented in the ‘FindMarkers’ command, were no longer biologically meaningful (for example, clusters distinguished by dissociation-induced genes30, ribosomal genes, mitochondrial genes, or ambient RNA released by abundant cells such as red blood cells31). Doublets were identified by searching for cells with substantial and coherent expression profiles from two or more tissue compartments and/or cell types.
To assign cluster identities, we first compiled a list of all established lung cell types, their abundances, their classical markers, and any RNA markers (when available) (Supplementary Table 1). RNA markers for canonical immune populations (Supplementary Table 3) were obtained from bulk mRNA sequencing by correlating each gene's average expression across populations (each population sorted in duplicate) with a test vector in which the target population position equaled 10 and all others equaled 0 (see GitHub for details). Clusters were assigned a canonical identity based on enriched expression of these marker genes. Pearson correlations between the average expression profile of each immune cluster (all cells in the SS2 dataset) and the average bulk profiles were calculated using the ‘cor’ function in R. There were no clusters that lacked expression of canonical marker genes. When two or more clusters were assigned the same identity, we first determined whether their tissue locations differed substantially (for example, proximal versus distal, alveolar versus adventitial) and prepended these locations when applicable. When both clusters localized to the same tissue region (for example, capillary endothelial cells or AT2 cells), we next compared their differentially expressed genes head-to-head to identify differences in molecular functions. These functional differences were also prepended when applicable (for example, signalling AT2 versus AT2, proliferating basal versus basal). If the clusters could not be resolved by location or function, we prepended a representative marker gene to their ‘canonical’ identity (for example, IGSF21+ dendritic, EREG+ dendritic, and TREM2+ dendritic). Cells from different subjects with the same annotation were merged into a single group for all downstream analyses.
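The bulk marker search correlates each gene's profile across sorted populations with an idealized "on only in the target" vector. In this minimal sketch, duplicates have already been averaged into one value per population. `rank_markers` and its `top` parameter are hypothetical and are not the authors' GitHub code.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile cannot mark any population
    return sxy / math.sqrt(sxx * syy)


def rank_markers(avg_expr: Dict[str, List[float]], target: int, top: int = 5) -> List[str]:
    """Rank genes by correlation with a test vector that is 10 at `target`, 0 elsewhere.

    avg_expr maps gene -> average expression in each bulk-sorted population.
    """
    n_pops = len(next(iter(avg_expr.values())))
    test_vector = [10.0 if i == target else 0.0 for i in range(n_pops)]
    scores = {gene: pearson(v, test_vector) for gene, v in avg_expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

Pearson correlation is unchanged by rescaling either vector by a positive constant. A test value of 10 therefore gives the same ranking as 1; what matters is that only the target position is nonzero.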
Approximately 35,000 mouse lung and blood cell expression profiles by SS2 and 10x from Tabula Muris Senis2 were combined with 522 cells isolated from Axin2-Cre-ERT2 > Rosa26mTmG (A.N.N.) and Tbx4-LME-Cre > Rosa26ZsGreen1 (K.J.T.) mice and amplified by SS2. Cells were stratified by technology (10x versus SS2), re-clustered and re-annotated using the strategy described above for human lung cells.
Re-annotation of existing human lung single cell RNA sequencing datasets
UMI tables were obtained from the Gene Expression Omnibus (GSE122960 for ref. 18, GSE130148 for ref. 19), clustered, and annotated using the strategy described above. New annotations for each cell are available on GitHub (see below).
Cell type pairwise correlations
We obtained average expression profiles for each cell type from all cells in the 10x dataset, supplemented with the average expression profile from neutrophils in the SS2 dataset, and calculated pairwise Pearson correlation coefficients using the ‘cor’ function in R.
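The correlation matrix is built from per-cell-type mean profiles followed by all pairwise Pearson coefficients. This standard-library sketch mirrors what R's `cor` returns on a genes-by-types matrix. It assumes cells are given as equal-length expression vectors with one cell-type label each. The helper names are illustrative.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def average_profiles(cells: List[List[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Mean expression of every gene over the cells carrying each cell-type label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for expr, label in zip(cells, labels):
        acc = sums.setdefault(label, [0.0] * len(expr))
        for j, value in enumerate(expr):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def correlation_matrix(profiles: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations between cell-type average profiles."""
    return {a: {b: pearson(pa, pb) for b, pb in profiles.items()} for a, pa in profiles.items()}
```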
Identification of proliferation signature
Expression profiles from matched proliferating and quiescent cell types were compared head-to-head using the ‘MAST’ statistical framework implemented in the ‘FindMarkers’ command in Seurat. Differentially expressed genes common to all proliferating cell types were converted to z-scores using the ‘ScaleData’ command in Seurat and summed to create a proliferation score for each cell in the 10x dataset.
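The proliferation score is a per-cell sum of z-scored signature genes. The Methods describe the same construction for the egression score. In this sketch, expression is given as gene to per-cell values, and z-scores use the sample standard deviation. The gene names and values in the test are toy data, not the study's signature.

```python
import statistics
from typing import Dict, List


def zscore(values: List[float]) -> List[float]:
    """Z-score one gene across cells (sample standard deviation)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values] if sd > 0 else [0.0] * len(values)


def signature_score(expr: Dict[str, List[float]], signature: List[str]) -> List[float]:
    """Per-cell score: sum over signature genes of that gene's z-score across cells."""
    n_cells = len(next(iter(expr.values())))
    scores = [0.0] * n_cells
    for gene in signature:
        for i, z in enumerate(zscore(expr[gene])):
            scores[i] += z
    return scores
```

Each gene is z-scored before summing, so every gene contributes on a common scale regardless of its absolute expression level.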
Identification of immune egression signatures
Blood and tissue expression profiles for each immune cell type were compared head-to-head using the ‘MAST’ statistical framework implemented in the ‘FindMarkers’ command in Seurat. Differentially expressed genes common to all subjects were screened for dissociation artefacts and contamination by red blood cells. Genes specific to tissue immune cells were binned by their breadth of expression (lymphocyte, myeloid or both), converted to z-scores using the ‘ScaleData’ command in Seurat, and summed to create an egression score for each cell in the 10x dataset.
Identification of enriched marker genes, transcription factors, and disease genes
Differentially expressed genes for each annotated cell type relative to the other cells within its tissue compartment were identified using the ‘FindMarkers’ command in Seurat with the ‘MAST’ statistical framework, after downsampling each cell type to 100 (SS2) or 500 (10x) cells. To obtain the most sensitive and specific markers for each cell type, we ranked enriched genes with a P value less than 10⁻⁵ and a sensitivity greater than 0.4 by their Matthews correlation coefficients (MCCs), calculated for each cell type from all cells in the 10x dataset (numbers available in Supplementary Table 2). To measure the utility of using multiple markers in assigning cell identities, we calculated MCC scores for all possible combinations of each cell type’s top five marker genes.
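The MCC treats a marker as a binary classifier of cell identity. The sketch below rests on two assumptions the Methods do not specify. First, a cell "detects" a marker when its expression is nonzero. Second, a cell is positive for a combination only if it detects every marker in it (an AND rule). With five markers this enumerates 31 combinations. `mcc` and `combination_mccs` are hypothetical helpers.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def mcc(pred: List[bool], truth: List[bool]) -> float:
    """Matthews correlation coefficient of binary predictions against labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    fp = sum(p and (not t) for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def combination_mccs(detected: Dict[str, List[bool]], truth: List[bool],
                     markers: List[str]) -> Dict[Tuple[str, ...], float]:
    """MCC for every non-empty combination of markers (AND rule, an assumption)."""
    results = {}
    for k in range(1, len(markers) + 1):
        for combo in combinations(markers, k):
            pred = [all(detected[g][i] for g in combo) for i in range(len(truth))]
            results[combo] = mcc(pred, truth)
    return results
```

In the toy test, each marker alone picks up one off-target cell. Their combination separates the cell type perfectly, which is the kind of gain this analysis measures.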
Enriched genes were annotated as transcription factors or genes associated with pulmonary pathology based on lists compiled from The Animal Transcription Factor Database (http://bioinfo.life.hust.edu.cn/AnimalTFDB), The Online Mendelian Inheritance in Man Catalog (OMIM)49, and genome-wide association studies (GWAS) obtained from the NHGRI-EBI GWAS Catalog50 (EFO IDs 0000270, 0000341, 0000464, 0000571, 0000702, 0000707, 0000708, 0000768, 0001071, 0003060, 0003106, 0004244, 0004312, 0004313, 0004314, 0004647, 0004713, 0004806, 0004829, 0005220, 0005297, 0006505, 0006953, 0007627, 0007744, 0007944, 0008431, 0009369, 0009370; GO IDs 0031427, 0097366; Orphanet IDs 586, 182098; log(P value) < −20; statistical tests vary among the indicated studies). Viral entry genes were obtained from Gene Ontology (GO:0046718) and then curated and associated with their cognate virus(es) based on literature citations available in our GitHub repository.
Cellular interaction and hormone target mapping
Interactions between cell types were predicted using CellPhoneDB (‘statistical_analysis’ method) with all cells in the SS2 dataset, as previously described6. For our targeted analyses, we curated the chemokine receptor-ligand interaction map and list of hormone receptors from an extensive literature search (available on GitHub, see below).
Human and mouse gene alignment, cell type correlation, and gene expression comparisons
The gene expression matrices from our human SS2 cells and the Tabula Muris Senis SS2 cells, supplemented with the 522 mouse cells from Axin2-creER > mTmG and Tbx4-Cre > ZsGreen1 mice described above, were collapsed to HomologyIDs obtained from the Mouse Genome Informatics database to enable direct comparison. We obtained mean expression profiles for each cell type from all cells in the SS2 dataset and calculated pairwise Pearson correlation coefficients using the ‘cor’ function in R. We defined species-specific genes as those enriched at least 20-fold in either direction (mouse > human or human > mouse) with a P value less than 10⁻⁵ (calculated by ‘MAST’ as above) across all cells of the indicated types in the SS2 dataset. Correlations and age-specific genes were obtained in the same manner using all cells from 3-month-old and 24-month-old mice in the combined SS2 mouse dataset.
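The species-specificity rule is a fold-change and P-value filter. This sketch assumes each gene arrives with a log2 fold change (human over mouse) and a MAST P value already computed. The way the study derived the fold change is not specified here, so the input format is hypothetical.

```python
import math
from typing import Dict, Tuple

FOLD = 20.0   # minimum enrichment in either direction
P_MAX = 1e-5  # P value must be strictly below this


def species_specific(stats: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """stats maps gene -> (log2 fold change human/mouse, P value)."""
    cutoff = math.log2(FOLD)  # about 4.32 on the log2 scale
    out = {}
    for gene, (lfc, p) in stats.items():
        if p >= P_MAX:
            continue
        if lfc >= cutoff:
            out[gene] = "human > mouse"
        elif lfc <= -cutoff:
            out[gene] = "mouse > human"
    return out
```

Working on the log2 scale makes the two directions symmetric: 20-fold up and 20-fold down are the same distance from zero.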
To compare the expression pattern of each gene across species, we binarized genes as expressed (1) or not expressed (0) in each cell type’s average expression profile, calculated from all mouse and human SS2 cells of the types compared above. A cell type ‘expressed’ a gene if two conditions held. First, the median of that gene’s non-zero expression values across the constituent cells was greater than the median of all non-zero expression values for all other genes, plus or minus two standard deviations (varied in 0.25 increments). Second, the percentage of cells within the cell type with non-zero expression values was greater than the median percentage of non-zero expression values for all other genes, plus or minus two standard deviations (varied in 0.25 increments). These cutoffs were varied independently to ensure genes were robustly categorized. We then ordered these gene vectors to match homologous cell types between species with at least five cells and combined them into a single vector for each gene, V = (a − b) + 2ab (element-wise), in which a is the ordered human vector and b is the ordered mouse vector. Each element of V indicates whether both mouse and human expressed the gene in that cell type (2), only human (1), only mouse (−1), or neither (0). We then classified genes as follows: conserved if any element of V equaled 2 and all other elements equaled 0; type 2 if any element equaled 2 and any other equaled 1 or −1; not expressed if all elements equaled 0; type 3 if elements were both positive and negative; and type 1 if elements were either positive or negative and 0.
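The vector combination and classification rules translate directly into code. This sketch starts from already-binarized human and mouse vectors; the threshold-based binarization is not reproduced. It applies the rules in the order listed, with two interpretations. "All other elements equaled 0" is read as allowing several 2s. A vector with only 1s, or only −1s, and no zeros is placed in type 1. The function names are hypothetical.

```python
from typing import List


def combine(a: List[int], b: List[int]) -> List[int]:
    """Element-wise V = (a - b) + 2ab.

    2 = both species, 1 = human only, -1 = mouse only, 0 = neither.
    """
    return [(x - y) + 2 * x * y for x, y in zip(a, b)]


def classify(v: List[int]) -> str:
    """Classify one gene's combined vector V across homologous cell types."""
    vals = set(v)
    if 2 in vals:
        # Conserved if every non-2 element is 0; otherwise some divergence exists.
        return "conserved" if vals <= {0, 2} else "type 2"
    if vals == {0}:
        return "not expressed"
    if 1 in vals and -1 in vals:
        return "type 3"
    return "type 1"  # only human-specific or only mouse-specific elements
```

The term 2ab is nonzero only where both species express the gene. It lifts that case to 2 and leaves the single-species cases at +1 or −1.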
Statistics and reproducibility
All heat maps and plots with single-cell expression data include every cell from the indicated types (numbers available in Supplementary Table 2 for human and Supplementary Table 6 for mouse) for the specified sequencing technology (SS2 or 10x), unless otherwise stated. Scatter plots were generated with ggplot2’s ‘geom_point’ function. Dot plots were generated using a modified version of Seurat’s ‘DotPlot’ function (available on GitHub). Violin plots were created with Seurat’s ‘VlnPlot’ function and show the proportion of single cells at the indicated expression levels. Box-and-whisker plots were generated with ggplot2’s ‘geom_boxplot’ function; lower and upper hinges correspond to the first and third quartiles, and whiskers extend from the hinge to the largest or smallest value no further than 1.5 times the interquartile range from the hinge. Data beyond the whiskers are shown as outlying points. Correlations use Pearson’s coefficient. Differentially expressed genes were identified using the ‘MAST’ statistical framework48 implemented in Seurat’s ‘FindMarkers’ function. Immunostaining and smFISH experiments were performed on at least 2 human or mouse subjects distinct from the donors used for sequencing, and quantifications were based on at least 10 fields of view in each. For smFISH, fields of view were scored manually, and a cell was called positive for each gene probed if its nucleus had at least three associated expression puncta. No statistical methods were used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.
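The box-plot elements described above can be computed as follows. This sketch assumes quartiles use R's default linear-interpolation quantile (type 7), which we believe `stat_boxplot` relies on. Whiskers reach the most extreme data points within 1.5 × IQR of each hinge, and anything beyond them is reported as an outlier. The helper names are illustrative.

```python
import math
from typing import List, Tuple


def quantile7(sorted_x: List[float], p: float) -> float:
    """Linear-interpolation quantile of pre-sorted data (R's default, type 7)."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def box_stats(values: List[float]) -> Tuple[float, float, float, Tuple[float, float], List[float]]:
    """Return (Q1, median, Q3, (lower whisker, upper whisker), outliers)."""
    x = sorted(values)
    q1, med, q3 = quantile7(x, 0.25), quantile7(x, 0.5), quantile7(x, 0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    outliers = [v for v in x if v < lo_fence or v > hi_fence]
    return q1, med, q3, (min(inside), max(inside)), outliers
```

For 1 through 9 plus 100, the quartiles are 3.25 and 7.75, so the upper fence is 14.5. The whiskers stop at 1 and 9, and 100 is drawn as an outlying point.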
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Counts/UMI tables, cellular metadata, Seurat objects, and scanpy objects are available on Synapse (https://www.synapse.org/#!Synapse:syn21041850). The data can be explored in a browser using cellxgene at https://hlca.ds.czbiohub.org/. Human sequencing data are available by data access agreement on the European Genome-phenome Archive (EGA) under accession EGAS00001004344. Use of the human sequencing data is restricted to not-for-profit research and requires approval or a waiver from the requesting investigator’s institutional review board. Mouse sequencing data are available on the National Institutes of Health’s Sequence Read Archive (SRA) under BioProject accession PRJNA632939. Source data are provided with this paper.
The code for demultiplexing counts/UMI tables, clustering, annotation, downstream analyses, and obtaining source data/generating figures that include single-cell expression data is available on GitHub (https://github.com/krasnowlab/HLCA).
Enge, M. et al. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell 171, 321–330.e14 (2017).
Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
Han, X. et al. Mapping the mouse cell atlas by microwell-seq. Cell 173, 1307 (2018).
Zeisel, A. et al. Molecular architecture of the mouse nervous system. Cell 174, 999–1014.e22 (2018).
Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030.e16 (2018).
Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018).
Young, M. D. et al. Single-cell transcriptomes from human kidneys reveal the cellular identity of renal tumors. Science 361, 594–599 (2018).
Aizarani, N. et al. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature 572, 199–204 (2019).
Han, X. et al. Construction of a human cell landscape at single-cell level. Nature 581, 303–309 (2020).
Young, J. Malpighi’s “De pulmonibus”. Proc. R. Soc. Med. 23, 1–11 (1929).
Gehr, P., Bachofen, M. & Weibel, E. R. The normal human lung: ultrastructure and morphometric estimation of diffusion capacity. Respir. Physiol. 32, 121–140 (1978).
Balis, J. U., Paterson, J. F., Paciga, J. E., Haller, E. M. & Shelley, S. A. Distribution and subcellular localization of surfactant-associated glycoproteins in human lung. Lab. Invest. 52, 657–669 (1985).
Hermans, C. & Bernard, A. Lung epithelium-specific proteins: characteristics and potential applications as markers. Am. J. Respir. Crit. Care Med. 159, 646–678 (1999).
Franks, T. J. et al. Resident cellular components of the human lung: current knowledge and goals for research on cell phenotyping and function. Proc. Am. Thorac. Soc. 5, 763–766 (2008).
Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
Gawad, C., Koh, W. & Quake, S. R. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17, 175–188 (2016).
Treutlein, B. et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature 509, 371–375 (2014).
Reyfman, P. A. et al. Single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis. Am. J. Respir. Crit. Care Med. 199, 1517–1536 (2019).
Braga, F. A. V. et al. A cellular census of human lungs identifies novel cell states in health and in asthma. Nat. Med. 25, 1153–1163 (2019).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
Blondel, V. D. et al. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
Howitt, M. R. et al. Tuft cells, taste-chemosensory cells, orchestrate parasite type 2 immunity in the gut. Science 351, 1329–1333 (2016).
Rock, J. R. et al. Notch-dependent differentiation of adult airway basal stem cells. Cell Stem Cell 8, 639–648 (2011).
Garcia, S. R. et al. Single-cell RNA sequencing reveals novel cell differentiation dynamics during human airway epithelium regeneration. Preprint at https://doi.org/10.1101/451807 (2018).
Nabhan, A. N., Brownfield, D. G., Harbury, P. B., Krasnow, M. A. & Desai, T. J. Single-cell Wnt signaling niches maintain stemness of alveolar type 2 cells. Science 359, 1118–1123 (2018).
Zacharias, W. J. et al. Regeneration of the lung alveolus by an evolutionarily conserved epithelial progenitor. Nature 555, 251–255 (2018).
Stan, R. V. et al. The diaphragms of fenestrated endothelia: gatekeepers of vascular permeability and blood composition. Dev. Cell 23, 1203–1218 (2012).
Gillich, A. et al. Capillary cell-type specialization in the alveolus. Nature 586, 785–789 (2020).
Tan, S. Y. S. & Krasnow, M. A. Developmental origin of lung macrophage diversity. Development 143, 1318–1327 (2016).
van den Brink, S. C. et al. Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat. Methods 14, 935–936 (2017).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Shiow, L. R. et al. CD69 acts downstream of interferon-α/β to inhibit S1P1 and lymphocyte egress from lymphoid organs. Nature 440, 540–544 (2006).
Mackay, L. K. et al. Hobit and Blimp1 instruct a universal transcriptional program of tissue residency in lymphocytes. Science 352, 459–463 (2016).
Moffitt, J. R. & Zhuang, X. RNA imaging with multiplexed error-robust fluorescence in situ hybridization (MERFISH). Methods Enzymol. 572, 1–49 (2016).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Eng, C. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568, 235–239 (2019).
Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020).
Limjunyawong, N., Fallica, J., Horton, M. R. & Mitzner, W. Measurement of the pressure-volume curve in mouse lungs. J. Vis. Exp. 52376, 52376 (2015).
Seeley, R. R., Stephens, T. D. & Tate, P. Essentials of Anatomy and Physiology 7th edn (2005).
Tabula Muris Consortium. A single-cell transcriptomic atlas characterizes aging tissues in the mouse. Nature 583, 590–595 (2020).
van Amerongen, R., Bowman, A. N. & Nusse, R. Developmental stage and time dictate the fate of Wnt/β-catenin-responsive stem cells in the mammary gland. Cell Stem Cell 11, 387–400 (2012).
Greif, D. M. et al. Radial construction of an arterial wall. Dev. Cell 23, 482–493 (2012).
Muzumdar, M. D., Tasic, B., Miyamichi, K., Li, L. & Luo, L. A global double-fluorescent Cre reporter mouse. Genesis 45, 593–605 (2007).
Madisen, L. et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13, 133–140 (2010).
Moraga, I. et al. Tuning cytokine receptor signaling by re-orienting dimer geometry with surrogate ligands. Cell 160, 1196–1208 (2015).
Desai, T. J., Brownfield, D. G. & Krasnow, M. A. Alveolar progenitor and stem cells in lung development, renewal and cancer. Nature 507, 190–194 (2014).
Butler, A. et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
Amberger, J. S. et al. OMIM.org: Online Mendelian Inheritance in Man (OMIM), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789–D798 (2014).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
We are grateful to the tissue donors and the clinical staff at Stanford Medical Center who made tissue collection possible, especially J. Benson and E. Chen. We are especially grateful to Jim Spudich, who spurred this study. We also thank the Stanford Shared FACS Facility for their expertise and sorting services, especially L. Nichols and M. Weglarz; members of the Chan Zuckerberg Biohub and the Quake laboratory who supported this work, particularly A. McGeever, B. Yu, B. Jones and S. Kolluru; M. Kumar for discussions on annotation of stromal cells; and M. Petersen for illustrating the lung schematic (Fig. 1b) and C. Kao for help with figure formatting. Some computing for this project was performed on the Sherlock cluster; we thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to the results. We thank J. Spudich and members of the Krasnow laboratory for discussions and comments on the manuscript, and A. Lozano for discussions on bioinformatic analyses. This work was supported by funding from the Chan Zuckerberg Biohub (S.R.Q.), the Howard Hughes Medical Institute, National Institutes of Health, and the Vera Moulton Wall Center for Pulmonary Vascular Disease (M.A.K.), and the Ludwig Cancer Center at Stanford (I.L.W.). K.J.T. was supported by a Paul and Mildred Berg Stanford Graduate Fellowship. M.A.K. is an investigator of the Howard Hughes Medical Institute.
The authors declare no competing interests.
Peer review information Nature thanks Shalev Itzkovitz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Strategy for scRNA-seq and annotation of human lung and blood cells.
a, Workflow for capture and mRNA sequencing of single cells from the healthy unaffected regions indicated (D, distal; M, medial; P, proximal lung tissue; see d) of fresh, surgically resected lungs with focal tumours from three participants (1, 2 and 3) and their matched peripheral blood. Cell representation was balanced among the major tissue compartments (endothelial, immune, epithelial and stromal) by magnetic and fluorescence activated cell sorting (MACS and FACS) using antibodies for the indicated surface markers (CD31, CD45, EPCAM). Cell capture and scRNA-seq were done using 10x droplet technology or SS2 analysis of plate-sorted cells. The numbers of profiled cells from each compartment are shown in parentheses. For blood, immune cells were isolated on a high-density Ficoll gradient; unsorted cells were profiled by 10x, and cells sorted using canonical markers for the indicated immune populations were profiled by SS2. Total cell number (all three participants) and median number of expressed genes per cell are indicated for each method. b, Cell clustering and annotation pipeline. Cell expression profiles were computationally clustered by nearest-neighbour relationships, and clusters were then separated into tissue compartments based on expression of compartment-specific markers (EPCAM (blue), CLDN5 (red), COL1A2 (green), and PTPRC (purple)), as shown in the t-distributed stochastic neighbour embedding (t-SNE) plot of lung and blood cell expression profiles obtained by 10x from participant 3. Cells from each tissue compartment were then iteratively re-clustered until the differentially expressed genes driving clustering were no longer biologically meaningful. Cell cluster annotation was based on expression of canonical marker genes from the literature, markers found through RNA sequencing of purified cell populations (bulk RNA markers), ascertained tissue location, and inferred molecular function from differentially expressed genes.
c, Heat map of pairwise Pearson correlations of the average expression profile of each cluster in the combined 10x dataset plus SS2 analysis of neutrophils. n values are in Supplementary Table 2. Tissue compartment and identification number of each of the 58 clusters are indicated. For more details on statistics and reproducibility, see Methods. d, Representative micrographs of donor lungs from formalin-fixed, paraffin-embedded sections stained with haematoxylin and eosin showing bronchi, bronchioles, submucosal glands, arteries, veins and alveoli near regions used for scRNA-seq. Staining repeated on at least five sections (encompassing different anatomical regions) from each participant used for scRNA-seq. Scale bar, 100 μm.
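A minimal sketch of the compartment-splitting step in b, assuming a cells-by-genes matrix of normalized expression and precomputed cluster labels. Only the four marker genes come from the legend; the assignment rule (each cluster goes to the compartment whose marker has the highest mean expression) and the function name `assign_compartments` are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

# Compartment-specific markers named in panel b.
COMPARTMENT_MARKERS = {
    "epithelial": "EPCAM",
    "endothelial": "CLDN5",
    "stromal": "COL1A2",
    "immune": "PTPRC",
}

def assign_compartments(expr, genes, labels):
    """Assign each cluster to the compartment whose marker has the highest
    mean expression in that cluster (hypothetical rule for illustration).

    expr   : (n_cells, n_genes) array of normalized expression
    genes  : gene symbols, one per column of expr
    labels : (n_cells,) array of cluster labels
    """
    col = {g: i for i, g in enumerate(genes)}
    names = list(COMPARTMENT_MARKERS)
    idx = [col[COMPARTMENT_MARKERS[n]] for n in names]
    result = {}
    for cluster in np.unique(labels):
        # Mean expression of each compartment marker within this cluster.
        means = expr[labels == cluster][:, idx].mean(axis=0)
        result[cluster] = names[int(np.argmax(means))]
    return result
```

A rule this simple only gives a first split; the legend describes iterative re-clustering within each compartment afterwards.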
Extended Data Fig. 2 Selectively expressed RNA markers of human immune cell types from bulk mRNA sequencing of FACS-purified immune cells.
a, Heat map of RNA expression of the most selectively expressed genes from bulk mRNA sequencing of the indicated FACS-sorted immune populations (Supplementary Table 3). This dataset provided RNA markers for human immune cell populations that have been classically defined by their cell surface markers. b, Heat map of pairwise Pearson correlation scores between the average expression profiles of the indicated immune cell types obtained from bulk mRNA sequencing (BulkSeq, a) and the average scRNA-seq profiles of human blood immune cells in the SS2 dataset, which were annotated by canonical markers and by enriched RNA markers from the bulk RNA-seq analysis. The highest correlation in overall gene expression (white dot) of each annotated immune cell cluster in the SS2 dataset (columns) was to the bulk RNA-seq profile of the same FACS-purified immune population (rows), supporting the scRNA-seq immune cluster annotations (red squares). Cell numbers are in Supplementary Table 2. For more details on statistics and reproducibility, see Methods.
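The validation logic in b (each SS2 cluster should correlate best with the bulk profile of the same FACS-purified population) can be sketched as follows. The genes-by-profiles matrix layout and the function name `best_matches` are assumptions for illustration; `numpy.corrcoef` computes Pearson correlations between rows.

```python
import numpy as np

def best_matches(bulk, sc):
    """For each single-cell cluster average (columns of sc), return the index
    of the bulk population (columns of bulk) with the highest Pearson r.

    bulk : (n_genes, n_bulk) average bulk RNA-seq profiles
    sc   : (n_genes, n_sc) average scRNA-seq cluster profiles
    """
    n_bulk = bulk.shape[1]
    # np.corrcoef treats rows as variables, so put one profile per row.
    r = np.corrcoef(np.hstack([bulk, sc]).T)
    cross = r[:n_bulk, n_bulk:]  # bulk (rows) x cluster (columns) block
    return cross.argmax(axis=0), cross
```

An annotation is supported when the best-matching bulk population carries the same label as the cluster, which corresponds to the white dot falling on a red square in the heat map.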
Extended Data Fig. 3 Expression differences and localization of lung cell states and canonical epithelial and endothelial subtypes.
a, Proliferative signature score (based on expression of indicated genes in cells from 10x dataset; cell numbers are in Supplementary Table 2) of each cluster of basal cells, T and natural killer cells, and macrophages. Three clusters had high scores: proliferating basal cells (Bas-p), proliferating natural killer/T cells (NK/T-p), and proliferating macrophages. b, Dot plot of mean level of expression (dot intensity, grey scale) of indicated basal cell markers and percent of cells in population with detected expression (dot size) for 10x dataset. Note partial overlap of markers among different basal populations. c, Immunostaining of adult human pseudostratified airway for differentiation marker HES1 (green) in basal cells (marked by KRT5, red) with DAPI (nuclear) counter stain (blue). Scale bars, 10 μm. Note apical processes extending from HES1+ basal cells (arrowheads) indicating migration away from basal lamina as they differentiate. Other HES1+ cells have turned off basal marker KRT5. Dashed outlines, basal cell nuclei. Quantification shows fraction of basal cells (cuboidal KRT5+ cells on basement membrane) and differentiating basal (Bas-d) cells (KRT5+ cells with apical processes) that were HES1+. n denotes KRT5+ cells scored in sections of two human lungs with staining repeated on four participants. d, Immunostaining of adult human pseudostratified airway for proliferation marker MKI67 (green) in basal cells (marked by KRT5, red) with DAPI counter stain (blue). Scale bars, 5 μm. Quantification shows abundance of proliferating (MKI67-expressing) basal cells in pseudostratified (pseudo) and simple epithelial airways; n denotes KRT5+ cells scored in sections of two human lungs with staining repeated on four participants. e, Relative abundance of epithelial and stromal cell types in scRNA-seq analysis of human lung samples obtained from proximal (blue; 10x cells from P3) and distal (red; 10x cells from D1a, D1b, D2, D3) lung sites. 
In addition to the expected proximal enrichment of some airway cell types (goblet cells, ionocytes, neuroendocrine cells) and distal enrichment of alveolar cell types (AT1, AT2, AT2-signalling, myofibroblasts), note three bracketed pairs of related cell types (ciliated and proximal ciliated; basal and proximal basal (Bas-px) cells; myofibroblasts and fibromyocytes), one member of each pair being proximally enriched. Relative enrichment values are provisional because they can be influenced by the efficiency of collection during cell dissociation and isolation. Cell numbers for proximal cells are (from left to right): 357, 275, 73, 175, 153, 191, 39, 145, 57, 24, 20, 10, 328, 1,505, 235, 25 and 70; and for distal cells: 537, 806, 15, 197, 4, 58, 6, 14, 336, 0, 2, 1, 467, 2,095, 434, 198 and 28. f, RNAscope smFISH and quantification for general basal marker KRT5 (red) and proximal basal cell marker SERPINB3 (white) with DAPI counter stain (blue) and ECM autofluorescence (green) on proximal, pseudostratified bronchi and distal, simple bronchioles. Scale bars, 20 μm (inset, 10 μm). Note enrichment of proximal basal cells (KRT5 SERPINB3 double-positive, yellow arrowhead and box) at the base of pseudostratified airways. SERPINB3 was not detected in simple airways, indicating that basal cells (but not proximal basal cells) are present there. Staining repeated on two participants. g, Dot plot of expression in ciliated and proximal ciliated cells of canonical (general) ciliated cell markers and specific proximal ciliated markers (10x dataset). h, smFISH and quantification of human pseudostratified epithelial (left) and simple epithelial (right) airways for general ciliated marker C20orf85 (white) and proximal ciliated marker DHRS9 (red) with DAPI counterstain (blue) and ECM autofluorescence (green). Note restriction of proximal ciliated cells to pseudostratified airways. Scale bars, 10 μm. Staining repeated on two participants.
i, Heat map of expression of representative general AT2, AT2 selective, and AT2-signalling selective marker genes in AT2 and AT2-signalling human lung cells (SS2 data). AT2 selective markers include negative regulators of the Hedgehog and Wnt signalling pathways (for example, HHIP and WIF1, highlighted red) and AT2-signalling selective markers include Wnt ligands, receptors and transcription factors (for example, WNT5A, LRP5 and TCF7L2, highlighted green). Values shown are ln(CPM + 1) for 50 randomly selected cells in each cluster (SS2 data). j, Dot plot of expression of endothelial markers (10x dataset). k, Micrograph (low magnification, left) of bronchial vessel (boxed region) showing vessel location near airway (dotted outline). smFISH for general endothelial marker CLDN5 (red, centre), bronchial vessel-specific marker MYC (green) and Bro1-specific marker ACKR1 (red, right) on serial sections of bronchial vessel cells (arrowheads), co-stained for DAPI (blue). Scale bar, 10 μm. Quantification shows relative abundance of Bro1 and Bro2 cells. Staining repeated on two participants. l–n, smFISH and quantification of vessel types indicated (dotted outlines) showing vein marker ACKR1 (red; l), artery marker GJA5 (red; m), lymphatic marker CCL21 (red; n), and general endothelial marker CLDN5 with DAPI counter stain (blue) and ECM autofluorescence (green). Scale bars, 50 μm (l), 30 μm (m) and 40 μm (n). Staining repeated on two participants. For more details on statistics and reproducibility, see Methods.
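The proximal-versus-distal comparison in panel e can be recomputed from the cell numbers listed in the legend. This sketch assumes that "relative abundance" means each cell type's share of all cells profiled at a site, and that enrichment is the log2 ratio of proximal to distal shares with a pseudocount of one cell. Both are assumptions, and positions follow the figure's left-to-right order because the legend does not name the cell type at each position.

```python
import math

# Cell numbers copied from panel e, in its left-to-right order.
proximal = [357, 275, 73, 175, 153, 191, 39, 145, 57, 24, 20, 10, 328, 1505, 235, 25, 70]
distal = [537, 806, 15, 197, 4, 58, 6, 14, 336, 0, 2, 1, 467, 2095, 434, 198, 28]

def shares(counts):
    """Each cell type's fraction of all cells profiled at one site."""
    total = sum(counts)
    return [c / total for c in counts]

def log2_enrichment(prox, dist, pseudocount=1):
    """log2(proximal share / distal share); positive means proximal enrichment.
    The pseudocount keeps zero counts (a type absent at one site) finite."""
    p_tot = sum(prox) + pseudocount * len(prox)
    d_tot = sum(dist) + pseudocount * len(dist)
    return [math.log2(((p + pseudocount) / p_tot) / ((d + pseudocount) / d_tot))
            for p, d in zip(prox, dist)]

enrichment = log2_enrichment(proximal, distal)
```

As the legend cautions, such ratios remain provisional because dissociation and isolation efficiencies can differ between sites.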
Extended Data Fig. 4 Markers and lung localization of stromal and dendritic subtypes.
a–d, smFISH for RNA of indicated marker genes of alveolar fibroblasts (a, b) and adventitial fibroblasts (c, d) in adult human (a, c) and mouse (b, d) alveolar (a, b) and pulmonary artery (c, d) sections. ECM autofluorescence (green; a, c) to show blood vessels; elastin (green; b, d); DAPI counterstain (blue, all panels). Staining repeated on two human participants or three mice. a, smFISH probes: general fibroblast marker COL1A2 (white) and alveolar fibroblast-selective marker GPC3 (red). Arrowheads denote alveolar fibroblasts. Inset, close-up of boxed region showing merged (top) and split channels of an alveolar fibroblast. Scale bars, 20 μm (inset 60 μm). b, smFISH probes: alveolar fibroblast-selective markers Slc7a10 (white) and Fgfr4 (red). Elastin (green) shows alveolar entrance ring. Arrowheads denote alveolar fibroblasts. Scale bar, 5 μm. c, smFISH probes: general fibroblast marker COL1A2 (white) and adventitial fibroblast-selective marker SERPINF1 (red). Adventitial fibroblasts (some indicated by arrowheads) localize around blood vessels (ECM, green). Inset, close-up of boxed region showing merged (top) and split channels of an adventitial fibroblast. Dashed line denotes the artery boundary. Scale bars, 30 μm (inset 90 μm). d, smFISH probes: adventitial fibroblast-selective markers Pi16 (white) and Serpinf1 (red). Adventitial fibroblasts (arrowheads) surround artery (marked by elastin, green). Scale bar, 10 μm. e, Heat map of expression of representative general, adventitial-selective, and alveolar-selective fibroblast markers in 50 randomly selected cells from adventitial (left) and alveolar (right) fibroblast clusters (SS2 dataset). Note specialization (highlighted red) in growth factor (AdvF: PDGFRL, IGFBP4; AlvF: FGFR4, VEGFD) and morphogen (AdvF: SFRP2; AlvF: NKD1, DKK3) signalling or regulation.
f, g, smFISH and quantification of cell abundance in human alveolar (f) and pseudostratified epithelial airway (g) sections probed for myofibroblast and fibromyocyte marker ASPN (red), and for fibromyocyte and airway smooth muscle markers COX4I2 (white; f) and ACTG2 (white; g). ECM autofluorescence, green; DAPI counterstain, blue. Inset (f), boxed region showing close-up of merged (top) and split channels of ASPN+ COX4I2− myofibroblast. Myofibroblasts and fibromyocytes (see below) probably make up remaining cells in Fig. 1f quantification. Inset (g), boxed regions showing close-up of merged (top) and split channels of fibromyocyte (white box) and airway smooth muscle (yellow box) cells. Fibromyocytes (white arrowheads) and airway smooth muscle (yellow arrowheads) are intermingled in wall of pseudostratified airway (dotted outline). Staining repeated on two participants. h, i, smFISH of human alveolar sections probed for general stromal marker COL1A2 (white), pericyte marker COX4I2 (red; h), lipofibroblast marker APOE (red; i). ECM autofluorescence, green; DAPI counterstain, blue. Inset (h), boxed region showing close-up of pericyte. Inset (i), boxed region showing close-up of COL1A2 APOE double-positive LipF. LipF cells are intermingled among other stromal cells (single-positive COL1A2) and macrophages (single-positive APOE). Quantification in Fig. 1f. Scale bars, 20 μm. Staining repeated on two participants. j, Dot plot of COX4I2 expression in alveolar stromal cell types (10x dataset). k, Heat map of expression of dendritic cell marker genes in 50 randomly selected cells from indicated dendritic cell clusters (human blood and lung 10x datasets). Cells in all clusters express general dendritic markers including antigen presenting genes but each cluster also has its own selective markers. 
Red highlighted markers distinguishing the newly identified dendritic cell clusters (IGSF21+, EREG+, TREM2+) suggest different roles in asthma (IGSF21+), growth factor regulation (EREG+), and lipid handling (TREM2+). l–n, smFISH of adult human lung proximal and alveolar (Alv) sections as indicated probed for IGSF21+ dendritic cell markers IGSF21 (red) and GPR34 (white) (l), EREG+ dendritic cell marker EREG (red) and general dendritic cell marker GPR183 (white) (m), and TREM2+ dendritic cell markers TREM2 (red) and CHI3L1 (white) (n). DAPI counterstain, blue. Non-punctate signal in red channel (l, n) is erythrocyte autofluorescence. Insets, boxed regions showing merged and split channels of close-up of single dendritic cell of indicated type. Scale bars, 20 μm. Arrowheads denote double-positive cells. Quantification shows distribution of each dendritic type; note IGSF21+ and EREG+ dendritic cells show strong proximal enrichment. Staining repeated on two participants. o, t-SNE of expression profile clusters of monocytes and B, T and natural killer cells (10x dataset, participant 1, 2,622 cells). Note separate cell clusters of each immune cell type isolated from lung (no outline) and blood (dashed outline). Asterisk denotes small number of B cells isolated from the lung that cluster next to blood B cells. For more details on statistics and reproducibility, see Methods.
Extended Data Fig. 5 Markers and transcription factors that distinguish human lung cell types.
a, Violin plots of expression levels (ln(UP10K + 1)) of the most sensitive and specific markers (gene symbols) for each human lung cell type in its tissue compartment (10x dataset). Cell numbers given in Supplementary Table 2. b, Scheme for selecting the most sensitive and specific marker genes for each cell type using the Matthews correlation coefficient (MCC). Box-and-whisker plots below show MCCs, true positive rates (TPR) and false discovery rates (FDR) for each cell type (n = 58) using the indicated number (nGene) of the most sensitive and specific markers (10x dataset). Note that all measures saturate at approximately 2–4 genes; hence, simultaneous in situ probing of a human lung for the approximately 100–200 optimal markers would assign an identity to nearly every cell. c, Alveolar section of human lung probed by smFISH for AT1 marker AGER and transcription factor MYRF. MYRF is selectively expressed in AT1 cells (arrowheads; 97% of MYRF+ cells were AGER+, n = 250 scored cells). Inset, boxed region showing merged and split channels of AT1 cell. Scale bar, 10 μm. Staining repeated on two participants. d, Alveolar section of human lung probed by smFISH for pericyte marker COX4I2 and transcription factor TBX5. TBX5 is enriched in pericytes (arrowheads; 92% of TBX5+ cells were COX4I2+, n = 250). Inset, boxed region showing merged and split channels of pericyte. Scale bar, 5 μm. Staining repeated on two participants. e, Dot plot of expression of enriched transcription factors in each lung cell type (SS2 dataset). Red text, genes not previously associated with the cell type. Red shading, transcription factors including MYRF that are highly enriched in AT1 cells, and TBX5 and others highly enriched in pericytes. For more details on statistics and reproducibility, see Methods.
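The three measures in b follow the standard confusion-matrix definitions: MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), TPR = TP/(TP+FN) and FDR = FP/(FP+TP). The sketch below assumes a hypothetical calling rule in which a cell is predicted to belong to a type only if every selected marker is detected (>0). The legend does not say how multi-gene calls were combined, and the function names are illustrative.

```python
import math

def confusion(predicted, actual):
    """Count TP, TN, FP, FN from paired boolean calls."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def marker_call(cell_expr, markers):
    """Hypothetical rule: positive only if every marker is detected (>0)."""
    return all(cell_expr.get(g, 0) > 0 for g in markers)

def evaluate(cells, actual, markers):
    """Return (MCC, TPR, FDR) for one cell type and one marker set."""
    predicted = [marker_call(c, markers) for c in cells]
    tp, tn, fp, fn = confusion(predicted, actual)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (fp + tp) if fp + tp else 0.0
    return mcc(tp, tn, fp, fn), tpr, fdr
```

Under an all-markers rule, adding genes can only lower TPR, which is one way the saturation at a few genes can arise.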
Extended Data Fig. 6 Lung cell targets of circulating hormones and local signals.
a, Dot plot of hormone receptor gene expression in lung cells (SS2 dataset). Type and name of cognate hormones for each receptor are shown at top. Teal, broadly expressed receptors in lung; other colours, selectively expressed receptors (<3 lung cell types). Small coloured dots next to cell type names show selectively targeted cell types. AA, amino acid; AM, adrenomedullin; CGRP, calcitonin gene-related peptide; EPO, erythropoietin; GCCT, glucocorticoid; GH, growth hormone; GIP, gastric inhibitory peptide; IGF, insulin-like growth factor; MCCT, mineralocorticoid; RA, retinoic acid; SST, somatostatin. b, Schematic of inferred pericyte contractility pathway and its regulation by circulating hormones (AGT, PTH) and capillary-expressed signals (EDN, NO). Dots show expression of indicated pathway genes: values at left (outlined red) in each pair of dots in capillary diagram (top) show expression in Cap-a cells (aerocytes) and at right (outlined blue) show expression in general Cap cells (SS2 dataset). Note most signal genes are preferentially expressed in Cap relative to Cap-a cells. c, Heat maps showing number of interactions predicted by CellPhoneDB software between human lung cell types located in proximal lung regions (left panel in each pair) and distal regions (right) based on expression patterns of ligand genes (‘sending cell’) and their cognate receptor genes (‘receiving cell’) (SS2 dataset). The pair of heat maps at the top left shows values for all predicted signalling interactions (‘all interactions’), and other pairs show values for the indicated types of signals (growth factors, cytokines, integrins, WNT, Notch, BMP, FGF and TGFβ). Predicted interactions between cell types range from 0 (lymphocyte signalling to neutrophils) to 136 (AdvF signalling to Cap-i1).
Note expected relationships, such as immune cells expressing integrins to interact with endothelial cells and having higher levels of cytokine signalling relative to their global signalling, and unexpected relationships, such as fibroblasts expressing most growth factors and lack of Notch signalling originating from immune cells. For more details on statistics and reproducibility, see Methods.
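CellPhoneDB's statistical method scores ligand–receptor pairs with a permutation test. The sketch below is a much simpler, hypothetical counting rule, meant only to show how a sender-by-receiver count matrix like those in c is assembled: a pair counts when the ligand is detected in at least 10% of sending cells and the receptor in at least 10% of receiving cells. The threshold, the input layout and the function name are assumptions.

```python
from itertools import product

def count_interactions(frac_expr, pairs, min_frac=0.1):
    """Count ligand-receptor pairs passing a detection threshold for every
    (sender, receiver) cell-type combination (simplified, not CellPhoneDB).

    frac_expr : {cell_type: {gene: fraction of cells expressing it}}
    pairs     : list of (ligand, receptor) gene tuples
    """
    types = list(frac_expr)
    counts = {}
    for sender, receiver in product(types, types):
        counts[(sender, receiver)] = sum(
            frac_expr[sender].get(lig, 0.0) >= min_frac
            and frac_expr[receiver].get(rec, 0.0) >= min_frac
            for lig, rec in pairs
        )
    return counts
```

The matrix is directional: the count for (AdvF, Cap) generally differs from that for (Cap, AdvF), matching the separate ‘sending cell’ and ‘receiving cell’ axes of the heat maps.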
Extended Data Fig. 7 Lung cell expression patterns of genes implicated in lung disease.
Dot plots of expression (in SS2 dataset) of 233 lung disease genes curated from genome-wide association studies (GWAS; genome-wide association genes with significance of P ≤ 10⁻²⁰) and Online Mendelian Inheritance in Man (OMIM). For more details on statistics and reproducibility, see Methods.
Extended Data Fig. 8 Mapping cellular origins of lung disease by cell-selective expression of disease genes.
a, Dot plots of expression of lung disease genes (numbered; associated disease shown above) enriched in specific lung cell types (SS2 datasets). Red, novel cell type association of gene or disease; grey, diseases with developmental phenotype. AWS, Alagille–Watson syndrome; BBS, Bardet–Biedl syndrome; CF, cystic fibrosis; Dys, dysplasia; EDS, Ehlers–Danlos syndrome; Fam Med, familial Mediterranean; IPF, idiopathic pulmonary fibrosis; PH, pulmonary hypertension; SGB, Simpson–Golabi–Behmel; SM, smooth muscle; SMD, surfactant metabolism dysfunction; TB, tuberculosis; VDES, Van den Ende–Gupta syndrome. b, Dot plot of expression (SS2 dataset) of all genes implicated in pulmonary hypertension, tuberculosis and COPD or emphysema (OMIM, Mendelian disease genes from the OMIM database; GWAS, genome-wide association genes with significance of P ≤ 10⁻²⁰). Note canonical AT2 cells (red shading) express all and AT2-signalling cells (blue shading) express most. c, smFISH of alveolar section of adult human lung probed for pulmonary hypertension disease gene KCNK3 (red) and pericyte marker COX4I2 (white) with DAPI counterstain (blue) and ECM autofluorescence (green). Note pericyte-specific expression (arrowheads; 91% of COX4I2+ pericytes were KCNK3+, n = 77). Scale bar, 5 μm. Cell numbers for each type are given in Supplementary Table 2. d, smFISH of alveolar section of adult human lung probed for atrioventricular (AV) dysplasia gene ACVRL1 (red) and endothelial marker CLDN5 (white) with DAPI counterstain. Note ACVRL1 CLDN5 double-positive capillaries (white arrowheads; 70% of CLDN5+ capillaries were ACVRL1+, n = 102) and some CLDN5 single-positive capillaries (yellow arrowheads). Scale bar, 5 μm. e, smFISH of alveolar section of adult human lung probed for COPD or emphysema gene SERPINA1 and AT2 marker SFTPC, and DAPI. Note AT2-specific expression (arrowheads; 93% of AT2 cells were SERPINA1+, n = 176). Scale bar, 5 μm. For more details on statistics and reproducibility, see Methods.
Extended Data Fig. 9 Lung cell expression patterns of respiratory virus receptors.
a, Dot plot showing expression in human lung cell types of entry receptors (indicated at left) for respiratory viruses (indicated at right; numbers indicate viral families) (SS2 dataset). Red shading, cell types that inhaled viruses could directly access (epithelial cells and macrophages); darker red shading shows expression values for measles receptor NECTIN4 and rhinovirus C receptor CDHR3. b, Violin plots (left) and dot plots (immediately above violin plots) showing expression of coronavirus receptors ACE2, DPP4 and ANPEP in lung cell types (10x dataset; cell numbers given in Supplementary Table 2). Grey shading, cell types that inhaled viruses can directly access. Doughnut plots (right) showing relative number of receptor-expressing cells among the cell types that viruses can directly access (grey shading), normalized by their abundance values from Supplementary Table 1 (and refined by the relative abundance values in Fig. 1 and Extended Data Figs. 3 and 4). Note prevalence of AT2 alveolar cells for ACE2, the receptor for SARS-CoV and SARS-CoV-2, and for DPP4, the receptor for MERS-CoV, in contrast to prevalence of macrophages for ANPEP, the receptor for the common-cold coronavirus 229E. For more details on statistics and reproducibility, see Methods.
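One reading of the doughnut-plot normalization in b is a weighting: each accessible cell type contributes (fraction of its cells expressing the receptor) × (its abundance in the lung), and the contributions are rescaled to sum to 1. This reading and the function name `doughnut_shares` are assumptions; any input values used with it are placeholders, not data from the atlas.

```python
def doughnut_shares(frac_expressing, abundance):
    """Share of all receptor-expressing, virus-accessible cells contributed by
    each cell type: (fraction expressing receptor) x (type abundance),
    rescaled so that the shares sum to 1.

    frac_expressing : {cell_type: fraction of its cells expressing the receptor}
    abundance       : {cell_type: relative abundance of the type in lung}
    """
    weights = {t: frac_expressing[t] * abundance[t] for t in frac_expressing}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else weights
```

Weighting by abundance matters because a rare cell type with a high expressing fraction can still contribute few receptor-positive cells overall.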
Extended Data Fig. 10 Lung cell expression patterns of non-respiratory virus receptors.
Extended Data Fig. 11 Comparison of mouse and human gene expression profiles in homologous lung cell types and across age.
a, Scatter plots showing median expression levels (ln(CPM + 1)) in indicated cell types of each expressed human gene and mouse orthologue (mouse and human SS2 datasets; human and mouse cell numbers given in Supplementary Tables 2 and 6, respectively). Note tens to hundreds of genes that show a 20-fold or greater expression difference (and P < 0.05, MAST) between species (red dots; gene names indicated for some and total number given above). Basophil/mast cell 1 (Bas/Ma 1) cells have the most differentially expressed genes (343), and CD4+ M/E T cells have the fewest (79). Pearson correlation scores (R values) between the average mouse and human gene expression profiles for each cell type are indicated. ‘Mm()’ and ‘Hs()’ denote genes in which duplications between mouse and human were collapsed to HomologyIDs. b, Heat map showing global transcriptome Pearson correlation between indicated human and mouse epithelial cells (SS2 dataset; human and mouse cell numbers given in Supplementary Tables 2 and 6, respectively). Red outline denotes homologous cell types based on classical markers described in Supplementary Table 6. White dot denotes human-to-mouse correlation. c, Dot plot of expression of canonical goblet cell markers MUC5B and MUC5AC and transcription factor SPDEF in mouse (left) and human (right) goblet cells. d, Scatter plot showing average expression levels (dots) across all cells (‘pseudo-bulk’ lung expression) of each expressed human gene and mouse orthologue (mouse and human SS2 datasets). Scale, ln(CPM + 1). Pearson correlations (R values) between the average mouse and human gene expression profiles are indicated. e, Scatter plots comparing median expression levels (ln(CPM + 1)) in indicated mouse lung cell types of each expressed gene at age 3 months (x axis) and 24 months (y axis) in SS2 datasets from Tabula Muris Senis40 (cell numbers given in Supplementary Table 6).
Pearson correlation scores between average gene expression profile for each cell type at each age are indicated (R values), along with number of genes (red dots) showing 20-fold or greater expression difference (and P < 0.05, MAST) between ages. Names of some genes are given next to the corresponding red dot. For more details on statistics and reproducibility, please see Methods.
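The two quantities reported per cell type in a and e (a Pearson R between profiles and a count of genes differing 20-fold or more with P < 0.05) can be sketched as follows. It assumes the fold change is measured on the (CPM + 1) scale, so a 20-fold difference is an absolute difference of ln(20) in ln(CPM + 1) units, and that per-gene P values come from a separate differential-expression test such as MAST. The function names are illustrative.

```python
import numpy as np

FOLD = 20.0   # fold-change threshold named in the legend
ALPHA = 0.05  # significance threshold named in the legend

def pearson_r(profile_a, profile_b):
    """Pearson correlation between two per-gene expression profiles."""
    return float(np.corrcoef(profile_a, profile_b)[0, 1])

def divergent_genes(human_ln, mouse_ln, pvals):
    """Boolean mask of genes differing >= FOLD-fold with P < ALPHA.

    human_ln, mouse_ln : per-gene median ln(CPM + 1), aligned by orthologue
    pvals              : per-gene P values from a differential-expression test
    """
    # Assumption: fold change on the (CPM + 1) scale, i.e. |difference| >= ln(20).
    big = np.abs(np.asarray(human_ln) - np.asarray(mouse_ln)) >= np.log(FOLD)
    return big & (np.asarray(pvals) < ALPHA)
```

Both filters are required: a gene can show a large fold change but still fail the significance test, as the fourth gene does in a toy example with values of 50 and 1 CPM and P = 0.2.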
Extended Data Fig. 12 Patterns of conserved and divergent gene expression across human and mouse lung cell types.
a, Dot plots of PTPRC and MYL6 expression in mouse and human lung cell types (SS2 datasets) showing two examples of conserved (type 0) expression pattern. Blue shading, homologous cell types with conserved expression. b, Dot plots showing gain of expression (type 1 change) in several human cell types of RNASE1 (left) and all human cell types of TRIM38 (right). Red shading, cell types with divergent (gained) expression. c, Alveolar section of adult mouse lung probed by smFISH for general alveolar epithelial marker Nkx2-1, AT2 marker Sftpc, and transcription factor Myrf. Note Myrf is selectively expressed in mouse AT1 cells (Nkx2-1+ Sftpc− cells), as it is in humans (Extended Data Fig. 5c). Scale bar, 5 μm. Staining repeated on three mice. d, Dot plots of expression of CGRP and ADM hormone receptor genes showing expansion of expression (type 2 change) in human endothelial cells (10x datasets). e, Dot plots of expression of emphysema-associated gene SERPINA1 showing switched expression (type 3 change) from mouse pericytes (top) to human AT2 cells (bottom) (SS2 datasets). f, Dot plots comparing expression and conservation of HHIP with those of other Hedgehog pathway genes including ligands (SHH, DHH, IHH), receptors (PTCH1, PTCH2, SMO), and transducers (GLI1, GLI2, GLI3) (SS2 datasets). g, Dot plots of expression of serous cell markers LTF, LYZ, BPIFB1 and HP showing switched expression (type 3 change) from mouse airway epithelial cells to human serous cells, which mice lack (asterisk). Dot plots of expression of lipid-handling genes APOE, PLIN2 and FST show switched expression (type 3 change) from mouse alveolar stromal cells to human lipofibroblasts, which mice lack (asterisks). ‘Mm()’ or ‘Hs()’, genes in which duplications between mouse and human were collapsed to HomologyIDs (10x and SS2 datasets). h, Pie chart of fraction of expressed genes in lung showing each of the four types of evolutionary changes in cellular expression patterns from mouse to human.
Histogram below shows number of lung cell types that the 602 genes with perfectly conserved cellular expression patterns (type 0) are expressed in; note that almost all are expressed in either a single cell type (67%) or nearly all cell types (33%). For more details on statistics and reproducibility, see Methods.
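The four types of change can be expressed as comparisons between the sets of homologous cell types that express a gene in each species. This is a hypothetical simplification for illustration. It assumes each gene has already been called expressed or not in each homologous cell type, and it treats type 1 as expression in one species only, type 2 as expression in a strict superset of the other species' cell types, and type 3 as any other difference. The published classification uses the cutoffs described in Supplementary Table 8.

```python
def evolution_type(mouse_types, human_types):
    """Classify a gene's change in cellular expression from mouse to human.

    mouse_types, human_types : sets of homologous cell types expressing the gene
    Returns 0 (conserved), 1 (gained or lost), 2 (expanded or contracted),
    3 (switched), or None if the gene is expressed in neither species.
    """
    m, h = set(mouse_types), set(human_types)
    if not m and not h:
        return None
    if m == h:
        return 0          # identical cellular pattern
    if not m or not h:
        return 1          # expressed in one species only
    if m < h or h < m:
        return 2          # one pattern strictly contains the other
    return 3              # expression moved to different cell types
```

For example, SERPINA1 moving from mouse pericytes to human AT2 cells (panel e) falls into type 3 under this rule.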
Supplementary Figure 1. FACS gating strategies for human lung and peripheral blood cells. a, Sequential FACS data and sorting gates (red) for dissociated human lung cells from subject sample D1b (plate B001223) following MACS depletion of highly abundant immune (CD45+) and endothelial (CD31+) cells. The final sort (right) was of viable single cells from the lung epithelial (EPCAM+CD45−), immune (CD45+EPCAM−) and stromal/endothelial (EPCAM−CD45−) compartments into 384-well plates for SS2 scRNA-seq. Plots are representative of FACS repeated on 3 subjects used for scRNA-seq. b, Sequential FACS data and sorting gates (red) for white blood cells isolated on a Ficoll gradient of matched subject peripheral blood (subject 1, plate BP1). Viable, single CD235a− (non-RBC) cells were captured without additional gating (panel 4), or further sorted as CD8 T (CD8+; panel 8), CD4 T (CD4+; panel 7), B (CD19+CD3−; panel 6), NK (CD19−CD3−CD56+CD14−; panel 9) or CD14+ monocytes (CD19−CD3−CD56−CD14+; panel 9) for SS2 scRNA-seq. Contours, 5% increments in cell density. Sorting on blood cells was done only for subject 1.
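The blood sorting gates in b can be written as marker rules, one per population. This sketch encodes only the marker phenotypes listed in the legend. It assumes each cell has already passed the viability, singlet and CD235a− gates, treats an unmeasured marker as negative, and returns every gate a cell satisfies rather than imposing an order, because in the actual sort each population was drawn on its own panel. The names `GATES` and `matching_gates` are hypothetical.

```python
# Marker phenotypes from the legend: True = positive, False = negative.
GATES = {
    "CD8 T": {"CD8": True},
    "CD4 T": {"CD4": True},
    "B": {"CD19": True, "CD3": False},
    "NK": {"CD19": False, "CD3": False, "CD56": True, "CD14": False},
    "CD14+ monocyte": {"CD19": False, "CD3": False, "CD56": False, "CD14": True},
}

def matching_gates(markers):
    """Return the names of all gates whose marker rules a cell satisfies.

    markers : {marker: bool}; markers not listed are treated as negative.
    """
    return [name for name, rule in GATES.items()
            if all(markers.get(k, False) == v for k, v in rule.items())]
```

A cell matching more than one gate, or none, would signal an ambiguous phenotype under these simplified rules.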
Supplementary Table 1. Canonical cell types (45) in the human lung and their abundances, markers, and available expression data. a, Numbers of each type were calculated from their abundances and the total number of lung cells (estimated by comparing the volume of the lungs to that of the whole body). b, Canonical markers were obtained from referenced expression data or commonly used markers in the literature. c, Expression profiles captured immediately following tissue dissociation are considered primary. d, Alveoli were assumed to occupy ~90% of the total lung volume for all estimations. e, Inferred from mean relative abundance in proximal, medial and distal airway epithelium. f, Calculated by stereology. g, Resin casts showed similar surface areas of arteries and veins. h, Vascular smooth muscle is estimated to be slightly more abundant than airway smooth muscle. i, Abundance of a more general cell type was split evenly. j, Inferred from impression of light or electron microscopy. k, Inferred from histological abundance in nonperfused healthy tissue. l, Inferred from abundance among immune cells by FACS. m, Calculated using microfluidic capture.
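Footnotes a and i describe simple arithmetic: the total lung cell count is scaled from the whole body by volume, each type's number is its abundance times that total, and a general type's count can be divided evenly among subtypes. The sketch below assumes uniform cell density across the body, which the volume comparison implies. The function names are illustrative, and no inputs from the table are reproduced here.

```python
def total_lung_cells(body_cells, lung_volume, body_volume):
    """Scale the body's cell count by the lung's share of body volume
    (assumes uniform cell density)."""
    return body_cells * lung_volume / body_volume

def cells_per_type(abundance, total):
    """abundance : {cell_type: fraction of all lung cells} -> cell numbers."""
    return {t: f * total for t, f in abundance.items()}

def split_evenly(count, subtypes):
    """Footnote i: divide a general type's count evenly among its subtypes."""
    share = count / len(subtypes)
    return {s: share for s in subtypes}
```

Because every step is multiplicative, an error in the volume-based total propagates proportionally to every per-type estimate.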
Supplementary Table 2. Human lung cell cluster identities, abundances, and locations. Cell numbers are stratified by type, subject, and sequencing technology. Cell abbreviations indicated are used throughout.
Supplementary Table 3. Surface markers used to isolate canonical immune cell types in bulk mRNA sequencing. See Methods for details on antibodies.
Supplementary Table 4. Enriched markers found in each cluster, with transcription factors, receptors/ligands, and disease-associated genes annotated. Includes all enriched genes for each type (P < 0.05, MAST; cells per group downsampled to 100 for SS2 clusters or 500 for 10x clusters). Abbreviations: avg_logFC, the natural log of the average fold change between the cell type and other cell types in its tissue compartment; pct_in_cluster, percentage of cells within the cluster that express the gene; pct_out_cluster, percentage of cells outside the cluster that express the gene; p_val_adj, P value with Bonferroni correction applied; TF, transcription factor; OMIM, Online Mendelian Inheritance in Man; GWAS, genome-wide association study.
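The table's per-gene columns can be sketched from an expression matrix and a cluster mask. This assumes avg_logFC is the natural log of the ratio of mean expression computed on the linear scale (expm1 of log-normalized values, as some Seurat versions do), that a gene counts as expressed in a cell when its value is above zero, and that the Bonferroni correction multiplies by the number of genes tested. The function name `marker_stats` is illustrative.

```python
import numpy as np

def marker_stats(expr, in_cluster, pvals_raw):
    """Compute avg_logFC, pct_in_cluster, pct_out_cluster and p_val_adj.

    expr       : (n_cells, n_genes) log-normalized values, e.g. ln(CPM + 1)
    in_cluster : boolean mask of cells belonging to the cluster
    pvals_raw  : unadjusted per-gene P values from the DE test
    """
    inside, outside = expr[in_cluster], expr[~in_cluster]
    pct_in = (inside > 0).mean(axis=0) * 100
    pct_out = (outside > 0).mean(axis=0) * 100
    # Assumption: means taken on the linear scale, then natural-log ratio.
    avg_logfc = (np.log(np.expm1(inside).mean(axis=0) + 1)
                 - np.log(np.expm1(outside).mean(axis=0) + 1))
    # Bonferroni: multiply by the number of tests, capped at 1.
    p_adj = np.minimum(np.asarray(pvals_raw) * expr.shape[1], 1.0)
    return avg_logfc, pct_in, pct_out, p_adj
```

Comparing pct_in_cluster with pct_out_cluster separates markers that are merely higher in a cluster from those that are nearly exclusive to it.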
Supplementary Table 5. P values and scores of each CellPhoneDB receptor–ligand interaction from each cluster. Expression scores are given for significant (P < 0.1, CellPhoneDB statistical framework) pairwise interactions between all SS2 cells (cell numbers given in Supplementary Table 2). Table includes all 1,085 receptor–ligand pairs and all pairwise cell type combinations. Abbreviations: partner_a, CellPhoneDB ID for the first interaction partner protein; partner_b, CellPhoneDB ID for the second interaction partner protein; source, reference from CellPhoneDB; secreted, whether the ligand is secreted or membrane bound; is_integrin, whether the interaction includes an integrin complex.
Supplementary Table 6. Mouse lung cell cluster identities and their abundances in each dataset. Cell numbers are stratified by type, mouse, and sequencing technology. Mouse-specific molecular markers are indicated.
Supplementary Table 7. Genes specific to mouse and human in each cluster and lung-wide. Includes all differentially expressed genes (P < 0.05, MAST) for all SS2 cells in indicated mouse and human clusters (human and mouse cell numbers given in Supplementary Tables 2 and 6, respectively). Abbreviations: avg_logFC, the natural log of the average fold change between the mouse and human cell type indicated; pct_mouse, percentage of mouse cells within the cluster that express the gene; pct_human, percentage of human cells within the cluster that express the gene; p_val_adj, P value with Bonferroni correction applied; enriched, gene is enriched in cluster in mouse or human.
Supplementary Table 8. Evolutionary changes in cellular patterns of lung gene expression between mouse and human SS2 cells. Numbers of genes and percentages are means (±SEM) over the results obtained using a median expression cutoff minus 0 to 2 standard deviations (in 0.25 increments) and a median percent cutoff plus 0 to 2 standard deviations (in 0.25 increments). Human and mouse cell numbers given in Supplementary Tables 2 and 6, respectively.
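The cutoff sweep can be sketched as follows. Stepping from 0 to 2 standard deviations in 0.25 increments gives nine values per cutoff. This sketch assumes every expression cutoff is paired with every percent cutoff (a 9 × 9 grid), which the caption does not state. `count_fn` is a hypothetical stand-in for whatever counts genes of a given evolutionary type at one pair of cutoffs.

```python
import math
import statistics

def cutoff_sweep(median, sd, sign, steps=9, step=0.25):
    """Cutoffs median + sign * k * sd for k = 0, 0.25, ..., 2."""
    return [median + sign * i * step * sd for i in range(steps)]

def mean_sem(values):
    """Mean and standard error of the mean."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return mean, sem

def summarize(count_fn, expr_median, expr_sd, pct_median, pct_sd):
    """Apply count_fn(expr_cutoff, pct_cutoff) over the assumed grid
    (expression: median minus 0-2 SD; percent: median plus 0-2 SD)
    and return the mean and SEM of the results."""
    results = [count_fn(e, p)
               for e in cutoff_sweep(expr_median, expr_sd, sign=-1)
               for p in cutoff_sweep(pct_median, pct_sd, sign=+1)]
    return mean_sem(results)
```

Averaging over the sweep reports how sensitive each gene count is to the choice of threshold, rather than relying on a single cutoff.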
Supplementary Table 9. Evolutionary and functional classes of genes. Includes genes conserved between mouse and human. Abbreviations: Evo type, evolutionary scenario the gene falls into (NA for genes not expressed); Gene class, type of gene (receptor, ligand, enzyme, transcription factor, etc.); conserved clusters, number of homologous clusters in which the gene is expressed in both mouse and human; Human specific clusters, number of homologous clusters in which the gene is expressed only in human; Mouse specific clusters, the same for mouse.
Cite this article
Travaglini, K.J., Nabhan, A.N., Penland, L. et al. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature 587, 619–625 (2020). https://doi.org/10.1038/s41586-020-2922-4