Characterization of the COPD alveolar niche using single-cell RNA sequencing

Chronic obstructive pulmonary disease (COPD) is a leading cause of death worldwide; however, our understanding of the cell-specific mechanisms underlying COPD pathobiology remains incomplete. Here, we analyze single-cell RNA sequencing profiles of explanted lung tissue from subjects with advanced COPD or control lungs, and we validate findings using single-cell RNA sequencing of lungs from mice exposed to cigarette smoke for 10 months, RNA sequencing of isolated human alveolar epithelial cells, functional in vitro models, and in situ hybridization and immunostaining of human lung tissue samples. We identify a subpopulation of alveolar epithelial type II cells with transcriptional evidence for aberrant cellular metabolism and reduced cellular stress tolerance in COPD. Using transcriptomic network analyses, we predict that capillary endothelial cells are inflamed in COPD, particularly through increased CXCL-motif chemokine signaling. Finally, we detect a high-metallothionein-expressing macrophage subpopulation enriched in advanced COPD. Collectively, these findings highlight cell-specific mechanisms involved in the pathobiology of advanced COPD.


Software and code
The following software was used: the mkfastq pipeline in Cell Ranger (v3.0.2), cutadapt (v1.17 and v2.9), the zUMIs pipeline (v2.0), TrimGalore! (v0.6.6), and STAR (v2.6.0c, v2.7.3a, and v2.7.5c). Single-cell analysis was performed using the Seurat R package (v3.2.3 and v4.0.4) with the recommended workflow. CELLEX (v1.2.1; https://github.com/perslab/CELLEX) was run using the recommended normalization method and preprocessing steps. We then ran CELLECT (v1.1.0; https://github.com/perslab/CELLECT) with the recommended workflow (CELLECT-LDSC) and default parameters (100 kb window size around each gene).

Connectome analysis was performed using the R package Connectome (v1.0.0; https://msraredon.github.io/Connectome/). Average expression values for every gene within cell types were calculated and mapped against the FANTOM5 database of known ligand-receptor pairs to create a global connectome. For the centrality figure, the code we used was very similar to the CompareCentrality function in the Connectome (v1.0.0) package, but we changed the scaling method for visualization and added the Durbin significance test feature. The function for the network maps is not included in the Connectome package, so we added it. Finally, for the fold change calculations, the method was previously described, but the code was separate from the Connectome package. These modifications are available to readers upon request.

Other software: Fiji v.1.0, GraphPad Prism v9.3.0, CellProfiler v4.2.1, featureCounts (v2.0.1), UpSetR (v1.4).
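The connectome construction described above (per-cell-type average expression mapped onto known ligand-receptor pairs) can be illustrated with a minimal Python sketch. This is not the authors' code and not the Connectome R package API: the function name `build_connectome`, the column names, and the use of the ligand-receptor expression product as an edge weight are all assumptions made for illustration.

```python
import pandas as pd


def build_connectome(expr, cell_types, lr_pairs):
    """Sketch of a global connectome (hypothetical helper, not the Connectome API).

    expr       : cells x genes DataFrame of expression values
    cell_types : one cell-type label per row of expr
    lr_pairs   : DataFrame with 'ligand' and 'receptor' columns
    """
    # Average expression of every gene within each cell type (cell types x genes).
    labels = pd.Series(list(cell_types), index=expr.index)
    avg = expr.groupby(labels).mean()

    # Keep only ligand-receptor pairs where both genes were measured.
    measured = lr_pairs["ligand"].isin(avg.columns) & lr_pairs["receptor"].isin(avg.columns)
    pairs = lr_pairs[measured]

    # One edge per (source cell type, target cell type, ligand-receptor pair).
    rows = []
    for source in avg.index:
        for target in avg.index:
            for ligand, receptor in zip(pairs["ligand"], pairs["receptor"]):
                lig = avg.loc[source, ligand]
                rec = avg.loc[target, receptor]
                rows.append({
                    "source": source,
                    "target": target,
                    "ligand": ligand,
                    "receptor": receptor,
                    "ligand_expr": lig,
                    "receptor_expr": rec,
                    # Assumption: the product is used here as a simple edge weight.
                    "weight": lig * rec,
                })
    return pd.DataFrame(rows)


# Toy data: two endothelial cells expressing a ligand, two macrophages expressing its receptor.
toy_expr = pd.DataFrame(
    {"CXCL12": [2.0, 4.0, 0.0, 0.0], "CXCR4": [0.0, 0.0, 1.0, 3.0]},
    index=["c1", "c2", "c3", "c4"],
)
toy_pairs = pd.DataFrame({"ligand": ["CXCL12"], "receptor": ["CXCR4"]})
toy_edges = build_connectome(toy_expr, ["Endo", "Endo", "Macro", "Macro"], toy_pairs)
print(toy_edges)
```

In practice the ligand-receptor table would come from a curated resource such as FANTOM5, and edge weights would typically be normalized before any centrality comparison.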


Field-specific reporting
No sample size calculation was performed for human studies because we used available samples, and sample size was restricted by the limited availability of explanted lung tissue. We used both male and female mice for murine studies but limited the sample size to 2 mice per sex per group because of the cost of single-cell RNA sequencing and limited resources.
All data exclusion occurred prior to data analysis. For human scRNA-seq, one COPD sample was excluded because the donor reportedly had no history of cigarette smoke exposure. To age-match our control and COPD samples, we excluded control samples from individuals < 40 years of age. For human single-cell RNA sequencing, we removed cells with 12% of transcripts arising from unspliced RNA, cells with fewer than 1000 transcripts profiled, and cells with > 20% of their transcriptome of mitochondrial origin. For mice, we removed barcoded cells with < 7.5% of transcripts arising from unspliced mRNA, cells with < 1000 transcripts profiled, and cells with > 5% of their transcriptome of mitochondrial origin. Background contamination from cell-free mRNA was removed using SoupX software (v1.2.2). Finally, in Figure 6B and Supplemental Figure 24, we excluded samples with fewer than 100 alveolar macrophages.
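The per-cell exclusion criteria above amount to threshold filters on per-cell metadata. The following is a minimal Python sketch, not the authors' pipeline (the analysis itself was done in R with Seurat). The function name `filter_cells` and the column names `pct_unspliced`, `n_transcripts`, and `pct_mito` are hypothetical; the defaults are the mouse thresholds stated in the text. The human mitochondrial cutoff (20%) can be passed explicitly, while the human unspliced-fraction cutoff is left to the caller because the text does not state its direction.

```python
import pandas as pd


def filter_cells(meta, min_unspliced_pct=7.5, min_transcripts=1000, max_mito_pct=5.0):
    """Return only the cells that pass all three QC thresholds.

    meta is a per-cell DataFrame with hypothetical columns:
      pct_unspliced : percent of transcripts arising from unspliced mRNA
      n_transcripts : number of transcripts profiled
      pct_mito      : percent of the transcriptome of mitochondrial origin
    """
    keep = (
        (meta["pct_unspliced"] >= min_unspliced_pct)  # drop cells below the unspliced cutoff
        & (meta["n_transcripts"] >= min_transcripts)  # drop cells with too few transcripts
        & (meta["pct_mito"] <= max_mito_pct)          # drop cells with high mitochondrial content
    )
    return meta[keep]


# Toy metadata for four barcoded cells.
toy_meta = pd.DataFrame(
    {
        "pct_unspliced": [10.0, 5.0, 20.0, 15.0],
        "n_transcripts": [1500, 2000, 800, 3000],
        "pct_mito": [2.0, 1.0, 3.0, 9.0],
    },
    index=["a", "b", "c", "d"],
)

mouse_cells = filter_cells(toy_meta)                       # mouse thresholds from the text
relaxed_mito = filter_cells(toy_meta, max_mito_pct=20.0)   # human mitochondrial cutoff
print(list(mouse_cells.index), list(relaxed_mito.index))
```

Keeping the thresholds as parameters makes it straightforward to apply species-specific cutoffs to each dataset with the same function.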
The number of biological replicates in each experiment is described in the Methods section and figure legends. Key findings from the paper were reproduced in other human cohorts and/or in murine studies. Key findings were also validated with immunofluorescence or in situ hybridization staining.
Mice were randomized to receive either cigarette smoke exposure or room air exposure. Human samples were not randomized because patient recruitment and sample collection depended on the availability of donor lung tissue. For experiments other than those on mice and human samples, samples were randomly allocated to experimental groups.
For human samples, blinding was not appropriate because groups were assigned based on disease. Similarly, the investigators could not be blinded as to whether the mice received CS exposure. However, computational analyses, such as clustering, did not use experimental group to assign annotation. For immunofluorescence staining and in situ hybridization experiments, microscopists were blinded to whether the samples were from control or COPD patients. For other in vitro experiments, investigators could not be blinded to their own experimental groups.
Cell lines were not authenticated.
Cell lines were not tested for mycoplasma contamination.
No commonly misidentified lines were used.
We obtained Sftpc-CreERT2 (stock #028054) and Rosa26-mTmG C57BL/6 (stock #007676) mice from Jackson Laboratories and bred them together to generate Sftpc-Cre/Rosa26-mTmG mice. Two male and two female 8-10-week-old mice were used. Mice were maintained in a local housing facility under controlled conditions (23±1°C, 50±10% humidity, and a 12-12h light-dark cycle).
This study did not involve wild animals.
This study did not involve field-collected samples.
Animal protocols were approved by the Animal Care and Use Committee at Yale University.

Human research participants
Human scRNA-seq data: Our analysis focused on 17 patients with advanced COPD and 15 age-matched controls. Sample size for 10x Genomics scRNA-seq was determined by the availability of patient samples, as was the sample size for all other experiments. There were eight females in both groups, and the median age of all subjects was 62 years (range 41-80). All COPD subjects had radiographic evidence of advanced emphysema and were former smokers; four of the donors were either current or former smokers. Further details are provided in Supplemental Table 1. A separate analysis used flow-sorted epithelial cells from single-cell suspensions of 10 subjects with advanced COPD and 16 controls.
Diseased explanted lungs were procured from patients with end-stage lung disease undergoing transplant, and control lungs were donor lungs rejected for transplant, per protocols approved by the Partners IRB with informed consent. Findings should be generalizable to patients with advanced COPD requiring transplant but not to all COPD patients. Biases related to why control donor lungs were rejected for transplant may have affected the results.
Ethics oversight: Study protocols were approved by the Partners Healthcare Institutional Review Board (IRB Protocol 2011P002419). informed