Introduction

Macrophages are the most abundant immune cells in the lower airways of the human respiratory tract. They are involved in host immune defenses and airway homeostasis. They are also highly plastic, being able to change their phenotype and function depending on the local milieu1,2. Over the past few decades, a conceptual framework has evolved to describe the activation pattern of macrophages in vitro, which has been used to categorize macrophages into at least two distinct phenotypes: classically (M1) or alternatively (M2) activated macrophages3.

Briefly, classical activation of M1 macrophages is typically induced by lipopolysaccharide (LPS)/interferon (IFN)-γ or tumor necrosis factor (TNF) and contributes to a pro-inflammatory milieu; these cells demonstrate strong bactericidal activity, along with expression of CD40, CD80, CD86 and inducible nitric oxide synthase (iNOS). In contrast, alternatively activated M2 macrophages are stimulated by interleukin (IL)-4 and IL-13 and contribute to immunomodulation by scavenging debris, enabling tissue repair and promoting remodeling of local tissue. M2 macrophages are also characterized by a high expression level of scavenger receptors such as CD163 and CD2064,5,6. In general, M1 macrophages are thought to be pro-inflammatory, while M2 macrophages limit inflammation and promote healing and homeostasis7. Although this simple classification works in vitro, it is of limited use in vivo. Owing to repeated exposures to external microbes and environmental toxins, the microenvironment in the lung can change rapidly, leading to modulations in the host immune response. Consistent with this notion, one study showed that chronic inflammation can lead to broad changes in the transcriptional repertoire of macrophages beyond the classical M1 and M2 phenotypes8. As there is a scarcity of data on the state of polarization of alveolar macrophages in vivo, the utility and appropriateness of the classical M1/M2 categorization of macrophages in the human airways are uncertain. We hypothesized that while in health most macrophages can be categorized based on M1/M2 polarization, in inflammatory lung conditions such as chronic obstructive pulmonary disease (COPD), most cells will be non-typeable. To address this hypothesis, we phenotyped macrophages in human bronchoalveolar lavage (BAL) fluid by using classical cell surface markers: CD40 for M1 macrophages and CD163 for M2 macrophages5,9,10,11.
This study was exploratory in nature and its main purpose was to characterize the polarization and associated gene signatures of macrophages from human BAL fluid in health and disease according to their classical cell surface markers for M1/M2 phenotypes.

Results

Macrophage sub-phenotyping

We first performed flow cytometry on macrophages that were isolated from the BAL fluid of 25 participants, including 8 patients with chronic obstructive pulmonary disease (COPD) and 13 patients with asthma, and divided the cell population into four groups based on cell surface markers: double negative (DN), double positive (DP), M1 and M2. The mean percentage of DN, DP, M1 and M2 subtypes in the BAL fluid was 24.8%, 35.4%, 14.0% and 25.9%, respectively.

RNA-sequencing

Next, we performed RNA-sequencing on BAL macrophages collected from 10 consecutive participants between March 2019 and June 2019. After exclusion of 3 samples (i.e. 1 DN and 2 DP samples, which were obtained from three patients with asthma) because of poor RNA quality or low RNA yield, we subjected the remaining samples (n = 37) to bulk RNA sequencing. The demographic data of the study participants are summarized in Table S1.

Differentially expressed genes (DEGs) between subtypes

We compared the transcriptomic expression pattern across all four subtypes of macrophages. The greatest number of differentially expressed genes was observed with the DN subtype (1886 differentially expressed genes versus all other subtypes at a 10% false discovery rate, FDR). There were 498 differentially expressed genes between the DP subtype and the others; 15 genes between the M1 subtype and the others; and 52 genes between the M2 subtype and the others (Fig. 1). All differentially expressed genes at 10% FDR are shown in Table S2.

Figure 1

Volcano plots showing differentially expressed genes across macrophage subtypes (n = 10). (a) Double negative (DN) subtype versus the other subtypes. (b) Double positive (DP) subtype versus the others. (c) M1 subtype versus the others. (d) M2 subtype versus the others. The plot shows the fold-change on the X-axis versus the unadjusted p values (on a –log10 scale) on the Y-axis. Differentially expressed genes at 10% FDR are represented as colored dots and the top 20 up-regulated genes for each cell type are labelled on the graph. The greatest number of differentially expressed genes was observed with the DN subtype (1886 differentially expressed genes versus all other subtypes at 10% FDR) followed by 498 differentially expressed genes between the DP subtype and the others; 15 genes between the M1 subtype and the others; and 52 genes between the M2 subtype and the others. The top 20 up-regulated genes for the DN macrophages included 15 mitochondrial genes and 4 mitochondrial pseudogenes. Figure created with R (version 3.5.0). https://www.r-project.org/.

Among the differentially expressed genes, 602 were up-regulated in the DN subtype, 292 were up-regulated in DP, 12 were up-regulated in M1 and 11 were up-regulated in the M2 subtype. The top 20 up-regulated genes for each subtype are shown in Fig. 1. Importantly, the top 20 up-regulated genes for the DN macrophages included 15 mitochondrial genes and 4 mitochondrial pseudogenes; for other macrophage subtypes, these mitochondrial genes were not differentially expressed.

Enrichment analysis was performed using only the up-regulated genes for each macrophage subtype at 10% FDR. Up-regulated genes in DN were enriched in 86 GO biological processes. The top 5 GO biological processes, ranked by p-value, are shown in Fig. 2. These included genes involved in inflammatory responses (FDR = 3.89E−04).

Figure 2

Enrichment analysis: The top 5 Gene Ontology (GO) biological processes based on up-regulated differentially expressed genes. The top 5 GO biological processes at 10% FDR for each macrophage subtype are shown. The circle size represents the number of overlapping genes between each GO process and up-regulated genes according to the subtype. The colour scale represents the extent to which the up-regulated genes are significantly enriched in each GO process. The GO processes associated with DN, DP and M1 subtype included inflammatory response, complement activation and response to virus, respectively, whereas none of the up-regulated genes for M2 macrophages were enriched in the GO processes. Figure created with the R package ggplot2. https://cran.r-project.org/web/packages/ggplot2/index.html.

For the DP subtype, up-regulated genes were enriched in 61 GO processes, which included pathways for complement activation (FDR = 1.05E−05), protein activation cascade (FDR = 1.05E−05), antigen processing and presentation of peptide antigen (FDR = 3.73E−05). Up-regulated genes in M1 were enriched in 21 GO processes such as responses to viruses (FDR = 3.79E−02), IFN-γ (FDR = 7.32E−02) and LPS (FDR = 7.70E−02). However, none of the up-regulated genes for M2 macrophages were enriched in the GO processes. All enriched GO biological processes at 10% FDR are shown in Table S3.

Likewise, we compared the transcriptomic expression pattern on the basis of CD163 positivity. Among 128 differentially expressed genes at 10% FDR, 39 genes were up-regulated and 89 genes were down-regulated in macrophages positive for CD163 (Fig. S1). All differentially expressed genes at 10% FDR are shown in Table S4. However, none of the up- or down-regulated genes were enriched in GO biological processes at 10% FDR.

Weighted gene co-expression network analysis (WGCNA)

Based on detectable expression levels of all genes on the RNAseq platform (18,314 genes in total), 13 gene expression modules were constructed using a weighted gene co-expression network analysis (WGCNA). The WGCNA-derived modules ranged in size from 19 genes in module 13 to 1,831 genes in module 1. The 13 modules as well as the “garbage module” (i.e. module 00) derived from WGCNA are shown in Table S5.

Nine of these module eigengenes were differentially expressed in at least one macrophage subtype at a 10% FDR threshold (Fig. 3). Gene signatures in the DN subtype were significantly distinct from those of other subtypes. In contrast, M1 macrophages were not associated with a distinct module. Importantly, module 12, which was up-regulated in DNs, was down-regulated in the DP and M2 subtypes. This module was composed of 15 mitochondrial genes (13 protein subunit genes and 2 ribosomal RNA genes) and 4 mitochondrial pseudogenes. The genes with the highest membership in this module were MT-ATP6, MT-CYB and MT-ND4.

Figure 3

Heat map of the correlation of weighted gene co-expression network analysis (WGCNA) modules with macrophage subtypes (n = 10). The rows represent the gene modules and the sizes of the modules are shown in parentheses next to the module name. The columns represent macrophage subtypes. In each cell, the number at the top is the linear regression coefficient and the number in parentheses is the corresponding p-value. Color scale represents the regression coefficient. Only modules with at least one significant cell at 10% FDR across the four subtypes are shown. Gene signatures in the DN subtype were significantly distinct from those of the other subtypes. Importantly, module 12, which was composed of 15 mitochondrial genes, was up-regulated in DNs and down-regulated in the DP and M2 subtypes. Figure created with the R package “WGCNA” (version 1.68). https://cran.r-project.org/web/packages/WGCNA/index.html.

Enrichment analysis was performed at 10% FDR to identify GO biological processes that were associated with these modules. The top three GO processes are shown in Fig. 4. Module 2 was associated with ATP production in mitochondria, including oxidative phosphorylation (FDR = 6.32E−44) and respiratory electron transport chain (FDR = 2.62E−31). Module 3 was associated with fatty acid oxidation and metabolism (FDR = 1.49E−03). Module 4 was associated with RNA splicing (FDR = 4.88E−08) and mitotic nuclear division (FDR = 4.88E−08). Module 6 was associated with regulation of protein localization to telomeres (FDR = 2.81E−04). Mitochondrial genes in module 12 were not included in the enrichment analysis because of the absence of a reference gene set for these genes. All enriched GO biological processes at 10% FDR and the top 20 genes with the highest membership for each of the modules are shown in Tables S6 and S7, respectively.

Figure 4

The top 3 GO biological processes at 10% FDR based on a weighted gene co-expression network analysis (WGCNA). The top 3 GO biological processes at 10% FDR for each module are shown. The circle size represents the number of overlapping genes between each GO process and genes in the module. The colour scale represents the extent to which the genes are significantly enriched in each module. Module 2 was most strongly associated with ATP production in mitochondria. Module 3 was associated with fatty acid oxidation and metabolism. Module 4 was associated with RNA splicing and mitotic nuclear division. Module 6 was associated with regulation of protein localization to telomeres. Figure created with the R package ggplot2. https://cran.r-project.org/web/packages/ggplot2/index.html.

To further explore potential changes in mitochondrial respiration in the DN subtype, we examined the relative gene expression in the GO biological process of oxidative phosphorylation, which overlapped with those in module 2, and the mitochondrial genes in module 12. The oxidative phosphorylation genes were down-regulated in the DN subtype (Fig. 5a), whereas the mitochondrial genes were up-regulated in the DN subtype (Fig. 5b).

Figure 5

Relative gene expression across macrophage subtypes (n = 10). Heat maps showing (a) genes in the gene ontology (GO) biological process of oxidative phosphorylation overlapping with genes in module 2 and (b) mitochondrial genes in module 12. Color scale represents the scaled mean expression level (log2 TPM) of each subtype (red indicates up-regulation, blue indicates down-regulation). The oxidative phosphorylation genes were down-regulated in the DN subtype, whereas the mitochondrial genes were up-regulated in the DN subtype. Figure created with the R package “NMF” (version 0.22.0). https://cran.r-project.org/web/packages/NMF/index.html.

Macrophage sub-phenotype distributions across diseases

Among 25 participants including 8 with COPD and 13 with asthma, we investigated the distribution of macrophage subtypes based on disease. Because the presence of asthma did not significantly alter the macrophage subtype distribution (Fig. S2), we compared the subtypes between participants with and without COPD. Patients with COPD were more likely to be males (p = 0.003) and current smokers (p = 0.001) and had lower FEV1/FVC (p = 0.006). None of the patients with COPD had been previously diagnosed with asthma. A majority of patients with COPD had moderate to severe airflow limitation. Baseline characteristics are shown in Table 1.

Table 1 The demographic data of patients with or without chronic obstructive pulmonary disease (COPD).

Macrophages in patients with COPD were less likely to express classical surface markers including CD163 compared to those without COPD (39.3% vs. 71.6%, p = 0.004). For the four macrophage sub-phenotypes, the DN subtype was enriched in patients with COPD (46.7% vs. 14.5%, p < 0.001). In contrast, the DP subtype was significantly reduced in COPD (16.5% vs. 44.3%, p = 0.001). There were no significant differences in terms of M1 and M2 subtypes (Table 1).

Discussion

Here, we showed that many macrophages in human BAL did not conform to the M1/M2 paradigm, and that these cells, especially those that did not harbor M1 or M2 cell surface markers, demonstrated distinct transcriptomic signatures. Interestingly, the double negative subtype, which was significantly enriched in patients with COPD, showed differential expression of mitochondrial genes. Although previous studies have attempted to characterize the gene signatures in the framework of a dichotomized classification in vitro or in vivo, to the best of our knowledge, this is the first study to shed light on the transcriptional profile of macrophages beyond the M1/M2 classification in human BAL fluid12,13,14.

To date, the prevalence of M1 and M2 macrophages in vivo has not been well characterized. A priori we decided to use CD40 and CD163 based on previous studies, which showed that they were distinct markers for human M1 and M2 macrophages9,10,11. A number of studies have shown that CD40 and CD163 are expressed on 4–20% and 45–60%, respectively, of human lung macrophages in non-smoking individuals9,15,16. We extend these findings by showing that approximately 25% of all macrophages in human BAL fluid could not be phenotyped with CD40 and CD163 (and thus were deemed “double negative” macrophages); and that the percentage of these double negative cells increased significantly in BAL fluid of patients with COPD. This observation is consistent with previous studies which reported reduced expression of CD163 and CD40 on alveolar macrophages in COPD lungs9,17.

We also showed that these double negative cells demonstrate a distinct RNA signature compared with M1, M2 or double positive macrophages. The up-regulated, differentially expressed genes in double negatives were significantly enriched in inflammatory responses. The GO inflammatory response term was relatively broad and contained 43 genes including NFKB1 (NFκB1), NFKB2 (NFκB2), NFKBIA (IκBα), TNFRSF1B (TNFR2), CXCL1, CXCL2, CXCL8, IL1B (IL-1β) and NLRP3. The transcription factor NF-κB and the cytokine TNF orchestrate many immune and inflammatory responses including stress responses and regulation of cell proliferation and apoptosis18,19,20. CXCL1, CXCL2, CXCL8 and IL-1β are powerful pro-inflammatory cytokines released by macrophages for neutrophil recruitment21,22,23. NLRP3, on the other hand, has been linked with age-related cellular dysfunction, or “inflammaging”, which is characterized by dysregulated low-grade inflammation24.

In WGCNA, interestingly, we found that module 2, which was associated with oxidative phosphorylation in mitochondria, was down-regulated in double negative macrophages. Mitochondria are the main producers of cellular energy by means of oxidative phosphorylation, which involves the electron-transferring respiratory chain (complexes I–IV) and adenosine triphosphate (ATP) synthase (complex V). Although mitochondria have their own DNA, which contains 37 genes, over 98% of the mitochondrial proteins are encoded by the nuclear genome25,26. We found that a number of nuclear genes, which control cellular respiration and ATP synthesis in the mitochondria, were markedly down-regulated in double negative macrophages. For instance, 32 genes encoding the NADH:ubiquinone oxidoreductase supernumerary subunits (NDUF), which form an essential component of complex I of the respiratory chain, were depleted in double negative macrophages compared to the other subtypes, suggesting a reduced ability to generate ATP on demand to maintain cellular functions27. On the other hand, we observed a significant up-regulation of 15 mitochondrial genes in double negative macrophages in module 12.

The origins of double negative macrophages are unknown. Previous studies have shown that aged cells have reduced expression of surface markers including MHC class II molecules and co-stimulatory receptors such as CD40, which raises the possibility that the double negative macrophages may be senescent cells28,29. This notion is further supported by our data showing that these cells harbor differentially expressed genes involved in inflammation and mitochondrial (dys)function30. In the present study, we could not distinguish tissue resident macrophages from monocyte-derived macrophages31. It is noteworthy, however, that MARCO was down-regulated in double negative macrophages (logFC = −0.35, FDR = 0.021) compared to the other subtypes. MARCO is a marker of embryonically-derived resident macrophages, which raises the possibility that many of the double negatives originated from monocyte-derived macrophages32. Specific studies focused on cell lineages will be needed to pinpoint the exact source of these and other macrophage subtypes.

There were several limitations in this study. First, some of the up-regulated genes in M1 or M2 macrophages in our study did not fully align with classical patterns associated with M1- or M2-related genes. For instance, up-regulated genes in M1 macrophages included those encoding proteins such as CCL22 and MMP12, which have been related to M2 macrophages33,34. Also, genes encoding CD40 and CD163 did not co-express with genes encoding other surface markers commonly used to characterize M1 or M2 macrophages such as CD80 or CD206, respectively5. However, these markers were identified using in vitro generated macrophages where M1-related markers were induced by LPS/IFN-γ and M2-related markers were induced by IL-4/IL-13. A comparative study of in vivo and in vitro macrophages showed that the gene signature of M1 macrophages activated by LPS in vivo shared many features of alternatively activated M2 genes in vitro including up-regulation of CCL2235. While not all M1 and M2 markers in vitro can be directly translated to the in vivo situation35, further studies are needed to validate our findings with other M1 and M2 markers using technologies such as flow cytometry and single cell sequencing. Second, although the unique gene expression signature of double negative macrophages was interesting, we did not conduct functional studies to validate the potential mitochondrial dysfunction in these macrophages, which was suggested by the RNA sequencing data. Third, it is possible that some of our samples may have contained dendritic cells, as they have a similar morphology to macrophages and share common surface markers with them such as HLA-DR and CD4036. Although dendritic cells generally constitute only 0.5% of total cells in BAL fluid, future studies should gate out these cells for more precise assessment of alveolar macrophages37,38.

In conclusion, approximately one out of four macrophages in human BAL fluid stains negatively for both M1 and M2 cell surface markers. These double negative macrophages harbor a gene expression signature that is pro-inflammatory and suggests dysfunction in cellular metabolism and homeostasis. These cells increase threefold in the BAL fluid of COPD patients. Together, these data suggest that phenotypic shifts in alveolar macrophages may play a significant role in the pathogenesis and disease manifestations of inflammatory disease conditions such as COPD.

Methods

Study population

Following informed consent, we performed bronchoscopy and collected bronchoalveolar lavage (BAL) fluid from 25 patients, including those with COPD, who participated in a clinical trial: Differential Effects of Inhaled Symbicort and Advair on Lung Microbiota (DISARM) (ClinicalTrials.gov identifier: NCT02833480) or who underwent clinical bronchoscopy because of a pulmonary nodule between November 2017 and June 2019 at St. Paul’s Hospital (SPH) in Vancouver, Canada. These studies were approved by the University of British Columbia Clinical Research Ethics Board (certificate numbers: H14-02277 and H15-02166) and were conducted in accordance with the principles of the Declaration of Helsinki. COPD was defined based on the Global Initiative for Chronic Obstructive Lung Disease (GOLD) recommendations: symptoms of cough, dyspnea or sputum production, ≥ 10 pack-years of smoking history and a post-bronchodilator FEV1/FVC < 70% of predicted39. Asthma was defined based on the Global Initiative for Asthma (GINA) guidelines: symptoms of wheeze, cough or shortness of breath and airflow limitation that varied over time in individuals with a smoking history of less than 5 pack-years40. Patients were clinically stable, and those who had experienced an exacerbation of their COPD or asthma within the four weeks before bronchoscopy were excluded.

Bronchoscopy procedure

All procedures were performed by an experienced pulmonologist. Conscious sedation was first provided to the patient with the use of intravenous midazolam and fentanyl. Through the working channel of the bronchoscope (Olympus Corporation, Tokyo, Japan), topical 2% lidocaine was instilled, if needed, to prevent bronchospasm and cough. BAL was generally taken from the right middle lobe or lingula unless these lobes had disease noted on pre-operative computed tomography (CT) imaging (e.g. pulmonary nodule). For these cases, the right or left upper lobe was used. After the bronchoscope was fully wedged into the desired segment, 20 mL of 0.9% saline was instilled (with a dwell time of 10 s) and then the fluid was manually aspirated using a vacuum syringe. The first aliquot of the recovered solution was discarded because of concerns over contamination by bronchial lining fluid or mucus secretions. Aliquots of 40 mL of saline were then sequentially instilled to a maximum volume of 200 mL or until 30–50 mL of BAL fluid was recovered, whichever came first.

Sample preparation

The recovered BAL fluid was placed on ice immediately after aspiration and filtered through sterile 70 µm DNase/RNase-free cell strainers to remove large clumps and debris. Cells were recovered by centrifugation at 500 × g for 10 min at 4 °C and were washed twice with PBS. The concentration of viable cells was determined by trypan blue staining on a hemocytometer. These samples generally contained 5 × 10⁵ to 5 × 10⁶ viable cells in 100 µL of solution. The samples were then blocked with 10% human serum for 20 min and stained with the following monoclonal antibodies against surface receptors (Biolegend, San Diego, US): anti-human HLA-DR antibody APC/Cy7 (Cat# 307618, RRID: AB_493586), anti-human CD40 antibody Brilliant Violet 421 (Cat# 334332, RRID: AB_2564211) and anti-human CD163 antibody Alexa Fluor 647 (Cat# 333620, RRID: AB_2563475). The isotype control and compensation control for anti-human HLA-DR antibody APC/Cy7 were prepared in 2 separate tubes. Cells were incubated with the antibodies for 30 min, re-suspended and then incubated for an additional 30 min. At the end of the incubation period, the samples were washed in PBS and re-suspended in 600 µL of PBS. Prepared samples were then analyzed using flow cytometry.

Flow cytometry and cell sorting

Flow cytometry was performed on a MoFlo Astrios-EQ cell sorter (Beckman Coulter, Brea, US). Single macrophages were gated by forward and side scatter and the presence (or absence) of HLA-DR. Subsequently, the macrophage population was examined with CD40 (an M1 marker) and CD163 (an M2 marker). We used the respective isotype controls for dividing the CD40/CD163 plot because their colors were separated by both the laser line and the emission spectra, while we used an anti-human HLA-DR antibody APC/Cy7 single stain to adjust for spillover into anti-human CD163 Alexa Fluor 647. Positive expression of each marker was determined at > 2% compared to its isotype control. This procedure resulted in cells being grouped into 4 categories: double negative (DN), CD40−/CD163−; double positive (DP), CD40+/CD163+; M1, CD40+/CD163−; and M2, CD40−/CD163+. All flow cytometry data were analyzed using the Kaluza analysis software (Beckman Coulter, Brea, US). Representative flow cytometry panels are shown in Fig. S3. Sorted cells were collected directly into a lysis buffer (350 µL of Buffer RLT Plus). The Buffer RLT Plus was supplemented with 1% 2-mercaptoethanol as suggested by the manufacturer of the Allprep DNA/RNA Mini kit (Qiagen, Hilden, Germany). All samples were thoroughly homogenized by vortexing for 90 s in the presence of the lysis buffer, stored in a −80 °C freezer and thawed once for RNA extraction.
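The four-way grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the gating pipeline used in the study: it assumes that each cell has already been called positive or negative for CD40 and CD163 (for example, relative to the isotype-control gate), and the function names `classify_cell` and `subtype_percentages` are hypothetical.

```python
from collections import Counter

SUBTYPES = ("DN", "DP", "M1", "M2")

def classify_cell(cd40_positive: bool, cd163_positive: bool) -> str:
    """Assign one cell to DN, DP, M1 or M2 from its two marker calls."""
    if cd40_positive and cd163_positive:
        return "DP"  # CD40+/CD163+
    if cd40_positive:
        return "M1"  # CD40+/CD163-
    if cd163_positive:
        return "M2"  # CD40-/CD163+
    return "DN"      # CD40-/CD163-

def subtype_percentages(cells):
    """cells: iterable of (cd40_positive, cd163_positive) pairs.

    Returns the percentage of cells in each subtype.
    """
    counts = Counter(classify_cell(cd40, cd163) for cd40, cd163 in cells)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells to classify")
    return {s: 100.0 * counts.get(s, 0) / total for s in SUBTYPES}
```

Because every cell falls into exactly one of the four categories, the percentages returned for a sample always sum to 100.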

RNA preparation for Illumina Next Seq

Total RNA was extracted using the Allprep DNA/RNA Mini kit. Sample quality control and sequencing were performed at the Biomedical Research Centre at the University of British Columbia. Samples were evaluated for quality using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, US) and those that passed the quality test were then prepped for sequencing using a standard protocol for the NEBNext Ultra II Stranded mRNA (New England Biolabs, Ipswich, US). Sequencing was performed on an Illumina NextSeq 500 with paired-end 42 bp × 42 bp reads.

Statistical analysis and RNA sequencing data processing

All statistical analyses were performed in R. For patient characteristics, continuous data including the distribution of macrophage subtypes were represented as mean ± standard deviation (SD) and categorical data as numbers (%) of observations. Statistical significance between COPD and non-COPD patients was assessed using a Student’s t-test for continuous variables and a Fisher’s exact test for categorical variables. P values < 0.05 were considered significant.
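For readers unfamiliar with Fisher’s exact test, the following is a minimal Python sketch of the two-sided test for a 2 × 2 table, using only the standard library. The analyses in this study were run in R, so this is an illustration of the calculation rather than the code that was used; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule shown (summing all tables that are no more probable than the observed one) is one common convention that is assumed here.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The p-value is the total probability of
    all tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties are not lost to rounding error.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)
```

As a usage example with made-up counts, `fisher_exact_two_sided(3, 1, 1, 3)` compares two groups of four in which three versus one individuals have a given characteristic.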

In RNA-seq data processing, raw sequencing reads were quality controlled using FastQC41. STAR (Spliced Transcripts Alignment to a Reference) was used to align the reads to the GENCODE GRCh37 (version 31) genome reference and RSEM (RNA-Seq by Expectation Maximization) was used for quantification to obtain the counts and the transcripts per million (TPM)42,43. Principal component analysis was used to check for potential batch effects and confounding factors. No obvious batch effect was observed, but smoking status showed a potential confounding effect on the gene expression data (Fig. S4). Limma voom was used to normalize the counts to log2 counts per million (CPM)44. Log2 CPM was used for the differential expression analysis, while TPM, which normalizes for gene length, was used for the weighted gene co-expression network analysis (WGCNA). Genes with low abundance (log2 CPM < 1 or TPM < 5 in more than one-fourth of the samples) were filtered out.
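The two normalizations and the abundance filter can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the pipeline used here: the log2 CPM function uses small offsets (0.5 on the count, 1 on the library size) of the form commonly described for voom, the TPM function uses plain gene lengths whereas RSEM works with effective lengths, and all function names are hypothetical.

```python
import math

def log2_cpm(counts):
    """Log2 counts per million for one sample (list of raw gene counts).

    The 0.5 and 1.0 offsets avoid taking the log of zero; they are an
    assumption modelled on the usual description of voom.
    """
    library_size = sum(counts)
    return [math.log2((c + 0.5) / (library_size + 1.0) * 1e6) for c in counts]

def tpm(counts, lengths_kb):
    """Transcripts per million for one sample.

    Counts are first divided by gene length (in kilobases) and then
    rescaled so that the values for the sample sum to one million.
    """
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def keep_gene(values, threshold, max_low_fraction=0.25):
    """Return True if the gene passes the low-abundance filter.

    values: the gene's expression across samples (log2 CPM or TPM).
    The gene is dropped when it falls below the threshold in more than
    max_low_fraction of the samples.
    """
    n_low = sum(1 for v in values if v < threshold)
    return n_low <= max_low_fraction * len(values)
```

With the thresholds given in the text, `keep_gene(values, 1)` would be applied to log2 CPM values and `keep_gene(values, 5)` to TPM values.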

For differential expression analysis, limma’s mixed effect model with adjustment for smoking status was used to compare one macrophage subtype versus the rest of the subtypes to identify characteristic gene expression signatures for each macrophage subtype. The Benjamini–Hochberg procedure was used to correct for multiple hypothesis testing and to control the false discovery rate (FDR); a 10% FDR was used in line with previous studies45,46,47. A Gene Ontology (GO) enrichment analysis was performed on the significantly up-regulated genes at 10% FDR for each macrophage subtype using the R package clusterProfiler48,49.
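The Benjamini–Hochberg adjustment can be written compactly. The Python sketch below is a generic illustration of the procedure, not the code used in this study (which relied on R); the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank, where m is the number of
    tests and rank is its position in ascending order. Working from the
    largest p-value down and keeping a running minimum makes the
    adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A gene would then be called differentially expressed at 10% FDR when its adjusted p-value is below 0.10.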

For the weighted gene co-expression network analysis (WGCNA), the R package WGCNA was used to construct gene modules with genes that were co-expressed with each other. The R code for WGCNA is available at https://github.com/yyolanda/macrophage_rnaseq. Given that gene length may have an effect on gene clustering, the gene-length normalized TPM was used for WGCNA48. We chose a soft-thresholding power (β) of 29 to obtain a scale-free topology model fit index (R²) of > 0.8 (Fig. S5). The minimum module size was set to 15 genes and modules with a distance < 0.2 were merged together. A signed network with thirteen modules, excluding the “garbage” module which contained genes that did not co-express with any other genes, was constructed and each module was assigned a number. The “garbage” module was discarded and not included in the downstream analysis. The eigengene of each module was obtained by calculating the first principal component of the module and was considered as a representative of the expression profile of the module. Figure S6 shows a dendrogram of the clustering of the module eigengenes and a heatmap of the correlation between the module eigengenes. Limma’s mixed effect model was used to identify the module eigengenes that were associated with each macrophage subtype after adjusting for smoking status. For modules that were significantly associated with any of the macrophage subtypes at 10% FDR, a Gene Ontology (GO) enrichment analysis was performed using the R package clusterProfiler to identify biological processes that were associated with the genes of each module. To further investigate these modules, we ranked the module genes by their module membership. The top genes with larger and positive module membership were highly connected to the other genes in the module and exhibited the same direction of effect as the module eigengene. As a sensitivity analysis, we performed the analyses using a 5% FDR; these results are shown in Tables S8, S9, S10 and S11.
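To make the role of the soft-thresholding power concrete, the Python sketch below computes a signed co-expression adjacency matrix from a small expression table. It is an illustration only and does not reproduce the R code linked above: it assumes Pearson correlation and the standard signed-network form, adjacency = ((1 + correlation) / 2)^β, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_adjacency(expression, beta=29):
    """Signed adjacency matrix for a list of gene expression profiles.

    expression: one list per gene, holding its values across samples.
    Correlations are shifted from [-1, 1] to [0, 1] and raised to the
    power beta, so that only strongly positively correlated gene pairs
    keep an adjacency close to 1.
    """
    n_genes = len(expression)
    return [
        [
            ((1.0 + pearson(expression[i], expression[j])) / 2.0) ** beta
            for j in range(n_genes)
        ]
        for i in range(n_genes)
    ]
```

With a high power such as 29, moderately correlated pairs are pushed close to zero, while anti-correlated pairs receive an adjacency near zero regardless of the power because the network is signed.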