Introduction

Sarcoidosis is a multisystem immune-mediated disease of unknown cause with widely variable disease manifestations, severity, and outcomes1. It affects 45–300/100,000 individuals in the US, all ages, races, and both sexes2,3. Diagnostic delays are frequent as sarcoidosis is a diagnosis of exclusion, with no confirmatory test currently available. Despite a greater understanding of sarcoidosis pathogenesis4,5, the mechanisms contributing to the heterogeneity of disease manifestations and predictors of disease outcomes are poorly defined6. The annual mortality is approximately 2.8/million people1 and rising. Sarcoidosis-related mortality is attributed to four high-risk manifestations that include: treatment-resistant pulmonary sarcoidosis, multi-organ sarcoidosis, cardiac sarcoidosis, and neurosarcoidosis7. Respiratory failure from progressive pulmonary disease is the leading cause of sarcoidosis-related mortality in the US7,8. While remission is common, it is not known if systemic anti-inflammatory therapy decreases the risk of progressive pulmonary disease. Another current knowledge gap is the absence of validated markers to predict which patients with pulmonary sarcoidosis will progress.

The pathologic hallmark of sarcoidosis is the formation of epithelioid granulomas associated with infiltration of CD4 + T cells and scattered macrophages and giant cells, with CD8 + T cells and B cells around the granuloma9. While the exact details are not known, it appears that exposure to a yet unidentified antigen(s) results in an exuberant adaptive immune response with CD4 + T cells10, regulatory T cells (Tregs), and high levels of Th1 cytokines TNF–α, IFNγ, and IL–2. Additionally, an abnormal innate immune response is seen in bronchoalveolar lavage (BAL) cells in sarcoidosis. A less robust immune response is apparent in remitting disease compared to the exuberant response in progressive sarcoidosis, likely due to different T cell populations and abnormal counter-regulatory immune measures. Overall, the immune response is aberrant in sarcoidosis and compartmentalized to the lung with much higher response noted in the lung cells compared to blood cells11,12. The whole blood transcriptional profile of active sarcoidosis overlaps with tuberculosis and chronic beryllium disease, and inactive sarcoidosis overlaps with controls13,14. Genes with differential expression in sarcoidosis map to IFN-signaling, TLR signaling, and Fcɣ receptor-mediated phagocytosis15,16. In chronic progressive sarcoidosis, the gene expression in peripheral blood mononuclear cells demonstrates differential expression of genes participating in CXCL9 and TCR-mediated responses17. Transcriptional studies in BAL cells revealed that pathways linked to adaptive immune response, T-cell signaling, and chemokine signaling such as IFNγ, IL-12, IL-17, and IL-23 are involved in sarcoidosis18. In lung tissue, gene networks engaged in cell movement, immune function, and in Th1-type responses such as signal transducer and activator of transcription 1 (STAT1), IL-5, IL-7, CXCR5, and CXCR9 were overexpressed in sarcoidosis lung tissues11. 
However, the approach of examining comprehensive protein changes that result from these differences in transcription is underutilized and has not been well evaluated using contemporary techniques.

Prior studies have used protein microarrays19,20, 2-dimensional electrophoresis (2DE)12,21,22,23, and top-down24 as well as shotgun proteomics25,26,27 to examine variable sarcoidosis phenotypes including Lofgren’s syndrome, non-Lofgren’s chest x-ray (CXR) stage I, and stage II/III pulmonary sarcoidosis and compared them to subjects with asthma, IPF, tuberculosis or healthy smoking and non-smoking controls. These studies have identified differences in protein spots on 2DE12,21,22, differentially expressed proteins25,26,28 and also possible mechanisms that could explain the development of sarcoidosis25,26,27. In a large study that utilized SELDI-TOF MS to compare BAL fluid (BALF) from sarcoidosis subjects with Lofgren’s syndrome and different CXR stages of pulmonary sarcoidosis (n = 65) with healthy controls, 40 differentially expressed peaks were identified, including 27 peaks that were specific for a particular CXR stage24. A study using affinity planar antigen microarray proteomics examined BALF and reported that mitochondrial ribosomal protein L43, nuclear receptor coactivator 2, adenosine diphosphate-ribosylation factor GTPase activating protein 1 and zinc finger protein 688 demonstrated higher reactivity in sarcoidosis lungs20. Another study reported several differentially expressed BALF proteins in nine patients with stage II/III sarcoidosis compared to healthy controls analyzed by 2DE followed by MALDI-TOF MS25. The differentially expressed proteins mapped to canonical PI3K/Akt/mTOR signaling, MAP kinase, hypoxia response, and pluripotency-associated transcription factor pathways. These studies support rigorous evaluation of well-characterized, clinically-meaningful sarcoidosis phenotypes by contemporary techniques to identify novel mechanisms of sarcoidosis which can provide tenable treatment targets and biomarkers for personalized care.

Our goal is to couple contemporary proteomics with data-driven analytics for unbiased discovery of novel disease mechanisms in pulmonary sarcoidosis and progressive pulmonary disease, a known high-risk manifestation of sarcoidosis. As a critical first step in evaluating the proteome in sarcoidosis, we focus on BAL cells as alveolitis is seen in patients with active pulmonary sarcoidosis and immune cells provide an ex vivo model for biological mechanisms in inflammatory lung diseases. The BALF is the most proximate fluid to the site of injury, and thus has a high likelihood to identify disease-specific and potentially pathogenic changes. For this proof-of-concept study, we performed label-based MS for measuring protein abundance to gain insights into the intracellular protein interactions in sarcoidosis. We also employed label-free quantitative proteomics on BALF from controls and untreated sarcoidosis cases who, on follow-up, either were found to have progressive or non-progressive pulmonary disease. We found significant differences in BALF and cellular proteins between cases and controls and progressive versus non-progressive cases suggesting that this approach may find useful application in larger studies.

Results

We characterized the proteins in BAL cells from four controls and four sarcoidosis cases. There was no difference in age, sex, race and smoking status for the two groups. The BAL leucocyte count was not significantly different but the sarcoidosis cases had more lymphocytes and a lower number of macrophages (Table 1). For the studies in BALF, we examined seven controls and ten sarcoidosis subjects (non-progressive = 5, progressive = 5) prior to initiation of any systemic anti-inflammatory therapy. There was no difference in the age, race, smoking status, BAL leucocytes, neutrophils, or lymphocytes and macrophages (Table 2). At enrollment, the forced vital capacity (FVC), forced expiratory volume in 1 second (FEV1) and diffusing capacity for carbon monoxide (DLCO) were also not different in subjects with progressive vs. non-progressive disease.

Table 1 Clinical and demographic variables for controls and sarcoidosis subjects for BAL cell studies.
Table 2 Clinical and demographic variables for controls and sarcoidosis subjects for BALF studies.

Cellular proteins differ between sarcoidosis BAL cells and controls

The liquid chromatography (LC)-tandem mass spectrometry (MS/MS) identified 23,837 spectra at the given thresholds; 16,890 (71%) were included in quantitation. From these spectra, we identified 4,365 proteins (Supplemental Table S1; ‘Scaffold export’ tab). These included three proteins from the common Repository of Adventitious Proteins (cRAP) (serum albumin precursor, cluster of trypsin precursor and keratin, type 1 cytoskeletal 9) and 56 proteins that matched to the decoy (reverse) sequences, which were removed from further analysis resulting in identification of 4,306 high-confidence proteins (probability of 99%, Supplemental Table S2; ‘Scaffold-cleaned up’ tab).
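The filtering arithmetic in this paragraph can be checked directly. The following is a minimal Python sketch that uses only the counts reported above; the variable names are illustrative and are not taken from the authors' workflow.

```python
# Counts as reported in the text; variable names are illustrative only.
total_spectra = 23_837
quantified_spectra = 16_890
identified_proteins = 4_365
crap_contaminants = 3   # proteins matching the cRAP contaminant list
decoy_hits = 56         # proteins matching decoy (reverse) sequences


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of identified spectra that were included in quantitation.
quantified_pct = percent(quantified_spectra, total_spectra)

# Proteins remaining after contaminant and decoy matches are removed.
high_confidence_proteins = identified_proteins - crap_contaminants - decoy_hits

print(f"Spectra used for quantitation: {quantified_pct:.1f}%")
print(f"High-confidence proteins: {high_confidence_proteins}")
```

Both derived values agree with the figures quoted in the text (about 71% of spectra and 4,306 proteins).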

We used stringent permutation testing and identified 272 differentially expressed proteins between cases and controls, controlling for an FDR of ≤ 5% (Fig. 1; Supplemental Table S1; ‘DE Proteins’ tab). Table 3 lists the differentially expressed proteins that showed the most significant changes. Several other proteins that were differentially expressed included myeloperoxidase, T-cell immune regulator, cathepsin G, integrin subunit beta2, integrin subunit alpha M, myosin light chain, matrix metalloproteinase 9, PI3K regulator subunit, APOE, interleukin-13 receptor alpha 1-binding protein (TRAF3-interacting protein 1), and SERPINA1.
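To make the general idea of permutation testing with FDR control concrete, the sketch below implements a per-protein exact label permutation test on the difference in group means, followed by the Benjamini and Hochberg adjustment. This is an illustration under stated assumptions, not the authors' procedure: the test statistic, the permutation scheme, and the function names are all hypothetical, and the example abundances are invented.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(cases, controls):
    """Two-sided exact permutation p-value for a difference in group means.

    Every way of relabelling the pooled samples into a 'case' group of the
    original size is enumerated (the observed labelling is included).
    """
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    observed = abs(mean(cases) - mean(controls))
    indices = range(len(pooled))
    total = 0
    extreme = 0
    for case_idx in combinations(indices, n_cases):
        chosen = set(case_idx)
        perm_cases = [pooled[i] for i in indices if i in chosen]
        perm_controls = [pooled[i] for i in indices if i not in chosen]
        diff = abs(mean(perm_cases) - mean(perm_controls))
        total += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented abundances for one hypothetical protein (4 cases, 4 controls).
p_example = exact_permutation_pvalue([5, 6, 7, 8], [1, 2, 3, 4])
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_example, adjusted_example)
```

With four samples per group there are only 70 possible relabellings, so this simple per-protein scheme yields p-values in steps of 1/70; it is shown only to illustrate the logic of the two steps.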

Figure 1

Volcano plot showing the differentially expressed BAL cell proteins. An individual dot represents each protein. The log2 fold change is plotted on the x-axis, and the log2 FDR corrected p-value is plotted on the y-axis. The horizontal dashed line corresponds to statistical significance from the permutation test (B and H corrected p-value = 0.0025) on a numerical scale, and the vertical line corresponds to a 1.2-fold change. The proteins depicted by red dots are more abundant in sarcoidosis and the ones in blue dots are more abundant in controls. The black dots indicate the proteins that do not show a statistically significant change. MUC5A Mucin 5A, FCGBP IgG Fc-binding protein, MIPT3 TRAF3-interacting protein (also called Interleukin-13 receptor alpha 1-binding protein), PDCD4 Programmed cell death protein 4, P85A Phosphatidylinositol 3-kinase regulatory subunit alpha, ITB2 Integrin beta-2, E9PMC5 T cell immune regulator 1, ANXA3 Annexin A3, CD163 Scavenger receptor cysteine-rich type 1 protein, CD177 CD177 antigen, PERM Myeloperoxidase.
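The colour coding described in this caption can be read as a simple classification rule. The Python sketch below is one possible reading, assuming that fold change is expressed as sarcoidosis over control, that a protein must pass both the p-value cutoff and the 1.2-fold cutoff to be coloured, and that the comparisons are inclusive; none of these details is stated in the caption, and the function name is hypothetical.

```python
import math

FOLD_CHANGE_CUTOFF = 1.2  # vertical line in the plot, from the caption
P_CUTOFF = 0.0025         # horizontal line in the plot, from the caption


def volcano_colour(fold_change, corrected_p,
                   p_cutoff=P_CUTOFF, fc_cutoff=FOLD_CHANGE_CUTOFF):
    """Assign the caption's colour to a protein.

    fold_change is assumed to be sarcoidosis / control (an assumption);
    values above 1 mean higher abundance in sarcoidosis.
    """
    log2_fc = math.log2(fold_change)
    passes_fold_change = abs(log2_fc) >= math.log2(fc_cutoff)
    passes_p = corrected_p <= p_cutoff
    if passes_p and passes_fold_change:
        return "red" if log2_fc > 0 else "blue"
    return "black"


# Invented values for illustration only.
print(volcano_colour(2.0, 0.001))   # higher in sarcoidosis
print(volcano_colour(0.5, 0.001))   # higher in controls
print(volcano_colour(1.1, 0.001))   # below the fold-change cutoff
```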

Table 3 Top ten differentially expressed cellular proteins comparing controls with sarcoidosis.

Biological relevance of the differentially expressed proteins in the BAL cells of cases compared to controls

To determine the biological significance of the differentially expressed proteins, we performed IPA core analysis to identify the canonical pathways that map to these proteins. The pathways that met the statistical threshold (− log[p-value] ≥ 1.3) and the proteins assigned to each canonical pathway are listed in Table 4. These include phagosome maturation, leukocyte extravasation signaling, tight junction signaling, ILK signaling, IL-8 signaling, clathrin-mediated endocytosis signaling, caveolin-mediated endocytosis signaling, glucocorticoid receptor signaling, NRF2-mediated oxidative stress response and RhoA signaling (Fig. 2). We also identified pathways linked to matrix turnover and glucocorticoid receptor signaling. Several metabolic pathways such as fatty acid β-oxidation, mitochondrial dysfunction, ethanol degradation, tryptophan metabolism and NRF2-mediated oxidant response also differed between controls and sarcoidosis subjects. The z-score indicating the activation state was available for fatty acid β-oxidation (− 2.5), leukocyte extravasation signaling (− 0.6), coagulation system (− 0.5), inhibition of matrix metalloproteases (1.0), ILK signaling (− 0.4), ethanol degradation (− 1.0), IL-8 signaling (− 1.7) and acute phase response signaling (− 1.9).
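For readers less familiar with these scales, the sketch below converts the − log[p-value] threshold to a p-value and applies an activation call to the z-scores listed above. It assumes the logarithm is base 10 and uses an absolute z-score cutoff of 2, a commonly cited convention for calling a pathway activated or inhibited; that cutoff is an assumption here and is not stated in the text.

```python
def p_from_neg_log10(neg_log10_p):
    """Convert a -log10(p) value back to a p-value (base 10 is assumed)."""
    return 10 ** (-neg_log10_p)


def activation_call(z_score, cutoff=2.0):
    """Label a pathway from its z-score; the cutoff of 2 is an assumption."""
    if z_score >= cutoff:
        return "predicted activated"
    if z_score <= -cutoff:
        return "predicted inhibited"
    return "no call"


# z-scores as reported in the text for BAL cells.
reported_z = {
    "fatty acid beta-oxidation": -2.5,
    "leukocyte extravasation signaling": -0.6,
    "coagulation system": -0.5,
    "inhibition of matrix metalloproteases": 1.0,
    "ILK signaling": -0.4,
    "ethanol degradation": -1.0,
    "IL-8 signaling": -1.7,
    "acute phase response signaling": -1.9,
}

calls = {name: activation_call(z) for name, z in reported_z.items()}

print(f"-log10(p) >= 1.3 corresponds to p <= {p_from_neg_log10(1.3):.3f}")
for name, call in calls.items():
    print(f"{name}: z = {reported_z[name]:+.1f} -> {call}")
```

Under these assumptions the threshold of 1.3 corresponds to p ≤ 0.05, and a negative z-score indicates a pathway trending toward lower activity in sarcoidosis relative to controls.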

Table 4 Canonical pathways represented by cellular proteins differentially expressed between sarcoidosis and control subjects.
Figure 2

Cellular canonical pathways represented by differentially expressed proteins between sarcoidosis and controls implementing Overlapping Canonical Pathway functionality in IPA. The 273 differentially expressed proteins map to thirty statistically significant canonical pathways. Each canonical pathway is represented as a node. The edges indicate at least two common proteins between the nodes, denoting shared biological function. Three clusters of overlapping pathways were identified. A larger cluster of overlapping canonical pathways includes diverse biological functions including IL-8, ILK, RhoA signaling, caveolin and clathrin-mediated endocytic signaling, NRF2-mediated oxidant response signaling and glucocorticoid receptor signaling (Panel A). The other two have a limited number of nodes and are involved in metabolic functions (Panels B, C).
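The network construction described in this caption (pathways as nodes, an edge wherever two pathways share at least two proteins, clusters as connected groups of nodes) can be sketched in a few lines of Python. This is not IPA's implementation; the function names, the pathway labels and the protein memberships below are placeholders invented for illustration.

```python
from itertools import combinations


def build_overlap_edges(pathway_proteins, min_shared=2):
    """Return (pathway_a, pathway_b, shared_proteins) for each overlapping pair."""
    edges = []
    for a, b in combinations(sorted(pathway_proteins), 2):
        shared = pathway_proteins[a] & pathway_proteins[b]
        if len(shared) >= min_shared:
            edges.append((a, b, sorted(shared)))
    return edges


def connected_clusters(nodes, edges):
    """Group nodes into connected components using a depth-first search."""
    adjacency = {node: set() for node in nodes}
    for a, b, _ in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    clusters = []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Placeholder memberships; these are NOT the pathways or proteins of the study.
toy_pathways = {
    "pathway_A": {"P1", "P2", "P3"},
    "pathway_B": {"P2", "P3", "P4"},
    "pathway_C": {"P4", "P5"},
    "pathway_D": {"P5", "P6", "P7"},
    "pathway_E": {"P6", "P7"},
}

toy_edges = build_overlap_edges(toy_pathways)
toy_clusters = connected_clusters(toy_pathways, toy_edges)
print(toy_edges)
print(toy_clusters)
```

In the toy data, pathways sharing a single protein (for example pathway_B and pathway_C) are not connected, which mirrors the two-protein requirement stated in the caption.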

Differences in the bronchoalveolar lavage fluid proteins between sarcoidosis and controls and between sarcoidosis phenotypes

We examined BALF from seven control and ten sarcoidosis subjects. All BALF samples were analyzed by label-free mass spectrometry in triplicate. We identified 1,293 BALF proteins at an FDR of ≤ 1% (Supplemental Table S2; ‘Original File’ tab). These included 62 proteins that matched to the decoy (reverse) sequences or cRAP database such as keratins, filaggrin, and cartilage matrix proteins, which were not considered for further analysis. The remaining 1,231 included 1,195 proteins present in all patients and controls. Seven proteins were only detected in controls and not in sarcoidosis cases, while five proteins were present only in sarcoidosis cases but not in control BALF. There were 12 proteins detected in controls and non-progressive cases but not in progressive sarcoidosis, five proteins in control and progressive cases but not in non-progressive sarcoidosis, one protein in only non-progressive but not in controls or progressive sarcoidosis, and four proteins were detected in only progressive but not in non-progressive sarcoidosis or controls (Fig. 3A). Peptides from the 1,231 BALF proteins (Supplemental Table S2; HAP CON REV tab) included proteins that originate from inflammatory cells and epithelial cells such as chitotriosidase-1, macrophage colony stimulating factor, Fc-gamma RIII-alpha, macrophage migration inhibitory factor (macrophage), human neutrophil defensin 3, neutrophil elastase (neutrophils), lymphocyte antigen, lymphocyte cytosolic protein (lymphocytes), aquaporin 1 and 5 (type 1 alveolar epithelial cells), and surfactant protein B (type 2 alveolar epithelial cells). Sixty-nine high abundance and immunoglobulin proteins or immunoglobulin fractions that were not completely removed by the high-abundance protein depletion column were also detected. These proteins were included for functional analysis as they are crucial for many biological functions. 
Good quality quantitative spectral data was available to compare 1,223 of the 1,231 proteins in sarcoidosis vs. control subjects (Supplemental Table S3; ‘Sarc vs. control’ tab) and 1,206 of 1,231 proteins in progressive vs. non-progressive pulmonary sarcoidosis subjects (Supplemental Table S3; ‘P vs NP’ tab).
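The presence and absence comparison across the three groups amounts to partitioning proteins by the set of groups in which each was detected. A minimal Python sketch follows; the group labels mirror the text, but the protein identifiers and the function name are invented for illustration, and the detection rule the authors applied is not specified here.

```python
def detection_regions(detected_by_group):
    """Count proteins by the exact combination of groups in which they appear.

    detected_by_group maps a group label to the set of proteins detected in it.
    The result maps a sorted tuple of group labels to a protein count.
    """
    groups_per_protein = {}
    for group, proteins in detected_by_group.items():
        for protein in proteins:
            groups_per_protein.setdefault(protein, set()).add(group)
    counts = {}
    for groups in groups_per_protein.values():
        key = tuple(sorted(groups))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Invented protein identifiers; not data from the study.
toy_detection = {
    "control": {"A", "B", "C", "D"},
    "non_progressive": {"A", "B", "E"},
    "progressive": {"A", "C", "F"},
}

toy_counts = detection_regions(toy_detection)
for region, count in sorted(toy_counts.items()):
    print(" + ".join(region), "->", count)
```

Each key corresponds to one region of a three-set Venn diagram such as the one in Fig. 3A.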

Figure 3

The BALF proteins detected in the controls and sarcoidosis cases. (A) The spectral database search identified 1,231 proteins of which 1,195 were detected in control, progressive and non-progressive subjects. Seven proteins were identified in control subjects but not in sarcoidosis cases. Five proteins were present in sarcoidosis cases but not in controls, and four* proteins were detected in progressive sarcoidosis cases. (B) Volcano plot showing the differentially expressed BALF proteins. An individual dot represents each protein. The log2 fold change is plotted on the x-axis, and the log2 FDR corrected p-value is plotted on the y-axis. The horizontal dashed line corresponds to a corrected p-value = 0.05 on a numerical scale, and the vertical line corresponds to a 1.2-fold change. The left panel compares sarcoidosis to controls, and the right panel examines progressive and non-progressive subjects. The proteins depicted by red dots are more abundant in sarcoidosis (left panel), or progressive sarcoidosis (right panel) and have a positive log fold change. The blue dots are more abundant in controls (left panel) or non-progressive sarcoidosis (right panel). The black dots indicate proteins that do not show a statistically significant change. CHIT1 Chitotriosidase, GSTM3 Glutathione-S-transferase, 1A68 HLA class I histocompatibility antigen, SFTPD Pulmonary surfactant-associated protein D, PDC61 Programmed cell death 6-interacting protein, PD1L2 Programmed cell death 1 ligand 2, HMGA1 High mobility group protein HMG-I, CYTS Cystatin-S, VCAM1 Vascular cell adhesion protein, E9PMV2 HLA class II histocompatibility antigen, DQ alpha 1 chain, ICAM1 Intercellular adhesion molecule 1, AXA81 Annexin A8, GATA5 Transcription factor GATA-5, MUC5B Mucin-5B. *One detected protein was an uncharacterized protein.

We identified 293 differentially expressed proteins in sarcoidosis (n = 10) compared to the seven control subjects (Supplemental Table S3; Sarcoidosis vs Control tab), Fig. 3B. These proteins included chitotriosidase-1, serum amyloid protein P, surfactant protein D, S100P, inter-alpha-trypsin inhibitor, annexin, glutathione-S-transferase, interleukin-1 receptor accessory protein, cystatin-5, caveolin, choline transport protein, Fc-gamma RII-a, (Fcγ-binding protein), interleukin 6 receptor, programmed cell death 1 ligand 2, and aquaporin-1. The proteins with the most significant differences, with a higher abundance in either sarcoidosis or controls, are listed in Table 5. To find the biological relevance of the differentially expressed proteins, we determined the canonical pathways that map to these proteins (Table 6). These pathways include phagosome formation and maturation, IL-8 signaling, IL-12 signaling in macrophages, clathrin and caveolin endocytic signaling, LXR/RXR activation, B cell receptor signaling, communication between innate and adaptive immune cells, aryl hydrocarbon receptor signaling and NRF2-mediated oxidative stress response. Kinase signaling pathways such as PTEN, phospholipase C and GP6 signaling also map to the differentially expressed proteins. Overlapping Canonical Pathway analysis identified a highly intricate network of pathways participating in immunological functions, acute phase response and metabolic processes (Fig. 4). 
The z-score indicating the activation state was available for LXR/RXR activation (2.9), acute phase response signaling (1.39), complement system (− 0.8), coagulation system (− 0.816), agrin interactions at the neuromuscular junction (− 1.633), glutathione-mediated detoxification (1.3), osteoarthritis pathways (− 0.4), SPINK1 pancreatic cancer pathway (1.6), intrinsic prothrombin activation pathway (− 0.5), phospholipase C signaling (− 0.6), serotonin degradation (− 1.3), BAG2 Signaling Pathway (− 1.), neuroprotective role of THOP1 in Alzheimer’s disease (− 2.2), leucocyte extravasation signaling (− 1.4), IL-8 signaling (− 0.4), GP6 signaling Pathway (0.8), PTEN signaling (2.5) and integrin signaling (− 1.9).

Table 5 Top differentially expressed BALF proteins in sarcoidosis vs controls.
Table 6 Canonical pathways represented by proteins differentially expressed between sarcoidosis and control subjects in BALF.
Figure 4

The canonical pathways represented by differentially expressed proteins in BALF between sarcoidosis and controls implementing Overlapping Canonical Pathway functionality in IPA. The 293 differentially expressed proteins map to 65 statistically significant canonical pathways. Each canonical pathway is represented as a node. The edges indicate at least two common proteins between the nodes, denoting shared biological function. A complex network of pathways with diverse functions including immunological processes, signal transduction by kinases, acute phase response signaling, NRF2-mediated antioxidant response and several metabolic pathways was detected in this analysis.

When we compared the BALF proteins between progressive vs. non-progressive sarcoidosis subjects (n = 5 each), there were 121 differentially expressed proteins. The proteins that differed between phenotypes included heat shock protein 90, glutathione-S-transferase, mucin-5B, annexin, CD5 antigen like protein (apoptosis inhibitor expressed by macrophages), chitotriosidase 1, ICAM 1, tropomyosin, integrin beta-2, pulmonary surfactant protein B and D, fatty acid binding protein, and HLA class II histocompatibility antigen DQ-α. The proteins with most significant differences with a higher abundance in cases with progressive disease compared to non-progressive disease are listed in Table 7. To determine the pathways that may contribute to the progression of sarcoidosis, we mapped the differentially expressed proteins between the progressive and non-progressive cases to canonical pathways in IPA (Table 8); these include aryl hydrocarbon receptor signaling, clathrin-mediated endocytic signaling, glutathione redox reaction, glutathione-mediated detoxification, antigen presentation pathway, phagosome formation, CD28 signaling in T-helper cells, CDC-42 signaling, RhoA signaling and PFKFB4 signaling pathway (Fig. 5). The z-score indicating the activation state was available for glycolysis (1.0), LXR/RXR (− 1.6) and IL-8 signaling (1.3).

Table 7 Top differentially expressed BALF proteins comparing progressive to non-progressive cases.
Table 8 Canonical pathways mapping to proteins differentially expressed between progressive and non-progressive sarcoidosis.
Figure 5

The canonical pathways represented by differentially expressed proteins in the BALF between progressive and non-progressive sarcoidosis implementing Overlapping Canonical Pathway functionality in IPA. The 121 differentially expressed proteins map to twenty-seven canonical pathways. Each canonical pathway is represented as a node. The edges indicate at least two common proteins between the nodes, denoting shared biological function. The Th1 pathway, CD28 signaling, CDC-42 signaling and IL-8 signaling are highly-connected nodes detected with this analysis.

Discussion

Use of ‘omics’ tools to improve the understanding of sarcoidosis has been recognized as a high priority area of research in sarcoidosis29. We implemented an approach that coupled state-of-the-art mass spectrometry based proteomics with novel bioinformatics for a comprehensive characterization of the protein changes in the lung compartment in well-phenotyped cases. In the absence of well-characterized animal models, the examination of BAL cells provides an ex vivo model of the immune response in sarcoidosis. While proteomic studies have been conducted previously23,26,28, no prior study has comprehensively characterized mixed BAL cells and BALF. In addition, we established workflows for comprehensive characterization of BALF to obtain unprecedented coverage and detect proteins that originate from diverse cellular and extracellular sources. Ultimately, our approach to characterize mixed BAL cells captured the complex interplay between inflammatory cells in sarcoidosis. Specifically, in BAL cells and fluid we identified several pathways present in macrophages such as clathrin-mediated endocytic signaling and other phagocytic processes as well as redox-related pathways that were previously reported to be upregulated in sarcoidosis23,30. We also identified novel pathways implicated in sarcoidosis such as signaling by integrin-linked kinase, IL-8, and caveolar-mediated endocytic signaling in our studies comparing BAL cells from controls and sarcoidosis cases. The studies in BALF showed higher levels of chitotriosidase, a potential biomarker and an investigational agent for therapy31,32 when comparing cases to controls. Several of the biological pathways identified in the BAL cells were also identified in the BALF, suggesting that BALF is a useful biofluid to investigate mechanistic processes in sarcoidosis. In our comparison of cases with progressive vs. non-progressive sarcoidosis, we identified several novel pathways that may be involved in progression in sarcoidosis. 
These included CD28 signaling and PFKFB4 signaling. These results suggest that a systematic characterization of BALF may prove fruitful to develop disease models and classifiers with diagnostic and prognostic utility, while BALF and the cellular proteome will provide insight into the mechanisms underlying sarcoidosis as well as the processes that promote progressive disease.

We examined BAL cells as the inflammatory response is aberrant in sarcoidosis with (a) yet unknown antigen(s) triggering an exuberant although dysfunctional immune response with CD4 + T cells, Tregs, high levels of Th1 cytokines TNF-α, IFNγ, and IL-210,33,34, along with inappropriate counter regulatory responses. Previous studies investigating protein changes in alveolar macrophages23,26 and gene expression changes in peripheral blood mononuclear cells35 found phagocytosis-related pathways, such as Fcγ receptor-mediated phagocytosis and clathrin-mediated endocytic signaling, to be upregulated in sarcoidosis subjects. We identified differences in cellular proteins mapping to phagosome maturation and clathrin-mediated endocytic signaling in sarcoidosis vs. control BAL cells. Phagocytosis is crucial for innate and adaptive immune response and plays an essential role in antigen presentation, supporting the notion that sarcoidosis results from the response to an unknown external exposure requiring antigen processing and presentation for the development of disease. Similar to previous reports, we observed that the proteins involved in clathrin-mediated endocytic signaling differ in sarcoidosis cases when compared to controls. Additionally, caveolar-mediated endocytic signaling was also different between the two comparison groups. Both these pathways play a role in endocytic internalization of a variety of particles, again implicating exposure in disease ontogeny. They also play a role in signal transduction and the regulation of many plasma membrane activities that have not been studied in sarcoidosis, and they influence the immune response in alveolar macrophages36,37 and peripheral blood mononuclear cells38. In fact, the role of clathrin and caveolar pathways in the development of sarcoidosis has not been systematically studied. 
Thus, our findings suggest new pathways for investigation of potential disease pathogenesis and/or cell regulation in sarcoidosis.

With an unbiased approach, we identified several canonical pathways mapping to differentially expressed proteins that have not been previously linked to sarcoidosis, but are likely to play a role in disease pathogenesis. These include integrin-linked kinase (ILK) signaling, IL-8 signaling, and inhibition of matrix metalloproteinases. ILK is an intracellular protein that primarily functions to connect integrins to the cytoskeletal proteins. The intracellular domain of ILK interacts with different proteins and regulates the phosphorylation of protein kinase B (PKB)/AKT1 and glycogen synthase kinase 3B39. The downstream signaling cascade of PI3K/AKT activation includes activation of mTOR25, which is implicated in the development and the progression of sarcoidosis and has been proposed as a potential therapeutic target40. Thus, ILK-mediated mTOR activation could be a possible mechanism mediating inflammation in a subset of sarcoidosis cases. ILK signaling also activates c-Jun N-terminal kinase (JNK) via transcription-factor activator protein 1 (AP1) and regulates the gene expression of MMP941 and also IL-8 signaling42. IL-8 is a chemokine in the CXC family and is produced by non-leucocytic and leucocytic cells including macrophages, and binds to CXCR1 and CXCR2 surface receptors43. Several cytokines such as TNF-α induce the production of IL-844. Higher levels of IL-8 have been reported in sarcoidosis BALF45 and serum, with the latter correlating with pulmonary46 and chronic disease47. IL-8 signaling has recently been reported to directly regulate adaptive T cell reactivity48 and phagosome function. Thus, our findings are not surprising but suggest that future studies investigating IL-8 signaling could improve the understanding of sarcoidosis pathogenesis and potentially phenotypes. 
They also highlight the importance of comprehensive characterization of the BAL cell protein changes in providing insight into sarcoidosis development and/or progression, an approach that offers promise and is underutilized thus far in sarcoidosis research.

The examination of BALF revealed many proteins that are represented by canonical pathways that were also found in BAL cells. This indicates that biological mechanisms that contribute to the development of sarcoidosis can be identified in the BALF. When we compared BALF from sarcoidosis subjects to controls, similar to the findings from BAL cells, we identified several pathways that are linked to the inflammatory response. These included phagosome formation/maturation, clathrin- and caveolar-mediated endocytic signaling, LXR/RXR activation, IL-8 signaling, fatty acid oxidation, NRF2-mediated oxidative stress response and tryptophan degradation. Several of these pathways are also assigned to the proteins that are differentially expressed between progressive and non-progressive sarcoidosis cases. Some BALF pathways map to proteins that are only differentially expressed between progressive and non-progressive sarcoidosis. Specifically, we identified proteins mapping to CD28 signaling in T-helper cells, PFKFB4 (6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase 4) signaling and IL-12 signaling and production in macrophages. CD28 is a stimulatory immune checkpoint molecule of the B7-CD28 superfamily with diverse roles in naïve and CD4 + T cells. The cytoplasmic tail of CD28 contains signaling motifs that are phosphorylated in response to TCR and CD28 stimulation49. Binding of the adaptor proteins to the activated motif in turn phosphorylates and activates CDC-4250, culminating in the activation of JNK51. While we did not identify enrichment of canonical JNK pathways, BALF, which contains secreted proteins, may reflect only some of the processes involved in sarcoidosis pathogenesis. Regardless, the finding of differentially expressed BALF proteins mapping to CDC-42 and CD28 signaling suggests that they may be involved in disease progression. 
Additionally, CD28 controls differentiation of Tregs from naïve CD4 T cells, providing novel mechanisms that may explain progression or remission of sarcoidosis. Interestingly, we identified PFKFB4 and IL-12 signaling also mapping to proteins that are differentially expressed in progressive vs. non-progressive cases. PFKFB4 is a bifunctional glycolytic enzyme that synthesizes and degrades fructose 2,6-bisphosphate. PFKFB4 regulates glucose metabolism and cell fate of dendritic cells52 and may provide a link for immunomodulatory effects by 1,25-dihydroxyvitamin D3 (1,25(OH)2D3). Vanhewegan et al. identified PFKFB4 as a master regulator of 1,25(OH)2D3-induced DC tolerogenicity and reported that inhibition of PFKFB4 signaling promotes secretion of proinflammatory cytokines including TNF-α53. The exact role of these pathways in the progression of pulmonary disease remains to be investigated, but our study suggests further investigation should be undertaken.

In pulmonary sarcoidosis, higher oxidant stress is reported in inflammatory cells in the lung54 and BALF55. In our study, the examination of mixed BAL cells indicated alteration in redox balance in newly-diagnosed sarcoidosis subjects. Specifically, the mitochondrial l-carnitine shuttle pathway, which is involved in fatty acid and lipid degradation, was mapped by proteins with differential abundance in controls compared to sarcoidosis, suggesting that the mitochondrial metabolism is altered. Furthermore, we found differentially expressed proteins in pathways related to β-oxidation of fatty acids and mitochondrial dysfunction. We also found that several cytoprotective enzymes mapping to the NRF2-mediated oxidative stress response were differentially abundant in sarcoidosis compared to controls. NRF2 regulates mitochondrial redox homeostasis by several mechanisms such as detoxification of peroxides, regeneration of GSH, increased synthesis of GSH and NADPH and via the NRF2-Keap 1 response. Mitochondrial dysfunction occurs when the reactive oxygen species (ROS)-mediated stress overpowers the antioxidant defense system56. Bleomycin challenge in NRF2 knockout mice results in increased inflammatory markers, lower levels of antioxidant enzymes, a bias towards Th2 response and increased fibrosis57. Taken together, these findings suggest that in sarcoidosis abnormal fatty acid and lipid degradation in the mitochondria causes the production of oxidants, with altered redox balance. It is possible that the detoxification mechanisms are overwhelmed, causing mitochondrial dysfunction and production of reactive oxygen species that contribute to the inflammatory response seen in the lungs. 
NRF2 activators such as curcumin, sulforaphane, resveratrol, and quercetin counteract increased oxidant stress, have a potential benefit in acute respiratory distress syndrome58, chronic obstructive pulmonary disease59, asthma60 and idiopathic pulmonary fibrosis61, and could be tested as a possible therapeutic strategy in sarcoidosis. The proteins differentially expressed between controls vs. sarcoidosis and progressive vs. non-progressive sarcoidosis cases also mapped to aryl hydrocarbon receptor (AhR) signaling. AhR signaling is emerging as an important regulator of immunity in response to endogenous and exogenous ligands62 including tryptophan and serotonin metabolism. The differentially expressed proteins in both of the comparisons mapped to tryptophan/serotonin degradation but only reached statistical significance in the sarcoidosis vs. control comparison. AhR signaling controls adaptive immunity by regulating T cell differentiation and by affecting antigen-presenting cells63. AhR regulates T cell response at multiple levels including T cell fate64. AhR is linked to induction of CD4 + Treg or Th1765 and Th22 cell differentiation, directing the balance between effector and regulatory T cells. AhR signaling is implicated in other diseases with granulomatous inflammation. In Crohn’s disease, AhR RNA transcripts were markedly downregulated in the inflamed tissue and in the CD4 + T cells66. AhR signaling is also implicated in particulate-induced granulomatous inflammation such as silicosis67.

While we observe differences in the biological processes annotated to the differentially expressed proteins, systematic investigation of the BAL could provide the yet elusive biomarkers with diagnostic and prognostic value in sarcoidosis. In our dataset, we found higher BAL levels of chitotriosidase in sarcoidosis cases compared to controls. Chitotriosidase is a monocyte-macrophage-derived protein that is elevated in plasma and BALF and has been associated with sarcoidosis severity68. Another interesting protein, programmed cell death 1 ligand 2 (PD-L2), which was more abundant in the BALF in sarcoidosis compared to controls, is a ligand for the programmed death-1 (PD1) receptor. PD-L2 is a transmembrane protein that is involved in the immune checkpoint activity of PD1. In sarcoidosis, PD1 has been linked to the development of T cell exhaustion69, and a blockade of the PD1 pathway restored sarcoidosis CD4 proliferative capacity70. The notion that the PD1 pathway is involved in sarcoidosis is also strengthened by reports of sarcoidosis-like illness in patients receiving PD1 immune checkpoint modulators71. While the presence of individual proteins in our dataset is encouraging, we expect that the systematic examination of global protein changes in the BALF, coupled with statistical approaches to construct a parsimonious model consisting of an orthogonal set of proteins, will be the best approach for diagnosis and prognostication in sarcoidosis.

A network-based approach is a powerful framework for studying the organizational structure of complex systems72. Networks are represented as a collection of features (nodes) and links (edges) that connect pairs of nodes. The ‘guilt-by-association’ principle73 implies shared biology of pathways. Moreover, biological networks demonstrate scale-free behavior74,75, indicating that they have a relatively large number of low-connectivity nodes and only a few high-connectivity nodes, called ‘hubs,’ that are likely to be essential to network function. In the analysis of cellular proteins, IL-8 signaling, leukocyte extravasation signaling, ILK signaling, glucocorticoid receptor signaling, and clathrin-mediated endocytic signaling demonstrated high connectivity (Fig. 2). The overlapping pathway analysis for the BALF comparison of sarcoidosis and controls identified complex networks with a large number of nodes (Fig. 4). Several immune pathways, such as IL-8, leukocyte extravasation signaling, B-cell receptor signaling, phagosome formation, and communication between adaptive and innate immune response signaling, demonstrated high connectivity with each other. Several signal transduction pathways were also highly connected to immune pathways. Similarly, several metabolic pathways were highly connected, and the NRF2-mediated antioxidant response was a ‘node’ that connected the metabolic pathways to immune pathways. Immune pathways were also connected to acute phase response signaling mediated by complement and coagulation activation. In the overlapping canonical pathways analysis of BALF proteins in progressive and non-progressive cases, CD28, CDC-42 and IL-8 signaling, and Th1 pathways had high connectivity, suggesting a central role of these pathways in the progression of pulmonary sarcoidosis.
Identifying networks of sarcoidosis development and progression in larger samples would allow data partition-based modeling approaches to reveal network topology and may provide valuable insights into disease biology that cannot be revealed with conventional reductionist approaches.
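To make the notion of node connectivity and ‘hubs’ concrete, the following minimal Python sketch counts the degree of each node in an undirected network and reports the nodes at or above a chosen degree. The function names, the `min_degree` cutoff, and the placeholder edge list are illustrative assumptions and are not taken from the study data or from IPA.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count the number of distinct neighbours of each node in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def find_hubs(edges, min_degree):
    """Return (node, degree) pairs with degree >= min_degree, highest degree first."""
    degrees = node_degrees(edges)
    hubs = [(node, deg) for node, deg in degrees.items() if deg >= min_degree]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Hypothetical toy network with placeholder node names (not study data).
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
print(find_hubs(toy_edges, min_degree=3))
```

In a scale-free network, most nodes would fall below any reasonable `min_degree`, and only a few would be returned as hubs.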

Despite the small sample size, we believe our pilot study provides proof of concept for this line of investigation. Our experimental design is robust, as we used stringent thresholds for protein identification and, to determine the differentially expressed cellular proteins, a conservative permutation test that decreased the chances of false positives. Similarly, for the BALF study, we examined each sample in triplicate. We also used a robust PECA procedure that implements algorithms that identify peptide-level quantitative differences for more robust inferences regarding protein levels76. We expect that this mass spectrometry-based bioinformatics workflow will provide a pipeline for application to future large-scale studies in sarcoidosis. A larger sample size would provide more robust inferences regarding the cellular mechanisms of progressive sarcoidosis in a cohort that represents heterogeneity in disease biology and would also allow implementation of resampling methods such as bootstrapping and cross-validation for data analysis. We anticipate that workflows developed in this pilot study will identify pathways in peripheral blood mononuclear cells (PBMCs) or lymph node tissue, some of which will overlap with the pathways in the lung, as well as some that might differ in direction between BAL and blood or be distinct and apparent in blood only, potentially serving as an easily accessible biomarker. Furthermore, we hypothesize that there is activation of kinase signal transduction pathways after PBMC recruitment to the lung or other organs and activation of specific canonical and signaling pathways that govern disease progression or remission.

Conclusions and future directions

The pathophysiologic mechanisms that explain the variability in disease manifestations and course in sarcoidosis are not well understood. A significant challenge is the lack of established disease models that represent the systems contributing to the immune response in sarcoidosis. Single-molecule studies are important for understanding the disease biology in sarcoidosis but fail to capture the interactions involved in heterogeneous diseases. Systems-level approaches will be critical to improve our understanding of sarcoidosis. As proteins are the primary effectors of cellular function, characterization of the changes in proteins will be essential to improve our knowledge of sarcoidosis. We established promising proteomics workflows that will be valuable to develop models (classifiers) for diagnosis and prognosis and also to identify therapeutically tenable treatment targets in sarcoidosis. Investigating the cellular and BALF protein changes provides an opportunity to examine the complex interplay of protein interactions responsible for the development and progression of sarcoidosis, as well as to test the validity of proteins participating in these biological processes as biomarkers for disease diagnosis and prediction of progression. The novel mechanisms identified in our pilot study will need to be evaluated with conventional structure-function studies to determine causal links in sarcoidosis.

Materials and methods

The study was approved by the University of Minnesota (UMN) IRB (protocol number 1501M60321) and the National Jewish Health (NJH) IRB (protocol number HS-2458), and all studies were conducted under the relevant guidelines/regulations. Study participants provided informed consent for the collection of BAL fluid and cells for these studies.

Eligible subjects consisted of individuals with sarcoidosis defined by the criteria outlined in the Joint Statement of the American Thoracic Society (ATS), the European Respiratory Society (ERS), and the World Association of Sarcoidosis and Other Granulomatous Disorders (WASOG)3. Subjects without another disease that could significantly affect the immune response were enrolled as healthy controls. Bronchoscopy and bronchoalveolar lavage were performed per standard protocol at UMN and NJH77. Four newly diagnosed sarcoidosis subjects were enrolled for examination of BAL cells at UMN (Table 1). Leftover BAL cells from four normal controls were obtained from prior research studies. For the BALF studies, 10 sarcoidosis subjects and 7 healthy controls were enrolled at NJH (Table 2). After collection, the BAL was transported on ice and centrifuged at 500g for 10 min, and the resulting cells and supernatant were stored at −80 °C using common procedures at the two sites.

For the BALF studies, the sarcoidosis subjects were divided into two distinct phenotypes: those with non-progressive pulmonary disease and those with progressive pulmonary disease using criteria previously established17,78,79. Non-progressive pulmonary disease cases had stable disease and met the following criteria on up to two-years follow-up or more: (1) ≤ 10% decline in FVC or FEV1 and a ≤ 15% decline in DLCO and a stable CXR, and/or (2) ≥ 10% improvement in FVC or a ≥ 15% improvement in DLCO or improved CXR AND (3) no indication for immunosuppressive therapy. Progressive pulmonary disease cases met any of the following criteria from diagnosis/initial evaluation on at least two-year follow-up: (1) ≥ 10% decline in FVC and/or FEV1; or a ≥ 15% decline in DLCO; or (2) worsening CXR as determined by the interpreting radiologist/ investigator; and/or (3) start of immune-suppressive therapy.
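As an illustration of how the progressive-disease criteria above could be applied to follow-up data, here is a minimal Python sketch. The record layout, field names, and function name are hypothetical. Percent changes are expressed relative to baseline, with negative values indicating decline. Because the stated thresholds overlap at exactly 10% and 15%, the sketch assumes that a decline of exactly 10% (FVC, FEV1) or 15% (DLCO) counts as progressive. It is not the study's actual classification code.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical follow-up record; percent changes are relative to baseline."""
    fvc_change_pct: float       # negative = decline
    fev1_change_pct: float      # negative = decline
    dlco_change_pct: float      # negative = decline
    cxr_worsened: bool          # as judged by the interpreting radiologist/investigator
    started_immunosuppression: bool


def is_progressive(f: FollowUp) -> bool:
    """Return True if any of the progressive pulmonary disease criteria is met."""
    lung_function_decline = (
        f.fvc_change_pct <= -10.0
        or f.fev1_change_pct <= -10.0
        or f.dlco_change_pct <= -15.0
    )
    return lung_function_decline or f.cxr_worsened or f.started_immunosuppression


# Example: a 12% FVC decline alone is enough to classify the case as progressive.
print(is_progressive(FollowUp(-12.0, -3.0, -5.0, False, False)))
```

The non-progressive definition combines its criteria with "and/or", so it is deliberately not encoded here; in this sketch a case is simply either progressive or not.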

Protein isolation and MS spectral-data acquisition

Mixed BAL cells:

BAL cells were resuspended in hypotonic lysis buffer with HALT protease inhibitor cocktail (Thermo Fisher Scientific), and lysed using sequential cell disruption techniques including a freeze–thaw at 98 °C with vortexing and sonication (Sonics) on ice before buffering with 1 M triethylammonium bicarbonate (Sigma). The lysed cells were then centrifuged at 20,000g for 15 min and the supernatant was collected for further processing. To increase the protein recovery, the pellet from this step was resuspended in a buffer (containing 7 M urea and 2 M thiourea in 0.4 M triethylammonium bicarbonate at pH 8.5), freeze-thawed, vortexed, and centrifuged at 15,000g for 15 min at 20 °C. The supernatants from the two centrifugation steps were combined and concentrated using an Amicon 3-MWCO filter (Millipore). An equal amount of protein was processed for in-gel cleanup and digestion (EMBL Method), reduced with dithiothreitol (Sigma-Aldrich), treated with iodoacetamide (Sigma-Aldrich) to block cysteine residues, digested with trypsin (Promega), and cleaned with an MCX stage tip (3M-Empore 2241). Isobaric labeling of digested peptides was carried out with TMT-10Plex (Thermo Fisher Scientific) reagent followed by MCX and SPE cleanup with appropriate buffer exchanges, and offline fractionation on a Shimadzu Prominence with an Xbridge 150 × 2.1 mm column (Waters) with two-minute fractions at a flow rate of 200 µL/min, and peptide amounts of 15 mAU-equivalent aliquots from fractions 7–38 were concatenated. LC–MS data were acquired for each concatenated fraction using an Easy-nLC 1000 HPLC (Thermo Fisher Scientific, Waltham, MA) in tandem with an Orbitrap Fusion (Thermo Fisher Scientific) MS instrument.

BALF proteins:

The BALF was processed using our previously published protocol80,81. Briefly, BALF was sonicated (Sonics), centrifuged for 15 min at 14,000g at 4 °C, and filtered with a pre-rinsed (5% methanol and water) syringe (Monoject, Covidien) and 0.22 µm PES filter to remove remaining particulates. The fluid was then concentrated and desalted using Amicon 3-MWCO filters, and a Bradford assay (Bio-Rad) was used to quantify protein. High-abundance proteins were removed using a Seppro IgY 14 spin-column (Sigma-Aldrich) with appropriate buffer exchanges. An equal amount of enriched medium- and low-abundance protein was processed for in-gel cleanup and digestion similar to the BAL cells above. LC–MS data were acquired for each concatenated fraction using an Easy-nLC 1000 HPLC in tandem with an Orbitrap Fusion using settings similar to the BAL cell analysis with minor differences: (1) the column was heated to 50 °C and (2) the dynamic exclusion was set to 15 s with a 10-ppm high and low mass tolerance.

Mass spectral dataset analysis by sequence database search for protein identification and quantification

The BAL cell quantification was accomplished using TMT reagent, and the BALF dataset was analyzed using MS1 spectral quantification.

Identification and quantification of TMT-labeled cellular proteins:

The spectral dataset was searched against the target-decoy version of the Human UniProt database (72,886 protein sequences; October 10th 2018) along with the contaminant sequences from the cRAP database (https://www.thegpm.org/crap/). Scaffold Q+ (version Scaffold_4.8.9, Proteome Software Inc., Portland, OR) was used to perform TMT-based peptide quantitation and protein identification. The threshold of peptide identifications was set at an FDR < 0.5% by the Scaffold Local FDR algorithm. The protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least one peptide82. Channels were corrected according to the algorithm described in i-Tracker83. Normalization was performed iteratively (across samples and spectra) on intensities, as described in Statistical Analysis of Relative Labeled Mass Spectrometry Data from Complex Samples Using ANOVA84. Medians were used for averaging. Spectra data were log-transformed, pruned of those matched to multiple proteins and those missing a reference value, and weighted by an adaptive intensity-weighting algorithm. Of 23,837 spectra in the experiment at the given thresholds, 16,890 (71%) were included in quantitation. The proteins that matched to the cRAP or the decoy sequence were removed from analysis.

Identification and MS1quant label-free quantification of BALF proteins

Raw files were searched against the target-decoy version of the Human UniProt database (73,928 protein sequences, November 21 2019) along with the cRAP database using the MaxQuant 1.6.10.43 algorithm. Default search parameters were used as follows: peptide-spectrum matches and proteins were filtered at 1% FDR, and modifications included fixed carbamidomethylation of C, variable oxidation of M, and N-terminal acetylation. BALF samples were quantified in label-free quantification (LFQ) mode, and spectra were “matched between runs”.
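The general idea behind a target-decoy FDR threshold can be sketched in a few lines of Python. This is a generic illustration that estimates FDR as the ratio of decoy to target matches above a score cutoff. It does not reproduce MaxQuant's internal scoring or filtering, and the function name, input layout, and toy scores are assumptions made for the example.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Find the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score means a better match.
    FDR at a cutoff is estimated as (# decoys) / (# targets) among matches
    scoring at or above that cutoff. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # most permissive cutoff seen so far that meets the FDR
    return cutoff


# Hypothetical toy scores (not study data): one decoy hit ranked third.
toy_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(score_cutoff_at_fdr(toy_psms, max_fdr=0.01))
```

With real search results the list would contain many thousands of matches, and the cutoff would be applied separately at the peptide-spectrum match and protein levels.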

Statistical analysis

The peptide-level data for the BALF were imported into the GalaxyP (https://galaxyp.org) framework for implementing the Peptide-level Expression Change Averaging (PECA) procedure76 using the Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/PECA.html). This method differs from the common approach, in which protein expression intensities are precomputed from the peptide data; instead, an expression change between two groups of samples is first calculated for each measured peptide. The corresponding protein-level expression changes are then defined as medians over the peptide-level changes. For this analysis, we determined the modified t-statistic, which is calculated using the linear modeling approach in the Bioconductor limma (linear models for microarray data) package85. To identify differential expression in the BALF dataset, the comparability of relative expression changes between alternative peptides was investigated by considering signal log-ratios by a two-sample t-test with a p-value ≤ 0.05 corrected for multiple hypothesis testing. For the intracellular proteins, given the substantially higher number of proteins detected, we used a conservative permutation test to decrease the possibility of type I error, with a significance level of p ≤ 0.05 after correction by the Benjamini–Hochberg method for multiple hypothesis testing.
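The two ideas in this paragraph, peptide-level changes summarized as a protein-level median and Benjamini–Hochberg correction of p-values, can be illustrated with a short Python sketch. This is a simplified stand-in and not the PECA or limma implementation: it computes only a difference of group means of log2 intensities per peptide, omits the modified t-statistic and PECA's own significance calculation, and uses hypothetical function names, input layout, and toy values.

```python
from collections import defaultdict
from statistics import mean, median


def protein_log_fold_changes(peptides):
    """Median of peptide-level log2 fold changes for each protein.

    peptides: list of dicts with keys 'protein', 'group_a', 'group_b', where the
    group entries are lists of log2 intensities, one value per sample.
    """
    per_protein = defaultdict(list)
    for pep in peptides:
        # Peptide-level change: difference of group means on the log2 scale.
        change = mean(pep["group_a"]) - mean(pep["group_b"])
        per_protein[pep["protein"]].append(change)
    return {prot: median(changes) for prot, changes in per_protein.items()}


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data (not study data): three peptides from one protein.
toy_peptides = [
    {"protein": "ProtX", "group_a": [3.0, 3.0], "group_b": [2.0, 2.0]},
    {"protein": "ProtX", "group_a": [5.0, 5.0], "group_b": [3.0, 3.0]},
    {"protein": "ProtX", "group_a": [6.0, 6.0], "group_b": [2.0, 2.0]},
]
print(protein_log_fold_changes(toy_peptides))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Using the median makes the protein-level estimate less sensitive to a single aberrant peptide than a mean would be.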

To gain insight into the biological significance of differentially expressed proteins, we performed functional analysis using Ingenuity Pathway Analysis (IPA; QIAGEN, Redwood City, https://www.quiagen.com/ingenuty). This analysis was performed on proteins with an FDR-corrected p-value ≤ 0.05 as the cutoff for differential expression for both BAL cell and fluid datasets. The IPA core analysis was performed using the difference of the weighted log fold change between comparison groups. We focused on canonical pathways that met a Benjamini and Hochberg (B-H)–corrected p-value, obtained using the right-tailed Fisher exact test, of ≤ 0.05 (equivalent to −log [B-H p-value] ≥ 1.3), as done previously81. We also used the Overlapping Canonical Pathways functionality in IPA, which is designed to visualize the shared biology in pathways through the common features (genes/proteins) in the pathways. The network of overlapping pathways shows each canonical pathway meeting the statistical threshold of −log (B-H p-value) ≥ 1.3 as a single “node”. An edge connects any two pathways when there are at least two common proteins shared between the pathways.
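The enrichment test and the overlap rule described above can be sketched in plain Python. This is an illustrative approximation of the logic and not IPA's implementation: the right-tailed Fisher exact test is written as a hypergeometric tail probability, the logarithm is assumed to be base 10, and the function names, pathway names, and protein identifiers are hypothetical.

```python
from itertools import combinations
from math import comb, log10


def right_tailed_fisher(overlap, pathway_size, n_selected, n_background):
    """P(X >= overlap) when n_selected proteins are drawn from n_background,
    of which pathway_size belong to the pathway (hypergeometric tail)."""
    upper = min(pathway_size, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(pathway_size, k) * comb(n_background - pathway_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def overlap_network(pathways, min_shared=2):
    """Connect two pathways when they share at least min_shared proteins.

    pathways: dict mapping pathway name -> set of protein identifiers.
    Returns a list of (pathway_a, pathway_b, sorted shared proteins).
    """
    edges = []
    for a, b in combinations(sorted(pathways), 2):
        shared = pathways[a] & pathways[b]
        if len(shared) >= min_shared:
            edges.append((a, b, sorted(shared)))
    return edges


# A corrected p-value of 0.05 corresponds to about 1.3 on the -log10 scale.
print(round(-log10(0.05), 2))

# Hypothetical toy pathways (not study data).
toy_pathways = {
    "Pathway A": {"p1", "p2", "p3"},
    "Pathway B": {"p2", "p3"},
    "Pathway C": {"p3", "p4"},
}
print(overlap_network(toy_pathways))
```

In the toy example only Pathway A and Pathway B are linked, because every other pair shares a single protein and so falls below the two-protein minimum.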