Novel protein pathways in development and progression of pulmonary sarcoidosis

Pulmonary involvement occurs in up to 95% of sarcoidosis cases. In this pilot study, we examine lung compartment-specific protein expression to identify pathways linked to development and progression of pulmonary sarcoidosis. We characterized bronchoalveolar lavage (BAL) cells and fluid (BALF) proteins in recently diagnosed sarcoidosis cases. We identified 4,306 proteins in BAL cells, of which 272 proteins were differentially expressed in sarcoidosis compared to controls. These proteins map to novel pathways such as integrin-linked kinase and IL-8 signaling and to pathways previously implicated in sarcoidosis, including phagosome maturation, clathrin-mediated endocytic signaling and redox balance. In the BALF, the differentially expressed proteins map to several pathways identified in the BAL cells. The differentially expressed BALF proteins also map to aryl hydrocarbon signaling, communication between innate and adaptive immune response, integrin, PTEN and phospholipase C signaling, serotonin and tryptophan metabolism, autophagy, and B cell receptor signaling. Additional pathways that were different between progressive and non-progressive sarcoidosis in the BALF included CD28 signaling and PFKFB4 signaling. Our studies demonstrate the power of contemporary proteomics to reveal novel mechanisms operational in sarcoidosis. Application of our workflows in well-phenotyped large cohorts may be beneficial to identify biomarkers for diagnosis and prognosis and therapeutically tenable molecular mechanisms.


Abbreviations
Biological relevance of the differentially expressed proteins in the BAL cells of cases compared to controls. To determine the biological significance of the differentially expressed proteins, we performed

Table 1. Clinical and demographic variables for controls and sarcoidosis subjects for BAL cell studies. Data presented as median (IQR). *Mann-Whitney test or Chi-square test. # All subjects were non-smokers (controls: 1 never smoker, 2 former smokers and 1 prior smoking history not known; cases: 3 never smokers and 1 former smoker).

Table 2. Clinical and demographic variables for controls and sarcoidosis subjects for BALF studies. Data presented as median (IQR). ANOVA with post hoc Tukey test to compare all pairs of columns. *Significant difference between controls and progressive group. # All subjects except one were non-smokers (controls: 3 never smokers, 3 former smokers, 1 prior smoking history not known; non-progressive sarcoidosis: 3 never smokers, 2 former smokers; progressive sarcoidosis: 1 current smoker, 2 former smokers, 2 never smokers).

Differences in the bronchoalveolar lavage fluid proteins between sarcoidosis and controls and between sarcoidosis phenotypes. We examined BALF from seven control and ten sarcoidosis subjects.
All BALF samples were analyzed by label-free mass spectrometry in triplicate. We identified 1,293 BALF proteins at an FDR of ≤ 1% (Supplemental Table S2; 'Original File' tab). These included 62 proteins that matched to the decoy (reverse) sequences or the cRAP database, such as keratins, filaggrin and cartilage matrix proteins, which were not considered for further analysis. The remaining 1,231 included 1,195 proteins present in all patients and controls. Seven proteins were only detected in controls and not in sarcoidosis cases, while five proteins were present only in sarcoidosis cases but not in control BALF. There were 12 proteins detected in controls and non-progressive cases but not in progressive sarcoidosis, five proteins in control and progressive cases but not in non-progressive sarcoidosis, one protein in only non-progressive but not in controls or progressive sarcoidosis, and four proteins detected in only progressive but not in non-progressive sarcoidosis or controls (Fig. 3A). Peptides from the 1,231 BALF proteins (Supplemental Table S2; HAP CON REV tab) included proteins that originate from inflammatory cells and epithelial cells, such as chitotriosidase-1, macrophage colony stimulating factor, Fc-gamma RIII-alpha, macrophage migration inhibitory factor (macrophages), human neutrophil defensin 3, neutrophil elastase (neutrophils), lymphocyte antigen, lymphocyte cytosolic protein (lymphocytes), aquaporin 1 and 5 (type 1 alveolar epithelial cells), and surfactant protein B (type 2 alveolar epithelial cells). Sixty-nine high-abundance and immunoglobulin proteins or immunoglobulin fractions that were not completely removed by the high-abundance protein depletion column were also detected. These proteins were included for functional analysis because they are crucial for many biological functions. Good-quality quantitative spectral data were available to compare 1,223 of the 1,231 proteins in sarcoidosis vs. control subjects (Supplemental Table S3; 'Sarc vs. control' tab) and 1,206 of 1,231 proteins in progressive vs. non-progressive pulmonary sarcoidosis subjects (Supplemental Table S3; 'P vs NP' tab).

We identified 293 differentially expressed proteins in sarcoidosis (n = 10) compared to the seven control subjects (Supplemental Table S3; Sarcoidosis vs Control tab; Fig. 3B). These proteins included chitotriosidase-1, serum amyloid protein P, surfactant protein D, S100P, inter-alpha-trypsin inhibitor, annexin, glutathione-S-transferase, interleukin-1 receptor accessory protein, cystatin-5, caveolin, choline transport protein, Fc-gamma RII-a, (Fcγ-binding protein), interleukin 6 receptor, programmed cell death 1 ligand 2, and aquaporin-1. The proteins with the most significant differences, with a higher abundance in either sarcoidosis or controls, are listed in Table 5. To find the biological relevance of the differentially expressed proteins, we determined the canonical pathways that map to these proteins (Table 6). These pathways include phagosome formation and maturation, IL-8 signaling, IL-12 signaling in macrophages, clathrin and caveolin endocytic signaling, LXR/RXR activation, B cell receptor signaling, communication between innate and adaptive immune cells, aryl hydrocarbon receptor signaling and NRF2-mediated oxidative stress response. Kinase signaling pathways such as PTEN, phospholipase C and GP6 signaling also map to the differentially expressed proteins. Overlapping Canonical Pathway analysis identified a highly intricate network of pathways participating in immunological functions, acute phase response and metabolic processes (Fig. 4).
The z-score indicating the activation state was available for LXR/RXR activation (2.9), acute phase response signaling (1.39), complement system (−0.8), coagulation system (−0.816), agrin interactions at the neuromuscular junction (−1.633), glutathione-mediated detoxification (1.3), osteoarthritis pathways (−0.4), SPINK1 pancreatic cancer pathway (1.6), intrinsic prothrombin activation pathway (−0.5), phospholipase C signaling (−0.6), serotonin degradation (−1.3), BAG2 signaling pathway (−1.), neuroprotective role of THOP1 in Alzheimer's disease (−2.2), leucocyte extravasation signaling (−1.4), IL-8 signaling (−0.4), GP6 signaling pathway (0.8), PTEN signaling (2.5) and integrin signaling (−1.9).
When we compared the BALF proteins between progressive and non-progressive sarcoidosis subjects (n = 5 each), there were 121 differentially expressed proteins. The proteins that differed between phenotypes included heat shock protein 90, glutathione-S-transferase, mucin-5B, annexin, CD5 antigen-like protein (apoptosis inhibitor expressed by macrophages), chitotriosidase 1, ICAM 1, tropomyosin, integrin beta-2, pulmonary surfactant protein B and D, fatty acid binding protein, and HLA class II histocompatibility antigen DQ-α. The proteins with the most significant differences, with a higher abundance in cases with progressive disease compared to non-progressive disease, are listed in Table 7. To determine the pathways that may contribute to the progression of sarcoidosis, we mapped the differentially expressed proteins between the progressive and non-progressive cases to canonical pathways in IPA (Table 8); these include aryl hydrocarbon receptor signaling, clathrin-mediated endocytic signaling, glutathione redox reaction, glutathione-mediated detoxification, antigen presentation pathway, phagosome formation, CD28 signaling in T-helper cells, CDC-42 signaling, RhoA signaling and PFKFB4 signaling pathway (Fig. 5). The z-score indicating the activation state was available for glycolysis (1.0), LXR/RXR (−1.6) and IL-8 signaling (1.3).
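To make the activation z-scores reported above easier to interpret, the following is a minimal sketch of an unweighted activation z-score of the general form (N+ − N−)/√N, where N+ and N− count the proteins whose direction of change is consistent with pathway activation or inhibition. It is an assumption that the pathway scores in this study follow exactly this form; the function name and the counts in the usage lines are hypothetical and are not taken from the dataset.

```python
import math


def activation_z_score(n_activating: int, n_inhibiting: int) -> float:
    """Unweighted activation z-score: (N+ - N-) / sqrt(N+ + N-).

    n_activating: proteins whose change is consistent with activation.
    n_inhibiting: proteins whose change is consistent with inhibition.
    Assumption: this simple form stands in for the pathway tool's scoring.
    """
    n_total = n_activating + n_inhibiting
    if n_total == 0:
        raise ValueError("at least one directional observation is required")
    return (n_activating - n_inhibiting) / math.sqrt(n_total)


# Hypothetical counts: 2 proteins point to activation, 4 to inhibition.
print(round(activation_z_score(2, 4), 3))  # -0.816
# Hypothetical counts: 1 protein points to activation, 5 to inhibition.
print(round(activation_z_score(1, 5), 3))  # -1.633
```

Under this form, a positive score suggests predicted activation and a negative score predicted inhibition, and scores near zero carry little directional information when only a handful of proteins contribute.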

Discussion
Use of 'omics' tools to improve the understanding of sarcoidosis has been recognized as a high priority area of research in sarcoidosis 29 . We implemented an approach that coupled state-of-the-art mass spectrometry based proteomics with novel bioinformatics for a comprehensive characterization of the protein changes in the lung compartment in well-phenotyped cases. In the absence of well-characterized animal models, the examination of BAL cells provides an ex vivo model of the immune response in sarcoidosis. While proteomic studies have been conducted previously 23,26,28  www.nature.com/scientificreports/ coverage and detect proteins that originate from diverse cellular and extracellular sources. Ultimately, our approach to characterize mixed BAL cells captured the complex interplay between inflammatory cells in sarcoidosis. Specifically, in BAL cells and fluid we identified several pathways present in macrophages such as clathrin-mediated endocytic signaling and other phagocytic processes as well as redox-related pathways that were previously reported to be upregulated in sarcoidosis 23,30 . We also identified novel pathways implicated in sarcoidosis such as signaling by integrin-linked kinase, IL-8, and caveolar-mediated endocytic signaling in our studies comparing BAL cells from controls and sarcoidosis cases. The studies in BALF showed higher levels of chitotriosidase, a potential biomarker and an investigational agent for therapy 31,32 when comparing cases to controls. Several of the biological pathways identified in the BAL cells were also identified in the BALF, suggesting that BALF is a useful biofluid to investigate mechanistic processes in sarcoidosis. In our comparison of cases with progressive vs. non-progressive sarcoidosis, we identified several novel pathways that may be involved in progression in sarcoidosis. These included CD28 signaling and PFKFB4 signaling. 
These results suggest that a systematic characterization of BALF may prove fruitful to develop disease models and classifiers with diagnostic and prognostic utility, while BALF and the cellular proteome will provide insight into the mechanisms underlying sarcoidosis as well as the processes that promote progressive disease. We examined BAL cells as the inflammatory response is aberrant in sarcoidosis with (a) yet unknown antigen(s) triggering an exuberant although dysfunctional immune response with CD4 + T cells, Tregs, high levels of Th1 cytokines TNF-α, IFNγ, and IL-2 10,33,34 , along with inappropriate counter regulatory responses. Previous studies investigating protein changes in alveolar macrophages 23,26 and gene expression changes in peripheral blood mononuclear cells 35 found phagocytosis-related pathways, such as Fcγ receptor-mediated phagocytosis and clathrin-mediated endocytic signaling, to be upregulated in sarcoidosis subjects. We identified differences in cellular proteins mapping to phagosome maturation and clathrin-mediated endocytic signaling in sarcoidosis vs. control BAL cells. Phagocytosis is crucial for innate and adaptive immune response and plays an essential role in antigen presentation, supporting the notion that sarcoidosis results from the response to an unknown external exposure requiring antigen processing and presentation for the development of disease. Similar to previous reports, we observed that the proteins involved in clathrin-mediated endocytic signaling differ in sarcoidosis cases when compared to controls. Additionally, caveolar-mediated endocytic signaling was also different between the two comparison groups.
While both pathways play a role in endocytic internalization of a variety of particles, again implicating exposure in disease ontogeny, they also play a role in signal transduction and in the regulation of many plasma membrane activities that have not been studied in sarcoidosis, and they influence the immune response in alveolar macrophages 36,37 and peripheral blood mononuclear cells 38 .
In fact, the role of clathrin and caveolar pathways in the development of sarcoidosis has not been systematically studied. Thus, our findings suggest new pathways for investigation of potential disease pathogenesis and/or cell regulation in sarcoidosis.
With an unbiased approach, we identified several canonical pathways mapping to differentially expressed proteins that have not been previously linked to sarcoidosis but are likely to play a role in disease pathogenesis. These include integrin-linked kinase (ILK) signaling, IL-8 signaling, and inhibition of matrix metalloproteinases. ILK is an intracellular protein that primarily functions to connect integrins to the cytoskeletal proteins. The intracellular domain of ILK interacts with different proteins and regulates the phosphorylation of protein kinase B (PKB/AKT1) and glycogen synthase kinase 3B 39 . The downstream signaling cascade of PI3K/AKT activation includes activation of mTOR 25 , which is implicated in the development and the progression of sarcoidosis and has been proposed as a potential therapeutic target 40 . Thus, ILK-mediated mTOR activation could be a possible mechanism mediating inflammation in a subset of sarcoidosis cases. ILK signaling also activates c-Jun N-terminal kinase (JNK) via transcription-factor activator protein 1 (AP1) and regulates the gene expression of MMP9 41 and also IL-8 signaling 42 . IL-8 is a chemokine in the CXC family that is produced by non-leucocytic and leucocytic cells, including macrophages, and binds to the CXCR1 and CXCR2 surface receptors 43 . Several cytokines such as TNF-α induce the production of IL-8 44 . Higher levels of IL-8 have been reported in sarcoidosis BALF 45 and serum, with the latter correlating with pulmonary 46 and chronic disease 47 . IL-8 signaling has recently been reported to directly regulate adaptive T cell reactivity 48 and phagosome function. Thus, our findings are not surprising but suggest that future studies investigating IL-8 signaling could improve the understanding of sarcoidosis pathogenesis and potentially phenotypes.
They also highlight the importance of comprehensive characterization of the BAL cell protein changes in providing insight into sarcoidosis development and/or progression, an approach that offers promise and is underutilized thus far in sarcoidosis research.
The examination of BALF revealed many proteins that are represented by canonical pathways that were also found in BAL cells. This indicates that biological mechanisms that contribute to the development of sarcoidosis can be identified in the BALF. When we compared BALF from sarcoidosis subjects to controls, similar to the findings from BAL cells, we identified several pathways that are linked to the inflammatory response. These included phagosome formation/maturation, clathrin- and caveolar-mediated endocytic signaling, LXR/RXR activation, IL-8 signaling, fatty acid oxidation, NRF2-mediated oxidative stress response and tryptophan degradation. Several of these pathways are also assigned to the proteins that are differentially expressed between progressive and non-progressive sarcoidosis cases. Some BALF pathways map to proteins that are only differentially expressed between progressive and non-progressive sarcoidosis. Specifically, we identified proteins mapping to CD28 signaling in T-helper cells, PFKFB4 (6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 4) signaling and IL-12 signaling and production in macrophages. CD28 is a stimulatory immune checkpoint molecule of the B7-CD28 superfamily with diverse roles in naïve and CD4 + T cells. The cytoplasmic tail of CD28 contains signaling motifs that are phosphorylated in response to TCR and CD28 stimulation 49 . Binding of the adaptor proteins to the activated motif in turn phosphorylates and activates CDC-42 50 , culminating in the activation of JNK 51 . While we did not identify enrichment of canonical JNK pathways, BALF, which contains secreted proteins, may reflect only some of the processes involved in sarcoidosis pathogenesis. Regardless, the finding of differentially expressed BALF proteins mapping to CDC-42 and CD28 signaling suggests that these pathways may be involved in disease progression.
Additionally, CD28 controls differentiation of Tregs from naïve CD4 T cells, providing novel mechanisms that may explain progression or remission of sarcoidosis. Interestingly, we identified PFKFB4 and IL-12 signaling also mapping to proteins that are differentially expressed in progressive vs. non-progressive cases. PFKFB4 is a bifunctional glycolytic enzyme that synthesizes and degrades fructose 2,6-bisphosphate. PFKFB4 regulates glucose metabolism and cell fate of dendritic cells (DCs) 52 and may provide a link for immunomodulatory effects by 1,25-dihydroxyvitamin D3 (1,25(OH)2D3). Vanhewegan et al. identified PFKFB4 as a master regulator of 1,25(OH)2D3-induced DC tolerogenicity, and inhibition of PFKFB4 signaling promotes secretion of proinflammatory cytokines including TNF-α 53 . The exact role of these pathways in the progression of pulmonary disease remains to be investigated, but our study suggests that further investigation should be undertaken.
In pulmonary sarcoidosis, higher oxidant stress is reported in inflammatory cells in the lung 54 and BALF 55 . In our study, the examination of mixed BAL cells indicated alteration in redox balance in newly diagnosed sarcoidosis subjects. Specifically, the mitochondrial l-carnitine shuttle pathway, which is involved in fatty acid and lipid degradation, was mapped by proteins with differential abundance in controls compared to sarcoidosis, suggesting that mitochondrial metabolism is altered. Furthermore, we found differentially expressed proteins in pathways related to β-oxidation of fatty acids and mitochondrial dysfunction. We also found that several cytoprotective enzymes mapping to the NRF2-mediated oxidative stress response were differentially abundant in sarcoidosis compared to controls. NRF2 regulates mitochondrial redox homeostasis by several mechanisms such as detoxification of peroxides, regeneration of GSH, increased synthesis of GSH and NADPH and via the NRF2-Keap1 response. Mitochondrial dysfunction occurs when the reactive oxygen species (ROS)-mediated stress overpowers the antioxidant defense system 56 […] fibrosis 57 . Taken together, these findings suggest that in sarcoidosis abnormal fatty acid and lipid degradation in the mitochondria causes the production of oxidants, with altered redox balance. It is possible that the detoxification mechanisms are overwhelmed, causing mitochondrial dysfunction and production of reactive oxygen species that contribute to the inflammatory response seen in the lungs. NRF2 activators such as curcumin, sulforaphane, resveratrol, and quercetin counteract increased oxidant stress, have a potential benefit in acute respiratory distress syndrome 58 , chronic obstructive pulmonary disease 59 , asthma 60 and idiopathic pulmonary fibrosis 61 , and could be tested as a possible therapeutic strategy in sarcoidosis.
The proteins differentially expressed between controls vs. sarcoidosis and progressive vs. non-progressive sarcoidosis cases also mapped to aryl hydrocarbon receptor (AhR) signaling. AhR signaling is emerging as an important regulator of immunity in response to endogenous and exogenous ligands 62 including tryptophan and serotonin metabolism. The differentially expressed proteins in both of the comparisons mapped to tryptophan/serotonin degradation but only reached statistical significance in the sarcoidosis vs. control comparison. AhR signaling controls adaptive immunity by regulating T cell differentiation and by affecting antigen-presenting cells 63 . AhR regulates T cell response at multiple levels including T cell fate 64 . AhR is linked to induction of CD4 + Treg or Th17 65 and Th22 cell differentiation, directing the balance between effector and regulatory T cells. AhR signaling is implicated in other diseases with granulomatous inflammation. In Crohn's disease, AhR RNA transcripts were markedly downregulated in the inflamed tissue and in the CD4 + T cells 66 . AhR signaling is also implicated in particulate-induced granulomatous inflammation such as silicosis 67 . While we observe differences in the biological processes annotated to the differentially expressed proteins, systematic investigation of the BAL could provide the yet elusive biomarkers with diagnostic and prognostic value in sarcoidosis. In our dataset, we found higher BAL levels of chitotriosidase in sarcoidosis cases compared to controls. Chitotriosidase is a monocyte-macrophage-derived protein that is elevated in plasma and BALF and has been associated with sarcoidosis severity 68 .

Acute phase response signaling  10.1  A2M, APCS, APOA1, APOH, C3, C9, CP, FGB, FN1, HP, IL1RAP, IL6ST, ITIH2, ITIH3, ITIH4, RRAS, SERPINA1, SERPINA3,
Another interesting protein that was more abundant in the BALF in sarcoidosis than in controls, programmed cell death 1 ligand 2 (PD-L2), is a ligand for the programmed death-1 (PD1) receptor. PD-L2 is a transmembrane protein that is involved in immune checkpoint activity of PD1. In sarcoidosis, PD1 has been linked to the development of T cell exhaustion 69 , and a blockade of the PD1 pathway restored sarcoidosis CD4 proliferative capacity 70 . The notion that the PD1 pathway is involved in sarcoidosis is also strengthened by reports of sarcoidosis-like illness in patients receiving PD1 immune checkpoint modulators 71 . While the presence of individual proteins in our dataset is encouraging, we expect that the systematic examination of global protein changes in the BALF, coupled with statistical approaches to construct a parsimonious model consisting of an orthogonal set of proteins, will be the best approach for diagnosis and prognostication in sarcoidosis. A network-based approach is a powerful framework for studying the organizational structure of complex systems 72 . Networks are represented as a collection of features (nodes) and links (edges) that connect pairs of nodes. The 'guilt-by-association' principle 73 implies shared biology of pathways. Moreover, biological networks demonstrate scale-free behavior 74,75 , indicating that they have a relatively large number of low-connectivity nodes and only a few high-connectivity nodes, called 'hubs,' that are likely to be essential to network function. In the analysis of cellular proteins, IL-8 signaling, leucocyte extravasation signaling, ILK signaling, glucocorticoid receptor signaling and clathrin-mediated endocytic signaling demonstrated high connectivity (Fig. 2). The overlapping pathway analysis for the BALF comparison of sarcoidosis and controls identified complex networks with a large number of nodes (Fig. 4).
Several immune pathways such as IL-8, leukocyte extravasation signaling, B-cell receptor signaling, phagosome formation, and communication between adaptive and innate immune response signaling demonstrated high connectivity with each other. Several signal transduction pathways were also highly connected to immune pathways. Similarly, several metabolic pathways were highly connected, and the NRF2-mediated antioxidant response was a 'node' that connected the metabolic pathways to immune pathways. Immune pathways were also connected to acute phase response signaling mediated by complement and coagulation activation. In the overlapping canonical pathways analysis of BALF proteins in progressive and non-progressive cases, CD28, CDC-42 and IL-8 signaling, and Th1 pathways had high connectivity, suggesting a central role of these pathways in the progression of pulmonary sarcoidosis. Identifying networks of sarcoidosis development and progression in larger samples would allow data partition-based modeling approaches to reveal network topology and may provide valuable insights into disease biology that cannot be revealed with conventional reductionist approaches.

Despite the small sample size, we believe our pilot study provides proof of concept for this line of investigation. Our experimental design is robust as we used stringent thresholds for protein identification and a conservative permutation test that decreased the chances of false positives to determine the differentially expressed cellular proteins. Similarly, for the BALF study, we examined each sample in triplicate. We also used a robust PECA procedure that implements algorithms that identify peptide-level quantitative differences for more robust inferences regarding protein levels 76 . We expected that this mass spectrometry-based bioinformatics workflow would provide a pipeline for application to future large-scale studies in sarcoidosis.
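The notion of pathway connectivity and 'hubs' discussed above can be illustrated with a small sketch. It assumes that two pathways are linked when they share at least one differentially expressed protein, which is one simple way to build an overlap network and is not necessarily how the pathway software constructs Figs. 2 and 4. The pathway memberships, protein identifiers and function names below are hypothetical toy values, not data from this study.

```python
from itertools import combinations

# Hypothetical pathway-to-protein memberships (toy data, not study results).
PATHWAY_PROTEINS = {
    "IL-8 signaling": {"P1", "P2", "P3"},
    "Leukocyte extravasation signaling": {"P2", "P3", "P4"},
    "Phagosome formation": {"P1", "P5"},
    "Glutathione redox reactions": {"P6"},
}


def overlap_edges(pathways, min_shared=1):
    """Return (pathway_a, pathway_b, n_shared) for pairs sharing proteins."""
    edges = []
    for a, b in combinations(sorted(pathways), 2):
        shared = pathways[a] & pathways[b]
        if len(shared) >= min_shared:
            edges.append((a, b, len(shared)))
    return edges


def degree_by_node(pathways, edges):
    """Count how many other pathways each pathway is linked to."""
    degree = {name: 0 for name in pathways}
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


edges = overlap_edges(PATHWAY_PROTEINS)
degrees = degree_by_node(PATHWAY_PROTEINS, edges)
# List pathways from most to least connected; the top entries are hub candidates.
for name, deg in sorted(degrees.items(), key=lambda kv: -kv[1]):
    print(f"{name}: degree {deg}")
```

In this toy network the pathway with the highest degree plays the role of a hub, while a pathway with degree zero shares no proteins with the others. With real data, raising `min_shared` would make the network sparser and the hub calls more conservative.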
A larger sample size would provide more robust inferences regarding the cellular mechanisms of progressive sarcoidosis in a cohort that represents heterogeneity in disease biology and yet allow implementation of resampling methods such as bootstrapping and cross validation for data analysis. We anticipated that workflows developed in this pilot study would identify pathways in peripheral blood mononuclear cells or lymph node tissue, some of which will overlap with the pathways in the lung, as well as some that might differ in direction between BAL and blood or be distinct and apparent in blood only, potentially serving as an easily accessible biomarker. Furthermore, we also hypothesized there would be activation of kinase signal transduction pathways after PBMC recruitment to the lung or other organ and activation of specific canonical and signaling pathways that would govern disease progression or remission.

Table 7. Top differentially expressed BALF proteins comparing progressive to non-progressive cases. Fold changes were calculated relative to non-progressive sarcoidosis cases, resulting in a positive log fold change for proteins higher in progressive sarcoidosis and a negative log fold change for proteins higher in non-progressive sarcoidosis cases. Signal log-ratio: signal log-ratio (log2 magnitude of change); p-value: protein-level p-value calculated from beta distribution; p.fdr: false discovery rate-corrected p-value.
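The sign convention and the FDR-corrected p-value described in the Table 7 caption can be sketched as follows. The caption does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, as are the function names and the example numbers, which are illustrative only.

```python
import math


def signal_log_ratio(progressive: float, non_progressive: float) -> float:
    """log2 ratio relative to non-progressive cases.

    Positive: higher in progressive cases. Negative: higher in
    non-progressive cases. Inputs are hypothetical abundance summaries.
    """
    if progressive <= 0 or non_progressive <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(progressive / non_progressive)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH stands in for the unspecified FDR correction.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(signal_log_ratio(4.0, 1.0))   # 2.0, higher in progressive cases
print(signal_log_ratio(1.0, 4.0))   # -2.0, higher in non-progressive cases
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Reading the table with this convention, a protein would be called differentially expressed when its adjusted p-value falls below the chosen FDR threshold, and the sign of the log-ratio indicates in which phenotype it is more abundant.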

Conclusions and future directions
The pathophysiologic mechanisms that explain the variability in disease manifestations and course in sarcoidosis are not well understood. A significant challenge is the lack of established disease models that represent the systems contributing to the immune response in sarcoidosis. Single-molecule studies are important for understanding the disease biology in sarcoidosis but fail to capture the interactions involved in heterogeneous diseases. Systems-level approaches will be critical to improve our understanding of sarcoidosis. As proteins are the primary effectors of cellular function, characterization of the changes in proteins will be essential to improve our knowledge of sarcoidosis. We established promising proteomics workflows that will be valuable to develop models (classifiers) for diagnosis and prognosis and also to identify therapeutically tenable treatment targets in sarcoidosis. Investigating the cellular and BALF protein changes provides an opportunity to examine the complex interplay of protein interactions responsible for the development and progression of sarcoidosis, as well as to test the validity of proteins participating in these biological processes as biomarkers for disease diagnosis and prediction of progression. The novel mechanisms identified in our pilot study will need to be evaluated with conventional structure-function studies to determine causal links in sarcoidosis.

Materials and methods
The study was approved by the University of Minnesota (UMN) IRB (protocol number 1501M60321) and the National Jewish Health (NJH) IRB (protocol number HS-2458), and all studies were conducted under the relevant guidelines/regulations. Study participants provided informed consent for the collection of BAL fluid and cells for these studies. Eligible subjects consisted of individuals with sarcoidosis defined by the criteria outlined in the Joint Statement of the American Thoracic Society (ATS), the European Respiratory Society (ERS) and the World Association of Sarcoidosis and Other Granulomatous Disorders (WASOG) 3 . Subjects without another disease that could significantly affect the immune response were enrolled as healthy controls. Bronchoscopy and bronchoalveolar lavage were performed per standard protocol at UMN and NJH 77 . Four newly diagnosed sarcoidosis subjects were enrolled for examination of BAL cells at UMN (Table 1). Leftover BAL cells from four normal controls were obtained from prior research studies. For the BALF studies, 10 sarcoidosis subjects and 7 healthy controls were enrolled at NJH (Table 2). After collection, the BAL was transported on ice and centrifuged at 500g for 10 min, and the resulting cells and supernatant were stored at −80 °C using common procedures at the two sites.
For the BALF studies, the sarcoidosis subjects were divided into two distinct phenotypes, those with non-progressive pulmonary disease and those with progressive pulmonary disease, using previously established criteria 17,78,79 . Non-progressive pulmonary disease cases had stable disease and met the following criteria on up to two-year follow-up or more: (1) ≤ 10% decline in FVC or FEV1 and a ≤ 15% decline in DLCO and a stable CXR, and/or (2) ≥ 10% improvement in FVC or a ≥ 15% improvement in DLCO or improved CXR, and (3) no indication for immunosuppressive therapy. Progressive pulmonary disease cases met any of the following criteria from diagnosis/initial evaluation on at least two-year follow-up: (1) ≥ 10% decline in FVC and/or FEV1, or a ≥ 15% decline in DLCO; or (2) worsening CXR as determined by the interpreting radiologist/investigator; and/or (3) start of immunosuppressive therapy.
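The phenotype criteria above can be read as a simple decision rule, sketched below. This is an interpretation, not the authors' procedure: the field names are hypothetical, the start of immunosuppressive therapy is used as a stand-in for "indication for" therapy, and a decline of exactly 10% in FVC/FEV1 or 15% in DLCO, which satisfies both sets of thresholds as written, is assigned to the progressive group by assumption.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical follow-up summary; percent changes are from baseline."""
    fvc_change_pct: float    # negative values are declines
    fev1_change_pct: float
    dlco_change_pct: float
    cxr: str                 # "stable", "improved", or "worsened"
    immunosuppression_started: bool


def classify(f: FollowUp) -> str:
    """Return 'progressive', 'non-progressive', or 'unclassified'."""
    # Progressive: any one of the listed criteria is sufficient.
    if (
        f.fvc_change_pct <= -10
        or f.fev1_change_pct <= -10
        or f.dlco_change_pct <= -15
        or f.cxr == "worsened"
        or f.immunosuppression_started
    ):
        return "progressive"
    # From here on, declines are within bounds and no therapy was started.
    stable = f.cxr == "stable"
    improved = (
        f.fvc_change_pct >= 10
        or f.dlco_change_pct >= 15
        or f.cxr == "improved"
    )
    if stable or improved:
        return "non-progressive"
    return "unclassified"


print(classify(FollowUp(-12.0, -3.0, -5.0, "stable", False)))  # progressive
print(classify(FollowUp(-2.0, -1.0, -4.0, "stable", False)))   # non-progressive
```

Checking the progressive criteria first keeps the rule conservative: any single adverse change moves a case out of the stable group, which matches the "any of the following" wording for progressive disease.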
Protein isolation and MS spectral-data acquisition. Mixed BAL cells: BAL cells were resuspended in hypotonic lysis buffer with HALT protease inhibitor cocktail (Thermo Fisher Scientific), and lysed using sequential cell disruption techniques including a freeze-thaw at 98 °C with vortexing and sonication (Sonics) on ice before buffering with 1 M triethylammonium bicarbonate (Sigma). The lysed cells were then centrifuged at 20,000g for 15 min and the supernatant was collected for further processing. To increase the protein recovery, the pellet from this step was resuspended in a buffer (containing 7 M urea and 2 M thiourea in 0.4 M triethylammonium bicarbonate at pH 8.5), freeze-thawed, vortexed and centrifuged at 15,000g for 15 min at 20 °C. The supernatants from the two centrifugation steps were combined and concentrated using an Amicon 3-MWCO filter (Millipore). An equal amount of protein was processed for in-gel cleanup and digestion (EMBL Method), reduced with dithiothreitol (Sigma-Aldrich), treated with iodoacetamide (Sigma-Aldrich) to block cysteine residues, digested with trypsin (Promega) and cleaned with an MCX stage tip (3M-Empore 2241). Isobaric labeling of digested peptides was carried out with TMT-10Plex (Thermo Fisher Scientific) reagent followed by MCX and SPE cleanup with appropriate buffer exchanges, and offline fractionation on a Shimadzu Prominence with an Xbridge 150 × 2.1 mm column (Waters) with two-minute fractions at a flow rate of 200 µL/min, and peptide amounts of 15 mAU-equivalent aliquots from fractions 7-38 were concatenated. LC-MS data were acquired for each concatenated fraction using an Easy-nLC 1000 HPLC (Thermo Fisher Scientific, Waltham, MA) in tandem with an Orbitrap Fusion (Thermo Fisher Scientific) MS instrument.
BALF proteins: The BALF was processed using our previously published protocol 80,81. Briefly, BALF was sonicated (Sonics), centrifuged for 15 min at 14,000g at 4 °C and filtered with a pre-rinsed (5% methanol and water) syringe (Monoject, Covidien) and a 0.22 µm PES filter to remove remaining particulates. The fluid was then concentrated and desalted using Amicon 3-MWCO filters, and a Bradford assay (Bio-Rad) was used to quantify protein. High-abundance proteins were removed using a Seppro IgY 14 spin-column (Sigma-Aldrich) with appropriate buffer exchanges. An equal amount of enriched medium- and low-abundance protein was processed for in-gel cleanup and digestion as for the BAL cells above. LC-MS data were acquired for each concatenated fraction using an Easy-nLC 1000 HPLC in tandem with an Orbitrap Fusion, using settings similar to those of the BAL cell analysis with two differences: (1) the column was heated to 50 °C and (2) the dynamic exclusion was set to 15 s with a 10-ppm high and low mass tolerance.
Mass spectral dataset analysis by sequence database search for protein identification and quantification. The BAL cell quantification was accomplished using TMT reagent, and the BALF dataset was analyzed using MS1 spectral quantification.
Identification and quantification of TMT-labeled cellular proteins: The spectral dataset was searched against the target-decoy version of the Human UniProt database (72,886 protein sequences; October 10, 2018) along with the contaminant sequences from the cRAP database (https://www.thegpm.org/crap/). Scaffold Q+ (version Scaffold_4.8.9, Proteome Software Inc., Portland, OR) was used to perform TMT-based peptide quantitation and protein identification. The threshold for peptide identifications was set at an FDR < 0.5% by the Scaffold Local FDR algorithm. Protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least one peptide 82. Channels were corrected according to the algorithm described in i-Tracker 83. Normalization was performed iteratively (across samples and spectra) on intensities, as described in Statistical Analysis of Relative Labeled Mass Spectrometry Data from Complex Samples Using ANOVA 84. Medians were used for averaging. Spectral data were log-transformed, pruned of spectra matched to multiple proteins and of those missing a reference value, and weighted by an adaptive intensity-weighting algorithm. Of 23,837 spectra in the experiment at the given thresholds, 16,890 (71%) were included in quantitation. Proteins that matched the cRAP or decoy sequences were removed from the analysis.
Identification and MS1 label-free quantification of BALF proteins. Raw files were searched against the target-decoy version of the Human UniProt database (73,928 protein sequences; November 21, 2019) along with the cRAP database using MaxQuant (version 1.6.10.43). Default search parameters were used: peptide-spectrum matches and proteins were filtered at 1% FDR, and modifications included fixed carbamidomethylation of C, variable oxidation of M, and N-terminal acetylation. BALF samples were quantified in label-free quantification (LFQ) mode, and spectra were "matched between runs".
Statistical analysis. The peptide-level data for the BALF were imported into the GalaxyP (https://galaxyp.org) framework to implement the Peptide-level Expression Change Averaging (PECA) procedure 76 using the Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/PECA.html). This method differs from the common approach, in which protein expression intensities are precomputed from the peptide data. In PECA, an expression change between two groups of samples is first calculated for each measured peptide, and the corresponding protein-level expression changes are then defined as medians over the peptide-level changes. For this analysis, we determined the modified t-statistic, which is calculated using the linear modeling approach in the Bioconductor limma (linear models for microarray data) package 85. To identify differential expression in the BALF dataset, the comparability of relative expression changes between alternative peptides was investigated by testing signal log-ratios with a two-sample t-test at a p-value ≤ 0.05 corrected for multiple hypothesis testing. For the intracellular proteins, given the substantially higher number of proteins detected, we used a conservative permutation test to reduce the type I error rate, with a significance level of p ≤ 0.05 corrected by the Benjamini-Hochberg method for multiple hypothesis testing.
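The peptide-first roll-up that distinguishes this approach can be illustrated with a minimal sketch. It is not the PECA package itself: it omits the modified t-statistic and all significance testing, and the function names, the input layout, and the toy intensities are assumptions made for illustration.

```python
import math
from statistics import mean, median


def peptide_log_ratio(case, control):
    """Log2 expression change for one peptide: mean(log2 case) - mean(log2 control)."""
    return mean(math.log2(x) for x in case) - mean(math.log2(x) for x in control)


def protein_changes(peptides):
    """Roll peptide-level changes up to proteins by taking the median.

    peptides: iterable of (protein_id, case_intensities, control_intensities).
    Returns {protein_id: median peptide-level log2 change}.
    """
    by_protein = {}
    for protein, case, control in peptides:
        by_protein.setdefault(protein, []).append(peptide_log_ratio(case, control))
    return {protein: median(ratios) for protein, ratios in by_protein.items()}


# Toy data (hypothetical intensities): three peptides for P1, one for P2.
toy = [
    ("P1", [8.0, 8.0], [4.0, 4.0]),    # peptide change = +1
    ("P1", [16.0, 16.0], [4.0, 4.0]),  # peptide change = +2
    ("P1", [4.0, 4.0], [4.0, 4.0]),    # peptide change = 0
    ("P2", [2.0, 2.0], [8.0, 8.0]),    # peptide change = -2
]
print(protein_changes(toy))
```

Taking the median over peptide-level changes makes the protein-level estimate less sensitive to a single outlying peptide than a change computed from summed or averaged protein intensities.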
To gain insight into the biological significance of differentially expressed proteins, we performed functional analysis using Ingenuity Pathway Analysis (IPA; QIAGEN, Redwood City; https://www.quiagen.com/ingenuty). This analysis was performed on proteins with an FDR-corrected p-value ≤ 0.05 as the cutoff for differential expression for both the BAL cell and fluid datasets. The IPA core analysis was performed using the difference of the weighted log fold change between comparison groups. We focused on canonical pathways that met a Benjamini and Hochberg (B-H)-corrected p-value of ≤ 0.05 obtained using the right-tailed Fisher exact test (equivalent to −log [B-H p-value] ≥ 1.3), as done previously 81. We also used the Overlapping Canonical Pathways functionality in IPA, which is designed to visualize the shared biology in pathways through the common features (genes/proteins) in the pathways. The network of overlapping pathways shows each canonical pathway meeting the statistical threshold of −log (B-H p-value) ≥ 1.3 as a single "node". An edge connects any two pathways when at least two proteins are shared between them.
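The statistical threshold and the overlap rule described above can be sketched in a few lines. IPA is proprietary software, so this is not its implementation; it is a generic illustration of a right-tailed Fisher exact test (computed as a hypergeometric upper tail), Benjamini-Hochberg adjustment, the −log10 conversion, and the "at least two shared proteins" edge rule. All function names, pathway names, and protein identifiers below are hypothetical.

```python
import math
from itertools import combinations


def fisher_right_tail(k, n, K, N):
    """P(overlap >= k) when n differentially expressed proteins are drawn
    from a background of N proteins, K of which belong to the pathway."""
    total = math.comb(N, n)
    upper = min(n, K)
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Return B-H adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def overlap_edges(pathways, min_shared=2):
    """Connect two pathways when they share at least `min_shared` proteins."""
    edges = []
    for a, b in combinations(sorted(pathways), 2):
        shared = pathways[a] & pathways[b]
        if len(shared) >= min_shared:
            edges.append((a, b, sorted(shared)))
    return edges


# A B-H p-value of 0.05 corresponds to -log10(0.05), roughly 1.3.
threshold = -math.log10(0.05)

# Hypothetical pathway memberships among differentially expressed proteins.
example = {
    "pathway_A": {"p1", "p2", "p3"},
    "pathway_B": {"p2", "p3"},
    "pathway_C": {"p3", "p9"},
}
print(round(threshold, 2), overlap_edges(example))
```

In this toy example only pathway_A and pathway_B are connected, because every other pair shares a single protein and therefore falls below the two-protein edge rule.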