Gene network approach reveals co-expression patterns in nasal and bronchial epithelium

Nasal gene expression profiling is a new approach to investigate the airway epithelium as a biomarker to study the activity and treatment responses of obstructive pulmonary diseases. We investigated to what extent gene expression profiling of nasal brushings is similar to that of bronchial brushings. We performed genome wide gene expression profiling on matched nasal and bronchial epithelial brushes from 77 respiratory healthy individuals. To investigate differences and similarities among regulatory modules, network analysis was performed on correlated, differentially expressed and smoking-related genes using Gaussian Graphical Models. Between nasal and bronchial brushes, 619 genes were correlated and 1692 genes were differentially expressed (false discovery rate <0.05, |Fold-change|>2). Network analysis of correlated genes showed pro-inflammatory pathways to be similar between the two locations. Focusing on smoking-related genes, cytochrome-P450 pathway related genes were found to be similar, supporting the concept of a detoxifying response to tobacco exposure throughout the airways. In contrast, cilia-related pathways were decreased in nasal compared to bronchial brushes when focusing on differentially expressed genes. Collectively, while there are substantial differences in gene expression between nasal and bronchial brushes, we also found similarities, especially in the response to the external factors such as smoking.

www.nature.com/scientificreports www.nature.com/scientificreports/ to assess the disease activity in COPD. However, bronchoscopy is an invasive procedure with a substantial burden to the patient, preventing its use in large populations as well as frequent sampling.
We previously demonstrated that the nasal epithelium can be used to detect changes in COPD-associated gene expression and showed that the nasal epithelial COPD related gene expression signature partly overlaps with COPD-associated bronchial epithelial gene expression 9 . This strengthens the hypothesis that the upper and lower airways have a common COPD-related gene expression profile, the united 'airway field of injury' . Previously, we and others have shown that nasal and bronchial epithelium have a similar expression profile in smokers and never-smokers. Additionally, we have shown that expression quantitative trait loci (eQTL) are similar between the two compartments 10 . The easy availability to nasal epithelium, nasal gene expression provides the opportunity to assess disease-related phenotypes and determine treatment outcome.
One way to investigate this is the use of gene network modelling. In particular, Gaussian Graphical Models (GGMs), are a widely used network model to study protein 11 , and gene regulatory networks 12 . The GGM consists of nodes (e.g. genes, proteins or metabolites) interconnected by edges if their partial correlation is significantly different from zero. Partial correlations are correlations where the cofounding effects are removed. This is an advantage of GGMs compared to other models such as Relevance Networks 13 or weighted gene co-expression network analysis (WGCNA) 14 , principal component analysis or clustering, which use the structure obtained from  Table 1. Clinical characteristics of the NORM study. BMI, body mass index; FEV 1 , forced expiratory volume in one second; FEV 1 /FVC, forced expiratory volume in one second/forced vital capacity; RV, residual volume; TLC, total lung capacity; RV/TLC, residual volume/total lung capacity. The means and standard deviations are shown for continuous variables. ****Independent T-test showed significant difference between the two groups only with a p < 0.0001.  Pearson correlation to find similarity patterns because spurious associations are avoided. For example, a common regulator gene might results in similar expression patterns of the regulated genes. These patterns are then indirect and cannot be distinguished from the effect of the regulator. Learning the structure of a GGM (i.e. inferring the edges from data) is computationally feasible even for large networks, and often performs as good as other more demanding network models (e.g. Bayesian Networks) 15 . Some applications in respiratory research include GGMs reconstructed from expression data of asthmatic children 16 , asthma integrated genomic data 17 , COPD phenotypic networks 18 , and asthma gene -single-nucleotide polymorphism (SNP) associations 19 .
In the current study, we aim to investigate to what extent gene expression profiling of nasal brushings are similar to that of bronchial brushings, and to determine whether nasal brushing can be used as a non-invasive biomarker of the lower airways in the study of respiratory diseases. We used GGMs to investigate correlated and differentially expressed genes between the two tissues, and to identify which pathways are similar and which are different.
Results patient characteristics. From the 110 respiratory healthy participants who were enrolled in the study, 77 had matched nasal and bronchial samples. Table 1 shows the clinical characteristics of all participants and the comparison of current and never smokers.
Genes similar between the nose and the bronchus. Spearman correlation analyses, comparing the expression of individual genes between nasal and bronchial brushes, identified 619 genes (3.4% of total, Benjamini-Hochberg (BH) 20 -adjusted p < 0.05) that were significantly correlated between bronchial and nasal samples. Table 2 shows the top 20 genes with correlated expression between the nasal and the bronchial epithelium. As expected, X-and Y-linked genes, smoking-related genes, and genes known to have high genetic contribution to variation in expression, such as HLA-DRB1, were the most significantly correlated genes in this analysis 21 . For the majority of these genes (98%) expression was positively correlated between nasal and bronchial brushes (Fig. 1A), indicating a subset of genes with strong concordance of their expression between the two sampling locations. To confirm that these findings were not by chance, a permutation analysis was conducted (n = 500). Indeed, the number of positively correlated genes was always found to be greater with paired samples compared to randomly picked (non-paired) samples (p < 0.002). Furthermore, we found that genes correlated between the two locations have both higher mean expression and greater variation (standard deviation) than non-correlated genes (Fig. 1B,C), indicating that variation of gene expression is required for correlation. We next investigated to what extent the nasal sample reflects an individual's bronchial transcriptional profile rather than a response to environmental insults such as smoking. To this end, we assessed whether the nasal-bronchial relation within a patient was stronger than across patients. We performed this analysis for all genes genome-wide and for the list of 619 correlated genes mentioned above. We found no difference with respect to the nasal-bronchial relation between samples from the nose and bronchus when looking at all genes, while for the 619 correlated genes (CO), the intra-patient nasal bronchial correlations were more correlated than across patients ( Fig. 2A-C). This may be explained by the low expression or low variation in expression in the population of the non-correlating genes. These two factors drive the lack of self-correlation as the levels of expression between nasal and bronchial brushes are very similar across all patients. network analysis on correlated genes. From these 619 CO genes, we built two GGM networks (one for each tissue) at BH-adjusted p ≤ 0.01 (Fig. 3A). We found 163 genes (26.33%) that are connected in the bronchial network with 156 edges (i.e. significant partial correlations), and 168 (27.14%) in nasal network with 152 edges. In total 236 genes (38.12%) had at least one edge in one of the tissues, from which only 36 genes (15.25%) have common edges in both compartments. Figure 3B shows an (edge-wise) comparison of the networks via a scatter plot of BH p-values for the CO genes. We observed that genes belonging to the same family tend to be interconnected in both tissues. In particular, HLA-DQB1, HLA-DRB1, HLA-DRB5 and HLA-DQA1 were interconnected to each other. Other gene pairs from the same family are connected as well, namely; Table 3 shows the results of GO enrichment analysis for biological network analysis on smoking-related genes. Previously, we identified 27 genes that were differentially expressed in current smokers compared to never smokers in both the nasal and bronchial brushes 22 . From these 27 smoking-related genes (SM), we inferred two GGM networks (one for each tissue) at BH-adjusted ≤0.05 (Fig. 4A). We found 8 genes (29.62%) that are connected in the bronchial network with 4 edges (i.e. having multiple significant partial correlations), and 6 genes (22.22%) in nasal network with 3 edges. A total of 6 genes (75.00%, 3 gene pairs) exhibited one common edge between the bronchial and nasal network. Figure 4B shows an (edge-wise) comparison of the networks via a scatter plot of BH p-values for the SM genes. We observed p 0 01). Genes that had no significant connections (partial correlation) were left out of the figure, and the genes connected in only one of the tissues are colored in grey. In blue: edges present in bronchial tissue. In red: edges present in nasal tissue. In magenta: edges present both tissues (Overlapped edges). (B) Scatter plot of the GGM network edges (partial correlation) for the CO genes. Each red dot represents −log p (BH ) 10 of an edge in bronchial tissue (horizontal axis) versus nasal tissue (vertical axis). In light black: the critical value at BH ≤ . . p 0 01 The figure displays the respective gene pairs for the most similar edges. (C) The 50 most significant GOs for the set of Overlapped genes. The enrichment is contrasted against two sets of genes randomly sampled from the CO (619 genes) and from the whole genome (19718 genes). The panel displays the corresponding mean −log 10 (p-values) and error bars represent + 2 standard errors over the 500 random samples.  Table 4 shows the results of GO enrichment analysis for biological processes at FDR ≤ 0.05.
Transcription profiles differ between the nose and the bronchus. Next, we investigated which genes drive the differences between the locations. We identified 6,806 expressed in nasal and 6,797 genes expressed in bronchial epithelium (log 2 (microarray florescence) <3), with an overlap of 98.2%. Expression of 130 genes was specific to bronchial epithelium, while 145 genes were specifically expressed in nasal epithelium. A differential expression analysis identified 1692 differential expressed genes (DEGs), among which 723 (3.98%) DEGs with higher and 969 (5.34%) with lower expression (|log fold change|>2, FDR < 0.05) in nasal compared to bronchial brushes. Table 5 shows the 20 most DEGs. Figure 5A shows the DEGs highlighted in red (higher in bronchial brushes) and blue (lower in bronchial brushes) in a scatter plot showing the mean expression in the bronchial and nasal brushes. Figure 5B shows a heatmap of the DEGs. network analysis on DeGs. From the 1692 DEGs, we built two GGM networks (one for each tissue) at BH-adjusted p ≤ 0.01. We found that 821 genes (48.52%) are connected in the bronchial network with 2642 edges (i.e. significant partial correlations), and 946 (55.91%) in nasal network with 5291 edges. We found that 1136 genes (67.13%) have at least one significant edge in one of the tissues, from which 179 genes (15.76%) show common edges, and the remaining 957 genes (84.24%) do not exhibit common edges. Figure 6A shows an (edge-wise) comparison of the networks for each tissue via a scatter plot of BH p-values for the DEGs. This set of 957 genes will be referred to as the genes with non-overlapped edges. The gene ontology (GO) enrichment analysis identified 160 biological processes mainly related to cilium (e.g. cilium part, organization and assembly). The 50 most significant GOs are displayed in Fig. 6B. Table 6 shows the GO enrichment for biological processes at ≤ . FDR 0 05 for the non-overlapped genes.
Cell type decomposition and gene set variation analysis (GSVA) analysis. To investigate the difference in cell-type composition between bronchial and nasal epithelium, we employed markers identified in recent single-cell profiling of human lungs 23 . Interestingly, bronchial epithelium shows higher expression of genes that mark Ciliated and Club cells, while nasal tissue exhibits higher expression of genes characteristic for Goblet cells (Fig. 7A). Within tissue the expression patterns of marker genes is similar between smoker and never smokers where apparent difference can be observed only at the level of individual genes (e.g. MUC5B or CEACAM5). In addition to determine differential pathways expression we have performed GSVA of both tissue type using gene sets attributed to Goblet, and Ciliated cells (Fig. 7B).

Discussion
In the current study, we used a network-based approach to identify pathways that explain the differences and similarities between the gene expression profiles from nasal and bronchial brushes. This is the first study using GGMs to compare gene networks from the expression data obtained from nasal and bronchial samples. We show that genes involved in inflammatory pathways are retained between the two compartments, while cilia-related genes are lower expressed in the nose. The detoxifying gene expression response of the respiratory tract to cigarette smoking is similar in the nose and bronchus.
When investigating pathways from genes with similar expression in the nose and bronchus, we found genes involved in pro-inflammatory pathways, including the HLA-family, CFD, CD207 and MT2A. The HLA class II molecules have a key function in the adaptive immune response by providing peptides to the antigen receptor of CD4 +T lymphocytes. Several studies have found an association between Human Leukocyte Antigen (HLA) class II genes and asthma, as well as with other allergic diseases, such as allergic rhinitis and atopic dermatitis 22 . Previous studies have shown similar eosinophilic infiltration (a hallmark of inflammation in asthma) in the nose and bronchus of healthy controls and asthmatics on corticosteroid therapy. Interestingly, although increased inflammation in untreated asthmatics compared to healthy controls was present in both compartments, the inflammation was further enhanced in the bronchus. Furthermore, in vitro studies have shown high  www.nature.com/scientificreports www.nature.com/scientificreports/ concordance in inflammatory responses to the pro-inflammatory stimuli IL-1β, TNF-α and rhinovirus in both nasal and bronchial epithelium 24,25 . The strong concordance in gene expression networks for genes involved in pro-inflammatory pathways (between samples collected in the nose and the bronchus) indicates that the nasal epithelium could be used to study underlying molecular mechanisms of inflammation and to identify easily accessible biomarkers for chronic inflammatory disease classification.   Furthermore, when looking at pathways concordantly affected by smoking between the nose and the bronchus, genes providing proteins with oxidoreductase activity such as the cytochrome P450 genes (e.g. CYP1A1, CYP1B1) and ALDH1A3, had similar expression between the nose and the bronchus. Previously, it has been shown that genes induced by smoking and having oxidoreductase activity are one of the most rapidly reversible genes upon smoking cessation 8 . Our findings further support the idea of a detoxifying response to tobacco exposure throughout the airways in smokers. Previous studies focusing on ex vivo nasal and bronchial biopsies have shown similar response to cigarette smoke stimulation between the two tissues 26 . Moreover, current and former smokers' bronchial epithelial gene expression has been used to derive and validate a biomarker to detect lung cancer 27 . Subsequently, it was found that lung cancer-associated gene expression was detectable from nasal   www.nature.com/scientificreports www.nature.com/scientificreports/ epithelium, with both tissue showing concordant expression alterations (e.g. regulation of apoptosis and immune system signaling) 28 .
Another gene identified in our analysis is TFF1, which encodes for the trefoil factor family peptides. These peptides are secreted by mucus-producing cells into mucosal surfaces throughout the body and can bind to mucin molecules. It has been shown in mouse studies that TFF1 expression is increased after injury of the airway epithelium, suggesting a role in airway epithelial repair 29 . Another interesting gene is CTNNAL1, a gene that has been associated with cellular growth regulation and may be involved in the recovery of (bronchial) epithelial damage 30 .
A number of genes were found to have different expression levels at baseline in nasal and bronchial airway epithelium. However, we still found an overlap between the two networks, indicating that regardless of the different baseline levels of individual genes, there may still be correlations for the gene-sets as a whole between the two compartments. Similar findings have been observed in in vitro models, showing different levels of certain genes between nasal and bronchial epithelial cultures, and nevertheless, these genes still correlated in their response to stimuli 25 . Previous studies focusing on ciliary beat frequency found that this was similar in nasal and bronchial epithelium 31 . However, pathway analysis of differentially expressed genes between nose and bronchus in our study indicated cilia-related pathways to be decreased in nasal compared to bronchial brushes. In line with this observation, it has been shown that the percentage of ciliated cells increases further down the respiratory tract as shown by a previous study 32 . This change in percentage will lead to a shift in cellular composition and an increase of cilia-related gene expression in the lower airways. This finding was confirmed by our GSVA were nasal epithelium showed lower expression for ciliated genes compared to bronchial epithelium. These results support also the high quality of our nasal and bronchial epithelium dataset.
The strength of our study is the use of matched nasal and bronchial samples from a moderately powered dataset. There are also a number of limitations associated with this study. First, from the network analysis perspective GGMs are limited to linear associations (partial correlation). Therefore, if a pair of genes shows a nonlinear relationship, the GGM might not identify these associations. Second, network inference is a multiple testing scenario, i.e. p genes imply performing p(p-1)/2 tests. Here, we have employed BH correction for multiple testing, however, it should be kept in mind that false positives (false edges) might be still present in the network structure regardless of the method depending on the chosen FDR error tolerance. Third, the small number of genes used for the enrichment analysis may have influenced the pathways we found. A subset of the genes with similar gene expression patterns between nasal and bronchial brushing identified in our current analysis are likely due to genetic polymorphisms that have effects on gene expression in general through the whole body not exclusively related to the tissues measured in the current manuscript.
In conclusion, we have shown that although there are differences in expression between the nasal and bronchial brushes, their response to external factors such as smoking seem to be concordant. Therefore, we suggest that the use of nasal brushes as a proxy for the bronchus is suitable to study airway epithelium at baseline and in response to environmental exposures.
To identify concordant genes where variation among subjects was correlated between sampling locations, we applied a Spearman correlation comparing the expression of individual genes between nasal and bronchial brushes. To assess whether the nasal-bronchial relation within a patient was stronger than across patients, we applied a Spearman correlation on all paired samples and compared this to the average correlation on mismatched pairs. This analysis was conducted on all genes and on the correlated genes.
In order to identify genes differentially expressed between nasal and bronchial brushes, we performed a paired analysis, using a linear model correcting for age, gender and smoking status. Genes were considered significantly differentially expressed if their BH corrected p-value was <0.05 and if they had a |log fold change| > 2. network analysis. Three different sets of genes were studied: (i) the set of CO, (ii) twenty seven smoking related genes previously found to be differentially expressed in both the nose and the bronchus 22 (SM), and (iii) the set of DEGs. The expression data has been log 2 -tranformed and standardized (mean = 0, sd = 1) such that the data is normal distributed.
As our dataset involves a number of genes that is larger than the sample size, the reconstruction of a GGM (i.e. inferring the partial correlations) from expression data requires a shrinkage approach. This is usually known as a high-dimensional problem. To this end, the network analysis was performed using the R package GeneNet version 1.2.13 12 in R version 3.4.3. The test of significance for each of the GGM's edges (i.e. the partial correlations) was performed with the improved method that we have published recently in Oxford Bioinformatics 33 . This method test the null-hypothesis of no partial correlation between pairs of genes, and has an accurate control of the false Figure 7. Cell type-specific marker analysis between bronchial and nasal epithelium. In panels of this figure, we investigate the difference in cell-type composition. Panel (A) shows the expression profile of genes identified in recent single-cell profiling of human lungs 23 in a form of heatmap stratified by tissue type and smoking status. Bronchial epithelium shows higher expression of genes that mark Ciliated and Club cells, while nasal tissue exhibits higher expression of genes characteristic for Goblet cells. Within tissue expression patterns of marker genes is similar between smoker and never smokers where apparent difference can be observed only at the level of individual genes (e.g. MUC5B or CEACAM5). Panel (B) GSVA of ciliated and goblet cell genes from match nasal and bronchial brushes. The scatter plot shows the overall lower GSVA scores of different gene sets attributed to Ciliated cells in nasal samples and the overall higher GSVA scores of gene sets corresponding to Goblet cells.