Integrative analysis to explore the biological association between environmental skin diseases and ambient particulate matter

Although numerous experimental studies have suggested a significant association between ambient particulate matter (PM) and respiratory damage, the etiological relationship between ambient PM and environmental skin diseases is not clearly understood. Here, we aimed to explore the association between PM and skin diseases through biological big data analysis. Differential gene expression profiles associated with PM and environmental skin diseases were retrieved from public genome databases. The co-expression among them was analyzed using a text-mining-based network analysis software. Activation/inhibition patterns from RNA-sequencing data performed with PM2.5-treated normal human epidermal keratinocytes (NHEK) were overlapped to select key regulators of the analyzed pathways. We explored the adverse effects of PM on the skin and attempted to elucidate their relationships using public genome data. We found that changes in upstream regulators and inflammatory signaling networks mediated by MMP-1, MMP-9, PLAU, S100A9, IL-6, and S100A8 were predicted as the key pathways underlying PM-induced skin diseases. Our integrative approach using a literature-based co-expression analysis and experimental validation not only improves the reliability of prediction but also provides assistance to clarify underlying mechanisms of ambient PM-induced dermal toxicity that can be applied to screen the relationship between other chemicals and adverse effects.

Pathway analysis of publicly available data to explore altered biological functions and cellular processes related to the common properties of PM and environmental skin diseases. To explore the biological association between environmental skin diseases and PM, the biological networks among intersection genes were analyzed. Among 70 identified genes, 32 genes established significant signaling networks. To investigate the cellular functions or processes altered by PM-induced signaling networks in environmental skin diseases, we expanded the analysis of the genes to include their functional classes and the cell processes involved (Fig. 2). "Functional class" categorizes the genes by their biological functions listed in the database of Pathway Studio. Based on the text-mining algorithm of the software, this analysis enables the prediction of the cellular functions that may be altered in PM-induced skin diseases. The predicted genes in the pathway belong to classes related to inflammatory functions, including "inflammatory cytokine, " "IL-1 family, " and "NF-κB family" (Fig. 2a). In terms of the "cell processes, " the networks among the genes were closely related Table 1. Keywords used for retrieving the public genomic data relevant to PM and skin diseases.

Topic
Keyword used for searching the DEG data Number of searched genes Reference (or GEO accession number)
Identification of the differentially expressed genes (DEG) in PM 2.5 -exposed skin cells through RNA-seq and pathway analyses to seek the altered signaling networks. To verify our public data-based prediction, we obtained transcriptomic profile data from skin cell-based experiments by performing RNA-seq using normal human epidermal keratinocytes (NHEK) that had been exposed to PM 2.5 . In total, Figure 1. Principle of data crawling from public genome databases and next-generation sequencing (NGS)based analysis. Keywords for crawling genome data were selected based on characteristics of PM and names of environmental skin diseases. Criteria for differential gene expression were p < 0.05 and |fold-change| > 2. Groups of retrieved genes were compared, and intersections were identified. DEG data and RNA samples from NGS experiments were used for further analysis to select key regulators of the PM-induced pathway and for validation.   www.nature.com/scientificreports/ 122 genes were downregulated, and 148 genes were upregulated based on a |fold-change| > 2 and p-value < 0.05. With these 271 DEG, pathway analysis was conducted to determine the biological alterations in human skin cells under PM exposure. Finally, 30 genes were used to construct the significant signaling networks, and the analysis of cellular functions or processes altered by PM exposure was expanded among the genes in terms of their functional classes and involved cell processes. "NF-κB family, " "cytokine, " "Jun/Fos, " and "inflammatory cytokine" were predicted as the major functional classes that may be altered by differential gene expression upon PM exposure (Fig. 3a). Cell processes involved with basic cell function (cell proliferation, cell differentiation, and cell motility), immune response (inflammatory response, T-cell activation, neutrophil migration/recruitment, and leukocyte recruitment), oxidative stress, and ROS generation were predicted (Fig. 3b). IL-1B, MMP-9, CXCL-8, CSF-2, IL-1A, HMOX-1, MMP-1, and S100A9 were predicted as key hub regulators of the pathway based on their betweenness and degree centrality (Table 3).
Integrative analysis of literature and experiment-based results in terms of differentially expressed signals and PM component association to identify biological alterations in the skin caused by PM. We simplified the literature-based network from Fig. 2 to select the final potential key networks that elucidate the etiological relationships between PM and skin diseases (Fig. 4a). All entities were selected based on the betweenness and degree centrality with surrounding nodes in the pathway (Table 2 and  Supplementary Table S1). The top 10 genes with high degree values were selected as key genes for simplification. The relation between the selected key genes and centrality is presented in Supplementary Table S1. "Inflammatory cytokine" and "NF-κB family" were predicted as the major functional classes, and "inflammatory response" was predicted as the major cell process closely related to PM-induced skin diseases. Each key regulator was highlighted based on gene-disease associations referred to in our retrieved dataset. S100A8, S100A9, and IL-6  www.nature.com/scientificreports/ were predicted as important hubs in the pathway, with relationships between more than two environmental skin diseases.
For integrative prediction about the potential mechanism of PM-induced environmental skin diseases, we overlapped activation/inhibition patterns in fold-change values from our next-generation sequencing (NGS)based experimental data with the result of our literature-based analysis. Activation of inflammatory responses by altered NF-κB family and inflammatory cytokine functions were predicted based on the up-regulation of PLAU, MMP-9, MMP-1, S100A8, and S100A9 (Fig. 4b), which we verified by quantitative real-time PCR (qRT-PCR). All five genes showed significantly upregulated expression under PM 2.5 exposure compared to the control group (Fig. 4c). Interestingly, significant changes in mRNA expression levels of IL-6 were confirmed by qRT-PCR, and expression alteration of IL-6 at the protein level was validated by enzyme-linked immunosorbent assays (ELISA) ( Supplementary Fig. S2), although fold-change values of IL-6 were not detected in the RNA-seq experiment.
Integrative prediction of biological alterations and upstream regulators in skin exposed to PM. Based on the association obtained from the integrative analysis, we explored significant upstream regulators for the key genes using upstream analysis algorithm in Ingenuity Pathway Analysis (IPA) software (Table 4). Among the predicted regulators, TNF, NF-κB, and ERK1/2 formed significant networks with the six key genes.
Final key networks were summarized in Fig. 5a, which showed biological alteration in skin regulated by PMinduced functional activation of TNF, NF-κB, and ERK1/2. Protein level change of upstream regulators were verified using Western blot or ELISA assay, it was found that the expression patterns were significantly changed, as predicted in the upstream analysis (Fig. 5b,c, and Table 4).

Discussion
In recent years, biological evidence supporting the role of PM in skin damage has been suggested via various approaches. Numerous epidemiological studies have reported a phenomenological association between the increase in airborne PM level and diagnosis frequency of skin diseases. Although characteristic variables of PM depend on weather or location, studies have commonly reported associations between ambient PM and progression of inflammatory skin symptoms or diseases, such as eczema, allergic contact dermatitis, and atopic dermatitis 14,16,18,19 . However, the underlying mechanism for these associations was not fully understood. The size and component characteristics of PM are closely related to the source of the PM 8,20 . PM released directly from natural sources (e.g., crustal movement, dust storms, forest fires, and weathering of geographical features) and micro-sized biological particles (e.g., bacteria endotoxins, pollen, and spores) are classified as primary particles (mostly PM 10 ) 21 . Anthropogenic sources (e.g., solid fuel combustion, attrition of brakes and tires on urban roads, and erosion during manufactural processes) are major causes of micro-sized solid chemical particles and gas or liquid particles in urban PM 22,23 . Most of the PM that originates from anthropogenic sources is derived from chemical reactions between oxides of sulfur and nitrogen, VOC, PAH, and other chemical derivatives of primary particles. Being mostly smaller than primary particles, this PM belongs to PM 2.5 . All PM types consist of organic carbon, elemental carbon, PAH, VOC, and metals 24,25 . The individual toxicity of each component has been widely studied; inhaled or penetrated ambient PM-sized heavy metals can accumulate in the body and stimulate chronic illnesses, including bronchial damage, lung malfunction, or skin carcinogenesis 26,27 .
Organic components, such as PAH and their oxygenated derivatives, cause severe oxidative stress and mitochondrial damage 28,29 . However, in terms of heterogeneous particles, the contribution of each component to the PM-induced adverse effects is not fully understood. Here, we analyzed the contribution of chemical components in the biological pathway of PM-induced skin diseases (Supplement Fig. S3). Identified genes in the pathway were sorted and marked based on knowledge of chemical-gene associations from our retrieved dataset in Table 1. PAHs and cadmium are the largest contributors to the pathway. MMP1, PLAU, SERPINB2, ITGA2, and CFB are commonly associated with the largest number of PM components and are closely related closely to 'psoriasis, ' 'melanoma, ' 'dermatitis, ' 'wounds, ' and 'wounds and injuries. ' This suggestion has limitations that  Fig. 2). Genes highlighted in orange, violet, and green, respectively, indicate atopic dermatitis-associated, allergic contact dermatitis-associated, and eczema-associated. Descriptions of the schematic symbols are located to the left of the figure. (b) Overlap with fold-change values from our RNA-seq experiments to identify activation/ inhibition patterns. Criteria for differential expression is p < 0.05 and |fold-change| > 2. Upregulated genes are highlighted in pink. Schematic legends are located to the left of the figure. (c) Validation of mRNA expression profiles of key regulators using qRT-PCR. The upregulation of MMP9, MMP1, S100A8, S1009, and PLAU was confirmed. IL6 did not show significant fold-changes in RNA-seq, but changes in RNA expression levels were confirmed by qRT-PCR.  www.nature.com/scientificreports/ arise from inadequate keywords of PM components and the required validation for specifying components "in PM. " However, our approach provides important clues for clarifying the comprehensive toxic effects and main causes of skin disorders from PM exposure by considering comparative data on the differential contribution of the components to each mode of action. The rapid evolution of NGS has accelerated research in genomics field by providing a massive amount of data. Researchers can quickly and easily access data through various public databases and utilize genome-wide gene expression profiling for their studies. In interpreting the dynamic pattern of gene expression profiles, co-expression network analysis for exploring gene functions and gene-disease associations have emerged as respective data-analysis methods 17 . In the present study, we aimed to integrate previous knowledge from literature-based data with our experimental data to identify the relationship between PM exposure and skin diseases. To interpret/ expand the biological meaning from the list of genes associated with the chosen keywords (Table 1), we applied text-mining software-based pathway analysis among the identified genes to obtain their co-expression network, related functional processes, and disease-related phenotypes. During the construction of the pathway, each gene was referred to as an "entity" and linked to another entity by "relation, " which represents the biological relationship between two entities. A relation between two entities can be established by screening the sentences in massive volumes of scientific literature based on co-occurrence frequency in the same scientific publications 30 . The importance of each entity was determined by centrality concepts. Betweenness centrality quantifies the shortest path between adjacent entities, and degree centrality defines the number of relations between adjacent entities 31 . Higher values of these two parameters in the pathway explain the crucial point of interaction among multiple biological networks. We considered both values to cover connectivity with surrounding entities and the role of the hub genes in the predicted pathway, which shows the etiological relationship between PM and skin diseases.
From the experimental perspective, several in vitro studies have elucidated that PM-induced cytotoxicity may derive from activation of the IL-1β, IL-6, and NF-κB signaling pathways in keratinocytes [32][33][34] . However, cell line-based studies have critical weak points in that they do not account for the systemic response in the skin. Thus, recent publications have attempted to demonstrate PM-induced skin damage with consideration of the comprehensive profile of macrophenomena aspects and molecular mechanisms using both in vitro and in vivo models or artificial three-dimensional skin tissue models. Such studies have suggested a significant alteration of several specific inflammatory markers, such as IL-1α, IL-8, or oxidative stress-induced NF-κB nucleus translocation 7,35,36 . With this consideration, we attempted to identify the comprehensive biological responses associated with PM-induced skin toxicity, as well as marker-specific knowledge, by interpreting the interactions among the entities and surrounding molecular networks, as shown in Figs. 2 and 3. From the public database (Fig. 2a) and experimental data (Fig. 3a), the NF-κB family and cytokines were commonly predicted as possibly altered cellular functions in the skin under PM exposure (Fig. 4a). In addition, inflammatory response-related cell processes mediated by MMP-9, S100A8, S100A9, and IL-6 were commonly predicted (Figs. 2b and 3b). These results reflect the previously mentioned data from various in vivo and/or in vitro approaches to demonstrate dermal toxicity of PM and provide improved information at the pathway level. All genes and their target proteins play individual and collective roles by interacting with each other. Our results suggest that PM exposure causes an inflammatory response mediated by alteration of the NF-κB family and cytokine functions through differential expression of MMP-9, S100A8, S100A9, and IL-6.
The inflammatory response is a complex and rapid biological process induced by extrinsic irritants 37 . The primary purpose of the inflammatory response is to defend the system against injurious stimuli 38 . Exposure to a toxicant or pathogen can cause a response that leads to tissue-level pathological conditions because of uncontrolled, excessive activation of the immune system. Inflammatory and immune responses induced by airborne toxicants are a major inducer of drastic adverse effects in skin cells during penetration of extrinsic irritants through skin. IL-6, S100A8, and S100A9 are three key regulators that display a positive biological relationship with atopic dermatitis (Fig. 4a) and serve as major inflammatory markers under PM exposure. IL-6 is a wellknown cellular stress and pro-inflammatory marker that is overexpressed after exposure to various extrinsic harmful substances 39 . It is also actively studied as a marker for skin diseases, such as atopic dermatitis and allergic dermatitis 40 . S100A8 and S100A9 belong to the S100 protein family and are released to the acellular compartment, where they bind cell surface receptors and could act as major regulators of the inflammatory response 41 . One of those receptors is Toll-like receptor 4, TLR-4 42 . Upon binding to TLR-4, signaling cascades are initiated that regulate inflammation and NF-κB-dependent tumor development [42][43][44] .
The activation pattern of the summarized pathway involved in PM-induced skin diseases was predicted using fold-change values from RNA-seq data analysis of human epidermal keratinocytes. Owing to the essential roles of keratinocytes in immune responses to exposure and penetration of extrinsic factors 45 , the epidermal keratinocyte model is widely used to determine the detrimental effects of PM 33,46 . This experimental procedure allowed validation of the relevance between PM exposure and the changes in the levels of key regulators in skin cells, as predicted from the integrative pathway analysis. The up-regulation of matrix metalloproteins (MMP) plays an important role in the skin and modulation of inflammation. MMP-1 intermediates cleave fibrillar collagens and contribute to collagen degradation and extracellular matrix remodeling in keratinocytes 47,48 . MMP-9 also cleaves extracellular matrix components and causes skin inflammation via activation of cytokines, including IL-1β and IL-13 49 . The direct association between upregulated PLAU and the skin or PM has not been fully understood, but it is closely related to MMP-1 and MMP-9. Upregulated urokinase-type plasminogen activator-the enzyme encoded by the PLAU gene-causes plasmin-dependent activation of MMP via catalysis-mediated conversion of plasminogen to plasmin 50 . This knowledge revealed predictable roles for additional key genes in PM-induced skin diseases. Based on the experimental validation using our NHEK sample (Fig. 4c), not only are the key regulators differentially expressed in experimental data, but another gene indicated changes in expression levels not included in the DEG data. In this regard, the integrative approach for screening/exploring the potential www.nature.com/scientificreports/ association between environmental factors and diseases was informative in terms of important aspects that researchers might have missed if only the public or experimental data had been considered.
Here, we explored the adverse effects of PM on the skin and attempted to elucidate their relationships using public genome data. Through the literature-based biological pathway analysis of gene expression data from a public database, we identified the chemical-gene-disease associations of PM-induced environmental skin diseases. Through upstream analysis and validation, changes in biological functions and cellular processes of cytokines elicited by the inflammatory response were predicted as the major contributors of adverse outcomes, and expression level changes of key regulators in the pathway were validated (Fig. 5b,c). Further mechanism studies will be required to demonstrate the exact molecular interaction. However, our results provide evidence to assist in clarifying the underlying mechanism of ambient PM-induced dermal toxicity and exemplify the unconventional approach to screening the biological relationships between chemicals and diseases.

Materials and methods
Collection of global gene expression profiles from public databases. We proceeded with keyword selection to retrieve chemical-gene-disease associations from public databases. The genomic data were collected in three categories: "PM size, " "PM components, " and "Skin disease. " First, cadmium, lead, PAH, and VOC were selected as the major chemical components of PM in view of their frequent or common mention in numerous papers to eliminate possible variation in chemical composition because of geographical or time-based factors [51][52][53][54] . "Coal Ash" was also added to expand the informal definition of PM provided in the Comparative Toxicogenomics Database (CTD). DEG were collected from the CTD (http:// ctdba se. org/, last access date: September 2019), a literature-based public resource that provides curated information about chemical-gene/protein interactions and chemical-gene-disease relationships 55 . Second, PM size was integrated into the analysis by directly searching the keywords "PM 10 " and "PM 2.5 . " Gene expression data concerning PM size were obtained from research publications in the PubMed database, and the collection period was from 2018 to 2019. Finally, "atopic dermatitis, " "allergic dermatitis, " and "eczema" were selected as keywords for collecting gene expression data involving environmental skin diseases. The Gene Expression Omnibus (GEO) (https:// www. ncbi. nlm. nih. gov/ geo/, last access date: October 2019), a public functional genomics data repository, was used for collecting the disease-specific gene set analyzed from human-based research publications. All data in the dataset were trimmed based on p-value < 0.05, and |fold-change| > 2. The workflow scheme of the data crawling is provided in Fig. 1.
Chemical preparation and cell treatment. Ambient PM 2.5 was collected on a polytetrafluoroethylene (PTFE) filter (Zefluor™, Pall Life Sciences, Mexico City, Mexico) with a high-volume sampler machine (TE6070, Tisch Environmental, Inc., Cleves, OH, USA), equipped with a PM 2.5 selective-inlet head at a flow rate of 1.13 m 3 /min. The sample collection of PM 2.5 was carried out on the rooftop of the Amorepacific Corporation R&D building, located in Yongin, Korea (37°15′N, 127°06′E). PM 2.5 was extracted with ethanol (EtOH) in a sonicator for 30 min. The obtained extract was dried using an evaporator and resuspended with 20% EtOH. NHEK from neonatal foreskin (Lonza, Walkersville, MD, USA) were cultured in keratinocyte growth medium (KBM-GOLD) with SingleQuots™ supplement (Lonza) containing hydrocortisone, transferrin, epinephrine, gentamicin/amphotericin B, bovine pituitary extract, human epidermal growth factor, and insulin. NHEK were starved for 24 h in KBM-GOLD medium with gentamicin/amphotericin B, followed by stimulation with PM 2.5 (100 μg/mL, 95% cell viability) for 24 h. All experiments were performed with cells passaged less than three times.

RNA-seq and identification of DEG.
Transcriptome analysis using RNA-seq was performed by Macrogen, Inc. (Seoul, Korea). RNA was extracted using an RNeasy mini kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. The quality of RNA samples was checked using the Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA). Libraries were generated with the TruSeq Stranded mRNA Prep Kit (Part #15031047 Rev. E). Purified mRNA was fragmented, and pair-end RNA-seq was conducted using a HiSeq 2500 (Illumina, San Diego, CA, USA) sequencing system. TruSeq Stranded mRNA LT Sample Prep kits (Illumina) were used to establish libraries according to the sample preparation guide. To determine the RNA expression profiles, the RNA-seq reads were mapped to a human reference genome (hg19) using HISAT2 56 . Human reference genome sequence and annotation data were downloaded from the University of California, Santa Cruz (UCSC) Genome Browser (http:// genome. uscs. edu). Mapped reads were assembled using StringTie 57 . Transcript counts were calculated at the isoform and gene levels, and the relative transcript abundances were measured in Fragments Per Kilobase of exon per Million fragments mapped (FPKM). P-value < 0.05 and |fold-change| ≥ 2 were considered as criteria for DEG. All RNA-seq datasets used are available at the GEO repository: GSE143709. RNA samples and identified DEG were used in further analyses, as illustrated in Fig. 1.
Biological network analysis using retrieved gene set and overlapping with experimental data. Pathway Studio web 12.1.0.9 (Elsevier), a literature-based software, was used to analyze biological networks among the identified gene set. Pathway Studio contains a curated literature database based on its own textmining module, which extracts relevant sentences concerning the relationship between two entities 58 . It provides molecular interactions among the entered genes, as well as an investigation of protein-protein or protein-cell process interaction maps. The relations with less than five references were excluded for analysis. The description of each relation type between entities is as follows: Binding: physical contact between two molecules; DirectRegulation: influences target activity by direct physical interaction; Expression: regulator changes protein abundance by affecting levels of transcript or protein stability; ProtModification: regulator that changes the modification of www.nature.com/scientificreports/ the target molecule; PromoterBinding: regulator that binds to the promoter of a gene 30 . The importance of the entities in the pathway was analyzed according to their betweenness and degree centrality, which were calculated using NetworkAnalyzer in Cytoscape 3.7.2 software. IPA (Qiagen) was utilized to perform upstream analysis. IPA is a software application that provides comprehensive biological knowledge and predictions relevant to entered gene expression data based on their curated database via a data mining interface 59 . The upstream analysis function enables the prediction of the upstream transcriptional regulators linked to observed gene expression changes in the signaling networks.
qRT-PCR. cDNA was synthesized using the ImProm-II™ Reverse Transcription System (Promega, Madison, WI, USA) from 500 μg of extracted RNA following the manufacturer's instructions. qRT-PCR was conducted in the Rotor-Gene Q Real-Time PCR system (Qiagen) using Takara SYBR Premix Ex Taq (Takara Bio, Inc., Japan). Thermal cycling conditions for PCR included an initial denaturation step at 95 °C for 5 min, followed by 40 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 30 s. Melting curve analysis of the PCR products was performed at the end of the PCR step, and the data were analyzed using the Rotor-Gene Q Real-Time PCR system (Qiagen). Glyceraldehyde-3-phosphate dehydrogenase-encoding gene (GAPDH) was used to normalize the relative level of gene expression using the 2 −ΔΔCT method. The sequences of the primers used for qRT-PCR are shown in Supplementary Table S2.
Western blot and ELISA assay. Total proteins of NHEK cells were extracted with RIPA buffer containing 1 mM DTT and 1 U of EDTA-free protease inhibitor cocktail (Roche, Manheim, Germany). An equal amount of protein was separated on 7.5% SDS-PAGE and then transferred onto polyvinylidene fluoride membranes. The membranes were blocked using 5% skim milk in TBS at 25 °C for 2 h and incubated with specific primary antibodies: anti-GAPDH (2188, Cell Signaling, MA, USA), anti-NF-κB (3035, Cell Signaling), and anti-ERK1/2 (sc-514302, Santa Cruz Biotechnology, CA, USA) at 4 °C overnight. Membranes were incubated with secondary antibodies (HRP-linked IgG) at 25 °C for 1 h. The secondary antibodies were anti-mouse (A90-116P, Bethyl Laboratories, Montgomery, TX) and anti-rabbit (A120-101P, Bethyl Laboratories). Membranes were washed with TBST, and proteins were detected using enhanced chemiluminescence (ECL) Prime Western Blotting Detection Reagent (GE Healthcare, Piscataway, NJ, USA). The GAPDH band was used as a loading control. The supernatants of NHEK cells were collected, and TNF (TNF-alpha Human ELISA Kit, High Sensitivity, BMS223HS) and IL-6 (IL-6 Human ELISA Kit, High Sensitivity, BMS213HS) were measured using commercial ELISA kits (Invitrogen, CA, USA) in accordance with the manufacturer's protocols.
Statistical analysis. All data were obtained from at least three independent experiments conducted in triplicate. All graph data indicate mean ± standard error. Differences between experimental groups were evaluated using the Student's t-test, and comparisons among more than two groups were analyzed using ANOVA. P-values < 0.05 or < 0.01 were considered statistically significant.