PulmonDB: a curated lung disease gene expression database

Villaseñor-Altamirano, Ana B.; Moretto, Marco; Maldonado, Mariel; Zayas-Del Moral, Alejandra; Munguía-Reyes, Adrián; Romero, Yair; García-Sotelo, Jair S.; Aguilar, Luis A.; Aldana-Assad, Oscar; Engelen, Kristof; Selman, Moisés; Collado-Vides, Julio; Balderas-Martínez, Yalbi I.; Medina-Rivera, Alejandra

doi:10.1038/s41598-019-56339-5

Published: 16 January 2020

Scientific Reports volume 10, Article number: 514 (2020)

Abstract

Chronic Obstructive Pulmonary Disease (COPD) and Idiopathic Pulmonary Fibrosis (IPF) have contrasting clinical and pathological characteristics and interesting whole-genome transcriptomic profiles. However, data from public repositories are difficult to reprocess and reanalyze. Here, we present PulmonDB, a web-based database (http://pulmondb.liigh.unam.mx/) and R library that facilitates exploration of gene expression profiles for these diseases by integrating transcriptomic data and curated annotation from different sources. We demonstrated the value of this resource by presenting the expression of already well-known genes of COPD and IPF across multiple experiments and the results of two differential expression analyses in which we successfully identified differences and similarities. With this first version of PulmonDB, we create a new hypothesis and compare the two diseases from a transcriptomics perspective.

Introduction

A common way to study diseases is by using transcriptomic analysis, which can reveal components of the genome that are active and help us understand which biological processes are affected¹. Over the years, transcriptomic profiles have been compiled and published in public repositories such as Gene Expression Omnibus (GEO)^2,3 and ArrayExpress⁴. Having a way to compare transcriptomic data from Chronic Obstructive Pulmonary Disease (COPD) and Idiopathic Pulmonary Fibrosis (IPF) will help to identify common and distinct molecular mechanisms for these two diseases. However, integrating high-throughput data from public repositories is an overwhelming task because of platform differences (resulting in batch effects), heterogeneous experimental conditions, and the lack of uniformity in experimental annotations. Wang et al. reviewed different approaches and discussed tools such as GEO2R⁵, ScanGEO⁶, ImaGEO⁷, and BioJupies⁸. These tools reuse public data, reanalyze it consistently, and integrate additional data. Even with these tools, performing meta-analyses is still challenging⁹. This is particularly true for COPD and IPF: only a few experiments for these diseases are available in these resources, so such an analysis requires manual annotation by the user or is restricted to curated GEO DataSets (also referred to as GDS). Moreover, to our knowledge, none of these tools integrates microarray and RNA-Seq data.

Therefore, we created a curated gene expression lung disease database, PulmonDB, to organize the large amount of expression data currently available for both COPD and IPF. To accomplish this task, we used COMMAND>_, a web application previously used to create two successful transcriptomic compendia: one for bacterial genomes, COLOMBOS^10,11, and the second for grapevine, VESPUCCI¹². While there are other chronic respiratory diseases, such as asthma, cystic fibrosis, and pulmonary hypertension, among others, given the biological similarities between COPD and IPF, we decided to focus the first version of PulmonDB on these two diseases. We integrated transcriptomic experiments from different sources and their curated annotations, and built an online web resource to facilitate the exploration of gene expression profiles for COPD and IPF, to support the generation of new hypotheses, and to allow for the identification of co-expression patterns in further analyses.

Results

PulmonDB is a relational database implemented in MySQL with lung disease transcriptome measurements, re-annotated platform probes, and manually curated data with a controlled vocabulary designed for lung diseases (Fig. 1). Tables were created to describe each feature and to connect the information across experiments, samples, measurements, platforms, genes, and annotated information. The full database schema is provided in Supplementary Fig. 1.
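As an illustration only, the kind of linkage described above can be sketched with Python's built-in sqlite3 module. Every table name, column name, identifier, and value below is a hypothetical placeholder, and SQLite stands in for MySQL; the actual schema is the one shown in Supplementary Fig. 1.

```python
import sqlite3

# Hypothetical, simplified schema. Table and column names are illustrative
# assumptions, not the actual PulmonDB schema (see Supplementary Fig. 1).
SCHEMA = """
CREATE TABLE experiment (gse TEXT PRIMARY KEY);
CREATE TABLE platform (gpl TEXT PRIMARY KEY);
CREATE TABLE sample (
    gsm TEXT PRIMARY KEY,
    gse TEXT NOT NULL REFERENCES experiment(gse),
    gpl TEXT NOT NULL REFERENCES platform(gpl)
);
CREATE TABLE annotation (
    gsm TEXT NOT NULL REFERENCES sample(gsm),
    term TEXT NOT NULL,   -- controlled-vocabulary term
    value TEXT NOT NULL
);
CREATE TABLE measurement (
    gsm TEXT NOT NULL REFERENCES sample(gsm),
    gene TEXT NOT NULL,
    log_ratio REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding two made-up samples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO experiment VALUES ('GSE_DEMO')")
    conn.execute("INSERT INTO platform VALUES ('GPL_DEMO')")
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [("GSM_A", "GSE_DEMO", "GPL_DEMO"), ("GSM_B", "GSE_DEMO", "GPL_DEMO")],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        [("GSM_A", "disease_status", "IPF"), ("GSM_B", "disease_status", "COPD")],
    )
    # The log ratios are invented numbers used only to exercise the query.
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?)",
        [("GSM_A", "MMP7", 1.5), ("GSM_B", "MMP7", 0.2)],
    )
    conn.commit()
    return conn


def values_for(conn, gene, term, value):
    """Return (gsm, log_ratio) rows for one gene, restricted by an annotation."""
    query = """
        SELECT m.gsm, m.log_ratio
        FROM measurement AS m
        JOIN annotation AS a ON a.gsm = m.gsm
        WHERE m.gene = ? AND a.term = ? AND a.value = ?
        ORDER BY m.gsm
    """
    return conn.execute(query, (gene, term, value)).fetchall()
```

Calling `values_for(build_demo_db(), "MMP7", "disease_status", "IPF")` joins measurements to curated annotations, which is the kind of lookup a relational layout makes straightforward.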

PulmonDB: a curated gene expression lung disease database

PulmonDB is a curated gene expression database of human lung diseases, with RNA-seq and microarray data from different platforms that have been uniformly preprocessed and manually curated to add sample and experiment information. In addition, we developed a website to access and visualize homogenized data (http://pulmondb.liigh.unam.mx/), and we also developed an R package (https://github.com/AnaBVA/pulmondb) to download curated annotation and preprocessed data that can be used for further analysis in the R environment.

Our database has a total of 76 GSEs, corresponding to 4481 unique preprocessed GSM contrasts that used 26 different platforms or GPLs (platform ID from GEO) (Fig. 2C). PulmonDB contains different sample types because we searched for human gene expression experiments related to COPD and IPF without any restriction. Lung biopsies account for 37.8% of samples, and 33.2% are blood samples. Other cell types can also be found in PulmonDB: some are primary cells (e.g., alveolar macrophages, fibroblasts, alveolar epithelial cells), and others are cell lines (e.g., A549) (Fig. 2A). Of the samples, 34.9% correspond to COPD, 40.5% to control samples (30.9% healthy plus 9.6% matched tissue), 17.2% to IPF, and 1.5% to other diseases (Fig. 2B and Supplementary Table 2). We separated control tissues into two groups: “healthy” individuals (as far as the authors are aware) and “match_tissue_controls”, which refers to tissue samples from a phenotypically healthy region of a patient who had a tumor removed (i.e., non-tumor tissue from a cancer patient).

Although other resources reuse and reanalyze GEO data using web interfaces⁹, those tools are not specialized for lung diseases. Their limitations include the need for manual curation before each analysis and the small number of COPD and IPF experiments they cover, because only curated GEO data are used. We designed a web interface that enables data exploration and visualization to facilitate lung disease analysis. This interface uses Clustergrammer¹³ to visualize gene expression values and to create interactive heatmaps that allow data exploration. A valuable feature of Clustergrammer is its connection to Enrichr¹⁴, which provides pathway enrichment analysis. Together, these features should help users to perform exploratory analyses, to visualize the expression of specific genes across public experiments and compare results, and to generate new hypotheses and insights about the pathologies of lung diseases based on different data sets.

PulmonDB can recapitulate gene expression patterns expected in COPD and IPF

To show that PulmonDB can be used to recapitulate previously reported knowledge regarding COPD and IPF biology, we performed a literature search and manually selected relevant genes for each disease. We selected 19 genes related to IPF (not necessarily associated with gene expression in lung tissues) to visualize their gene expression: CCL18¹⁵, CXCL12¹⁶, CXCL13¹⁷, collagens (COL1A1, COL1A2, COL3A1, COL5A2, COL14A1)¹⁸, DSP¹⁹, FAS²⁰, IL-8²¹, MMP1²², MMP2²³, MMP7²², MUC5B¹⁹, SPP1²⁴, PTGS2²⁵, TGFB1²⁶ and THY1²⁷. Then, we selected eight IPF experiments performed with lung tissue biopsy samples (GSE32537, GSE21369, GSE24206, GSE94060, GSE72073, GSE35145, GSE31934), and using the PulmonDB website, we created a heatmap with the gene expression patterns and observed that the hierarchical clustering of these data separates IPF and control data sets (Fig. 3A, green and gray clusters at the bottom). For COPD, we curated 16 genes from the literature that were deemed relevant to this disease: HHIP^28,29, CFTR^30,31, PPARG³², SERPINA1^33,34, JUN³⁵, FAM13A³⁶, MYH10³⁵, CHRNA5³⁷, JUND³⁵, JUNB³⁵, TNF³⁴, MMP9³⁴, MMP12³⁴, CHRNA3³⁷, TGFBR3³², and GATA2³². We selected five experiments (GSE27597, GSE37768, GSE57148, GSE8581, GSE1122) performed on lung tissue biopsy samples from COPD patients and controls. Our hierarchical clustering analysis of the expression profiles using the PulmonDB interface allowed us to cluster patients and controls into two different groups (Fig. 3B), similar to the case of IPF. In conclusion, PulmonDB not only helps to recapitulate previously published work (Supplementary Fig. 3) but also helps to verify the stability of gene expression across experiments. This may help to analyze concordance between experiments, contrast study results, show the implications of using different control groups, etc. We believe this resource can be used to guide decisions and support new hypotheses in experimental laboratories studying molecular or cellular disease mechanisms.
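To make the clustering step more concrete, the following is a minimal, standalone sketch of agglomerative (average-linkage) clustering on made-up contrast profiles. It is not the Clustergrammer implementation used by the website; the distance metric, the linkage rule, the sample labels, and the numbers are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict mapping a sample label to its list of log ratios.
    Cluster distance is the mean pairwise distance between members.
    """
    clusters = [[name] for name in profiles]

    def cluster_distance(c1, c2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Made-up log ratios for three genes in four hypothetical contrasts.
toy_profiles = {
    "ipf_1": [2.0, 1.8, 1.5],
    "ipf_2": [2.2, 1.6, 1.7],
    "ctrl_1": [0.1, -0.2, 0.0],
    "ctrl_2": [-0.1, 0.2, 0.1],
}
groups = average_linkage(toy_profiles, n_clusters=2)
```

With these toy values the two disease-like profiles end up in one group and the two control-like profiles in the other, which mirrors the kind of separation described for Fig. 3.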

Differences and similarities in COPD and IPF

PulmonDB can be used not only to replicate previous knowledge but also to provide a framework to test new hypotheses. In this context, we set out to investigate the differences and similarities between COPD and IPF in lung tissue when compared to samples from healthy individuals (Fig. 4A). Using PulmonDB in the R environment, we selected contrasts where the sample was annotated as lung biopsy and the reference status as HEALTHY/CONTROLs (GSE52463, GSE63073, GSE1122, GSE72073, GSE24206, GSE27597, GSE29133, GSE31934, GSE37768) (Fig. 4B), and then using limma³⁸ we assessed differential gene expression between COPD and IPF. We identified 1781 differentially expressed genes (Supplementary Fig. 4). To have a visual representation of the differences between COPD and IPF, we selected the top 20 differentially expressed genes and visualized their expression using the PulmonDB website tool (Fig. 4C). We observed that data sets tend to cluster by test status; Fig. 4C shows IPF contrasts on the left (turquoise), control contrasts in the middle (blue), and COPD contrasts on the right (red). Genes are clustered in two groups (left panel, y-axis); the first gene group (I) is overexpressed in IPF while it is barely expressed or underexpressed in COPD contrasts. By comparison, the second gene cluster (group II) is overexpressed in COPD contrasts and underexpressed in IPF. To correlate similarities among samples, the 20 top differentially expressed genes were used (Fig. 4C, right panel); samples from the same disease group showed higher correlations and tended to have a null or negative correlation with the HEALTHY/CONTROL and the opposite disease (Fig. 4C). For example, FOSB and CXCL2 behave in opposite ways in the two diseases: both genes are overexpressed in COPD and underexpressed in IPF. FOSB is part of the family of Fos genes that can dimerize with JUN family proteins to form the transcription factor complex AP-1, which is related to COPD³⁹. CXCL2 is a chemokine secreted in inflammation that induces chemotaxis in neutrophils^40,41; these cells are predominant in COPD, and they are key mediators in tissue damage⁴². While neutrophils are also important in IPF, we observed that CXCL2 is underexpressed in this disease.
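The sample-to-sample comparison described above can be sketched as a Pearson correlation between contrast profiles. This is a minimal illustration assuming Pearson correlation; the source does not state which correlation coefficient was used, and the labels and values below are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of log ratios."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(profiles):
    """Return {(label_a, label_b): r} for every ordered pair of samples."""
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a in profiles
        for b in profiles
    }


# Invented log ratios for four genes in three hypothetical contrasts.
toy_contrasts = {
    "copd_1": [1.0, 2.0, -1.0, -2.0],
    "copd_2": [0.8, 2.2, -1.1, -1.9],
    "ipf_1": [-1.0, -2.0, 1.0, 2.0],
}
corr = correlation_matrix(toy_contrasts)
```

In this toy example the two COPD-like profiles correlate strongly with each other, while the IPF-like profile is anticorrelated with both, the pattern reported for the right panel of Fig. 4C.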

We also asked the opposite question, i.e., whether we could identify genes that behave similarly in these two diseases. We assigned a weight to COPD and IPF expression to perform limma contrasts (Fig. 4D), which enabled us to identify when both diseases drove a differential expression profile. We selected the 20 top differentially expressed genes and visualized their expression patterns using the PulmonDB website tool, and we could see that a set of genes was consistently overexpressed or underexpressed in both COPD and IPF. In particular, VCAM1 and FCN3 are differentially expressed in COPD and IPF, with a similar trend in both diseases when compared with HEALTHY/CONTROLs. VCAM1 is the vascular cell adhesion molecule 1, and it is important in the immune response for mediating cellular adhesion in leukocytes⁴³; it is overexpressed in these two diseases, suggesting infiltration of immune cells in both pathologies^44,45. In contrast, FCN3 (or ficolin 3) is underexpressed in both diseases: this gene encodes a collagen-like protein associated with the innate immune defense, as it activates the lectin complement pathway⁴⁶, which has been shown to be important in pulmonary pathologies^47,48.
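The weighting idea can be illustrated with plain arithmetic on per-group coefficients. The sketch below uses the two contrasts given in the Methods, "COPD – IPF" and "(COPD + IPF)/2 – CONTROL", but it is not limma: it omits the linear model fit and the moderated statistics, and the coefficient values are invented.

```python
# Contrast weights taken from the two contrasts named in the Methods.
CONTRASTS = {
    "COPD - IPF": {"COPD": 1.0, "IPF": -1.0, "CONTROL": 0.0},
    "(COPD + IPF)/2 - CONTROL": {"COPD": 0.5, "IPF": 0.5, "CONTROL": -1.0},
}


def apply_contrast(coefficients, weights):
    """Weighted sum of per-group coefficients for one gene."""
    return sum(weights[group] * coefficients[group] for group in weights)


# Invented per-group coefficients (mean log ratios) for two hypothetical genes.
shared_gene = {"COPD": 1.2, "IPF": 1.0, "CONTROL": 0.0}
opposite_gene = {"COPD": 1.5, "IPF": -1.5, "CONTROL": 0.0}

shared_scores = {name: apply_contrast(shared_gene, w) for name, w in CONTRASTS.items()}
opposite_scores = {name: apply_contrast(opposite_gene, w) for name, w in CONTRASTS.items()}
```

A gene that moves in the same direction in both diseases scores high on the averaged contrast and near zero on the difference contrast, whereas a gene with opposite behavior does the reverse.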

As a result, PulmonDB assisted our analysis of genes that behave analogously or antagonistically in COPD and IPF, and it can thus be used to dissect common molecular mechanisms: both lung diseases present under heterogeneous conditions with progressive and irreversible phenotypes mainly caused by smoking and by aging, and both entail cellular matrix remodeling. Furthermore, the differential gene signatures between COPD and IPF might explain the particularities of each disease.

Discussion

The present methodology has previously been applied to the study of bacterial and grapevine gene expression in different experiments and conditions, allowing for the integration of data from diverse origins. Here we show that this methodology can also be applied to human data to better exploit publicly available resources. We hope these methods will be adopted by other teams to create databases that help to understand relevant diseases in other tissues.

PulmonDB can help the scientific community to study which genes have a distinct expression profile in COPD and IPF, explore experiments across technologies and platforms, identify interesting expression patterns across different diseases, generate new hypotheses, and find relationships among clinical or experimental variables. This database also enables comparisons of an updated collection of expression profiles already homogenized for their analyses of specific diseases. Additionally, having different lung diseases (COPD and IPF) in the same database creates the opportunity to observe their similarities and differences. In the future, we aim for PulmonDB to grow and include more diseases. To our knowledge, there is no other resource for transcriptomic analysis focused on the same lung diseases; for this reason, we believe researchers of different backgrounds can use and benefit from the information contained in PulmonDB, by using the web interface and the R package.

An integrated comparable collection of homogenized values with controlled vocabulary describing biological and technical characteristics will facilitate further comparative analyses, such as the study of profiles in COPD and IPF, exploration of experiments across technologies and platforms, identification of interesting coexpression patterns across different diseases, the generation of new hypotheses, and determination of relationships among clinical or experimental variables.

This project sets the foundation to integrate transcriptomics data of other respiratory diseases or related phenotypes and thus facilitates the identification of common and divergent pathways that lead to a pathological state. The PulmonDB platform will be expanded in the future to include other lung diseases.

Methods

Platform and metadata

Most of the metadata was obtained from GEO. For specific cases, the platform information (.cdf file) was obtained from the Affymetrix website (http://www.affymetrix.com/site/mainPage.affx). Additional information (e.g., clinical data, source of the biological sample) was obtained either from metadata or manually curated from the original papers.

Inclusion criteria for transcriptome data

The experiments currently included in PulmonDB are listed in Supplementary Table 2.

We used two main resources to download raw data and preprocessed counts, GEO and Recount2.

Gene Expression Omnibus

Using GEO^2,3, we searched datasets related to COPD and IPF for gene expression data. The following queries were used to retrieve the experiments:

(“pulmonary disease, chronic obstructive”[MeSH Terms] OR COPD[All Fields]) AND “Homo sapiens”[porgn] AND (“gse”[Filter] AND (“Expression profiling by array”[Filter] OR “Expression profiling by high throughput sequencing”[Filter])).

(Idiopathic pulmonary fibrosis[All Fields]) AND “Homo sapiens”[porgn] AND (“gse”[Filter] AND (“Expression profiling by array”[Filter] OR “Expression profiling by high throughput sequencing”[Filter])).
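For readers who prefer to run such queries programmatically, the sketch below builds request URLs for the NCBI E-utilities `esearch` endpoint against the GEO DataSets database (`db=gds`). This is an assumption about how one might reproduce the search, not the procedure reported here; the code only constructs URLs and does not contact NCBI. Straight quotes replace the typographic quotes, and the IPF term is parenthesized to mirror the COPD query.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Filters shared by both queries in the text.
FILTERS = (
    '"Homo sapiens"[porgn] AND ("gse"[Filter] AND '
    '("Expression profiling by array"[Filter] OR '
    '"Expression profiling by high throughput sequencing"[Filter]))'
)

COPD_TERM = (
    '("pulmonary disease, chronic obstructive"[MeSH Terms] '
    "OR COPD[All Fields]) AND " + FILTERS
)
IPF_TERM = "(Idiopathic pulmonary fibrosis[All Fields]) AND " + FILTERS


def esearch_url(term, retmax=500):
    """Build an esearch URL for the GEO DataSets database (no request is sent)."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


copd_url = esearch_url(COPD_TERM)
ipf_url = esearch_url(IPF_TERM)
```

The `retmax` default of 500 is an arbitrary choice for the sketch. Results returned today may differ from those retrieved when the database was built, since GEO keeps growing.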

GEO experiments were manually curated, abstracts and related articles were reviewed, and only datasets confirmed as having COPD and/or IPF samples were considered. For an experiment to be included in PulmonDB, we used the following criteria: the data set had to be original, samples had to be unique, raw data had to be public and available, platform information had to include probe sequences, and custom platforms had to have information to link the raw expression signal with the probe sequence. Otherwise, data sets were not taken into account for PulmonDB.

Recount2

Recount2 is an online resource with human RNA-seq experiments already preprocessed using Rail-RNA alignment and summarized by gene and exon counts⁴⁹. We used the keywords “IPF” and “COPD” separately in Recount2 to retrieve counts from RNA-seq.

Compendium creation

The compendium creation process was done as previously described for COLOMBOS and VESPUCCI^10,12. The platform was developed for bacteria and later employed for grapevine; in this paper, we applied the COLOMBOS approach to human data for the first time. After we selected the datasets using the experiment ID from GEO (GSE), we processed them with COMMAND>_⁵⁰.

COMMAND

COMMAND stands for COMpendia MANagement Desktop. It is a web application that provides a framework to perform the following steps: (1) downloading data from selected experiments, (2) parsing files and storing data in database form, (3) (re)mapping probes to genes, (4) curating and annotating samples with a controlled vocabulary, (5) selecting reference and test samples to determine contrasts, (6) homogenizing (and normalizing) the data, and (7) performing data quality control (Fig. 1). This software can be used for any transcriptomic data⁵⁰.

In more detail, each experiment with a GSE ID, also referred to as a data set, was normalized independently without performing background correction, as explained in¹¹. We defined a contrast for each sample with a GSM ID (sample ID from GEO) by using a unique control reference sample per data set. The sample contrast per gene was defined as the log ratio between the expression value in the test condition (e.g., IPF, COPD) and the expression value in the reference condition (e.g., healthy, untreated, smokers without COPD) (Fig. 1, step 5). This gives every comparison an interpretable biological meaning when combined with extensive manually curated annotation. The condition properties describing the contrasts were then structured in a condition-controlled vocabulary tree. Finally, all contrasts were homogenized, resulting in directly comparable log ratios across all experiments; this information later became part of the final compendium of expression data (Supplementary Fig. 2).
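The contrast definition above amounts to a per-gene log ratio. The following minimal sketch assumes base-2 logarithms and uses placeholder gene names and intensities; neither the base nor the optional pseudocount is specified in the text, and this is not the COMMAND>_ implementation.

```python
import math


def log_ratio_contrast(test, reference, pseudocount=0.0):
    """Per-gene log2(test / reference) for genes present in both samples.

    test, reference: dicts mapping a gene name to a positive expression value.
    pseudocount: optional constant added to both values (an assumption,
    useful only when zeros may occur).
    """
    return {
        gene: math.log2((test[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in test
        if gene in reference
    }


# Placeholder intensities for a test sample and its reference sample.
test_sample = {"GENE_A": 800.0, "GENE_B": 100.0, "GENE_C": 50.0}
reference_sample = {"GENE_A": 200.0, "GENE_B": 400.0}

contrast = log_ratio_contrast(test_sample, reference_sample)
```

A positive value means the gene is higher in the test condition than in the reference, a negative value means it is lower, and genes missing from either sample are skipped.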

PulmonDB uses a controlled vocabulary to describe sample metadata

A controlled vocabulary is required to create databases with homogeneous and standard information. For PulmonDB, we created a controlled vocabulary organized in a hierarchical structure that contains terms to annotate transcriptome experiments in lung diseases. We defined classes describing the main categories and terms that can be found in experiments, with some of them as mandatory features (e.g., sample type, sample status, and platform). Some non-IPF or non-COPD diseases were included in the controlled vocabulary because the original experiments used them.

Once the controlled vocabulary was established, each article related to the experiment was manually curated, and whenever it was necessary, new terms were added, making the vocabulary flexible and allowing for the inclusion of other diseases to our database in the future. Complete definitions of the terms are provided in Supplementary Table 1.

Experiment annotation

Each sample was manually annotated using the controlled vocabulary; when necessary, the vocabulary was updated to include new features. The information was curated by experts who reviewed the associated articles and protocols to retrieve data such as age, sex, ancestry, stage of disease or treatment, DLCO (the diffusing capacity of the lung for carbon monoxide, a common functional test), etc., from either GEO or the associated paper.

Homogenization and quality control

As described before, data homogenization was done with COMMAND>_^11,12. This step was performed on raw data without background correction, as background correction has been shown to introduce more errors^51,52,53. A nonlinear model was applied to homogenize raw data. We used RMA Quantile for Affymetrix samples and loess fit for the other platforms. The next step was to summarize probes per transcript using RMA median polish summary from Affymetrix or with data averaged across replicates for the other platforms. After performing the homogenization step, low-quality microarrays were identified using MA plots and histograms of raw and homogenized data.
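To give a sense of what quantile normalization does, here is a simplified, standalone sketch. It is not the RMA implementation used by COMMAND>_: it skips the log transform and probe summarization, breaks ties by position rather than averaging them, and operates on made-up intensities.

```python
def quantile_normalize(arrays):
    """Force every array to share the same empirical distribution.

    arrays: dict mapping an array name to a list of intensities; all lists
    must have the same length and the same probe order.
    """
    names = list(arrays)
    n_probes = len(arrays[names[0]])
    sorted_values = [sorted(arrays[name]) for name in names]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(column[k] for column in sorted_values) / len(names)
        for k in range(n_probes)
    ]
    normalized = {}
    for name in names:
        values = arrays[name]
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: values[i])
        result = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            result[probe_index] = reference[rank]
        normalized[name] = result
    return normalized


# Made-up raw intensities for three probes on two arrays.
raw = {"array_a": [5.0, 2.0, 3.0], "array_b": [4.0, 1.0, 6.0]}
normalized = quantile_normalize(raw)
```

After normalization both arrays contain the same set of values, so differences between arrays reflect only the rank of each probe within its array.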

Website implementation

PulmonDB has a web interface that uses Clustergrammer (https://clustergrammer.readthedocs.io/index.html)¹³ to visualize gene expression contrasts. Clustergrammer has a frontend in JavaScript and a backend in Python, supporting an interactive web application for gene expression exploration. The PulmonDB web interface requires one or several GSE identifiers and more than two gene names to generate interactive heatmaps.

In addition, Clustergrammer is connected with EnrichR (http://amp.pharm.mssm.edu/Enrichr/)¹⁴, an integrative web application tool for enrichment analysis that helps the user explore not only potentially differentiated genes but also enriched pathways, facilitating the discovery of transcriptomic signature patterns in lung diseases or related phenotypes.

COPD and IPF comparative analysis

We used limma 3.40.0 in an RStudio environment (R 3.6.0) for our comparative analyses, and the GSE ID was included in the linear model. Then, two contrasts were created: (1) “COPD – IPF”, for obtaining differentially expressed genes between COPD and IPF, and (2) “(COPD + IPF)/2 – CONTROL”, for genes that change in a similar way in COPD and IPF relative to CONTROL. Differential gene expression analyses were adjusted for multiple testing using the false discovery rate (FDR) method, also referred to as the Benjamini & Hochberg adjustment. We applied a cutoff of adjusted p-value < 0.05, and after sorting by log fold change, the top 20 genes were obtained.
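The multiple-testing adjustment and gene selection can be sketched outside of limma as follows. This is an illustrative re-implementation of the Benjamini & Hochberg procedure with placeholder gene names and p-values; ranking by the absolute value of the log fold change is an assumption, since the text only says that genes were sorted by log fold change.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def top_genes(results, alpha=0.05, n=20):
    """Keep genes with adjusted p < alpha, ranked by |log fold change|.

    results: list of (gene, log_fold_change, raw_p_value) tuples.
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    kept = [
        (gene, lfc, q)
        for (gene, lfc, _), q in zip(results, adjusted)
        if q < alpha
    ]
    kept.sort(key=lambda row: abs(row[1]), reverse=True)
    return kept[:n]


# Placeholder results: (gene, log fold change, raw p-value).
toy_results = [
    ("g1", 2.0, 0.001),
    ("g2", -3.0, 0.002),
    ("g3", 0.5, 0.8),
    ("g4", 1.0, 0.01),
]
selected = top_genes(toy_results, alpha=0.05, n=2)
```

Here `g3` is filtered out by the adjusted p-value cutoff, and the remaining genes are ranked by effect size before the top `n` are kept.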

Data availability

PulmonDB is accessible online (http://pulmondb.liigh.unam.mx/) and through an R package (https://github.com/AnaBVA/pulmondb).

References

1. Qian, X., Ba, Y., Zhuang, Q. & Zhong, G. RNA-Seq technology and its application in fish transcriptomics. OMICS 18, 98–110 (2014).
2. geo. Home - GEO - NCBI. Available at: https://www.ncbi.nlm.nih.gov/geo/. (Accessed: 21st July 2019)
3. Clough, E. & Barrett, T. The Gene Expression Omnibus Database. Methods Mol. Biol. 1418, 93–110 (2016).
4. EMBL-EBI. ArrayExpress < EMBL-EBI. Available at: https://www.ebi.ac.uk/arrayexpress/. (Accessed: 21st July 2019)
5. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 41, D991–5 (2013).
6. Koeppen, K., Stanton, B. A. & Hampton, T. H. ScanGEO: parallel mining of high-throughput gene expression data. Bioinformatics 33, 3500–3501 (2017).
7. Toro-Domínguez, D. et al. ImaGEO: integrative gene expression meta-analysis from GEO database. Bioinformatics 35, 880–882 (2019).
8. Torre, D., Lachmann, A. & Ma’ayan, A. BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud. Cell Syst 7, 556–561.e3 (2018).
9. Wang, Z., Lachmann, A. & Ma’ayan, A. Mining data and metadata from the gene expression omnibus. Biophys. Rev. 11, 103–110 (2019).
10. Moretto, M. et al. COLOMBOS v3.0: leveraging gene expression compendia for cross-species analyses. Nucleic Acids Res. 44, D620–3 (2016).
11. Engelen, K. et al. COLOMBOS: access port for cross-platform bacterial expression compendia. PLoS One 6, e20938 (2011).
12. Moretto, M. et al. VESPUCCI: Exploring Patterns of Gene Expression in Grapevine. Front. Plant Sci. 7, 633 (2016).
13. Fernandez, N. F. et al. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci Data 4, 170151 (2017).
14. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).
15. Cai, M. et al. CCL18 in serum, BAL fluid and alveolar macrophage culture supernatant in interstitial lung diseases. Respir. Med. 107, 1444–1452 (2013).
16. Antoniou, K. M. et al. Expression analysis of angiogenic growth factors and biological axis CXCL12/CXCR4 axis in idiopathic pulmonary fibrosis. Connect. Tissue Res. 51, 71–80 (2010).
17. Vuga, L. J. et al. C-X-C motif chemokine 13 (CXCL13) is a prognostic biomarker of idiopathic pulmonary fibrosis. Am. J. Respir. Crit. Care Med. 189, 966–974 (2014).
18. Jenkins, R. G. et al. Longitudinal change in collagen degradation biomarkers in idiopathic pulmonary fibrosis: an analysis from the prospective, multicentre PROFILE study. Lancet Respir Med 3, 462–472 (2015).
19. Allen, R. J. et al. Genetic variants associated with susceptibility to idiopathic pulmonary fibrosis in people of European ancestry: a genome-wide association study. Lancet Respir Med 5, 869–880 (2017).
20. Huang, S. K. et al. Histone modifications are responsible for decreased Fas expression and apoptosis resistance in fibrotic lung fibroblasts. Cell Death Dis. 4, e621 (2013).
21. Yang, L. et al. IL-8 mediates idiopathic pulmonary fibrosis mesenchymal progenitor cell fibrogenicity. Am. J. Physiol. Lung Cell. Mol. Physiol. 314, L127–L136 (2018).
22. Rosas, I. O. et al. MMP1 and MMP7 as potential peripheral blood biomarkers in idiopathic pulmonary fibrosis. PLoS Med. 5, e93 (2008).
23. García-Alvarez, J. et al. Membrane type-matrix metalloproteinases in idiopathic pulmonary fibrosis. Sarcoidosis Vasc. Diffuse Lung Dis. 23, 13–21 (2006).
24. Pardo, A. et al. Up-regulation and profibrotic role of osteopontin in human idiopathic pulmonary fibrosis. PLoS Med. 2, e251 (2005).
25. Parra, E. R., Lin, F., Martins, V., Rangel, M. P. & Capelozzi, V. L. Immunohistochemical and morphometric evaluation of COX 1 and COX-2 in the remodeled lung in idiopathic pulmonary fibrosis and systemic sclerosis. J. Bras. Pneumol. 39, 692–700 (2013).
26. Martinez, F. J. et al. Idiopathic pulmonary fibrosis. Nat Rev Dis Primers 3, 17074 (2017).
27. Sanders, Y. Y. et al. Thy-1 promoter hypermethylation: a novel epigenetic pathogenic mechanism in pulmonary fibrosis. Am. J. Respir. Cell Mol. Biol. 39, 610–618 (2008).
28. Zhou, X. et al. Identification of a chronic obstructive pulmonary disease genetic determinant that regulates HHIP. Hum. Mol. Genet. 21, 1325–1335 (2012).
29. Chang, W.-A., Tsai, M.-J., Jian, S.-F., Sheu, C.-C. & Kuo, P.-L. Systematic analysis of transcriptomic profiles of COPD airway epithelium using next-generation sequencing and bioinformatics. Int. J. Chron. Obstruct. Pulmon. Dis. 13, 2387–2398 (2018).
30. Rab, A. et al. Cigarette smoke and CFTR: implications in the pathogenesis of COPD. Am. J. Physiol. Lung Cell. Mol. Physiol. 305, L530–41 (2013).
31. Campbell, J. D. et al. A gene expression signature of emphysema-related lung destruction and its reversal by the tripeptide GHK. Genome Med. 4, 67 (2012).
32. Hedström, U. et al. Bronchial extracellular matrix from COPD patients induces altered gene expression in repopulated primary human bronchial epithelial cells. Sci. Rep. 8, 3502 (2018).
33. Lackey, L., McArthur, E. & Laederach, A. Increased Transcript Complexity in Genes Associated with Chronic Obstructive Pulmonary Disease. PLoS One 10, e0140885 (2015).
34. Kotnala, S., Tyagi, A. & Muyal, J. P. rHuKGF ameliorates protease/anti-protease imbalance in emphysematous mice. Pulm. Pharmacol. Ther. 45, 124–135 (2017).
35. Kim, W. J. et al. Comprehensive Analysis of Transcriptome Sequencing Data in the Lung Tissues of COPD Subjects. Int. J. Genomics Proteomics 2015, 206937 (2015).
36. Yun, J. H. et al. Transcriptomic Analysis of Lung Tissue from Cigarette Smoke-Induced Emphysema Murine Models and Human Chronic Obstructive Pulmonary Disease Show Shared and Distinct Pathways. Am. J. Respir. Cell Mol. Biol. 57, 47–58 (2017).
37. Matsson, H. et al. Targeted high-throughput sequencing of candidate genes for chronic obstructive pulmonary disease. BMC Pulm. Med. 16, 146 (2016).
38. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
39. Mroz, R. M., Holownia, A., Chyczewska, E. & Braszko, J. J. Chronic obstructive pulmonary disease: an update on nuclear signaling related to inflammation and anti-inflammatory treatment. J. Physiol. Pharmacol. 59(Suppl 6), 35–42 (2008).
40. Kim, D. & Haynes, C. L. Neutrophil chemotaxis within a competing gradient of chemoattractants. Anal. Chem. 84, 6070–6078 (2012).
41. Larsson, K. Aspects on pathophysiological mechanisms in COPD. J. Intern. Med. 262, 311–340 (2007).
42. Hoenderdos, K. & Condliffe, A. The neutrophil in chronic obstructive pulmonary disease. Am. J. Respir. Cell Mol. Biol. 48, 531–539 (2013).
43. Ley, K. & Huo, Y. VCAM-1 is critical in atherosclerosis. The Journal of clinical investigation 107, 1209–1210 (2001).
44. Nakao, A., Hasegawa, Y., Tsuchiya, Y. & Shimokata, K. Expression of cell adhesion molecules in the lungs of patients with idiopathic pulmonary fibrosis. Chest 108, 233–239 (1995).
45. Davis, B. B. et al. Leukocytes are recruited through the bronchial circulation to the lung in a spontaneously hypertensive rat model of COPD. PLoS One 7, e33304 (2012).
46. Garred, P., Honoré, C., Ma, Y. J., Munthe-Fog, L. & Hummelshøj, T. MBL2, FCN1, FCN2 and FCN3-The genes behind the initiation of the lectin pathway of complement. Mol. Immunol. 46, 2737–2744 (2009).
47. Pandya, P. H. & Wilkes, D. S. Complement system in lung disease. Am. J. Respir. Cell Mol. Biol. 51, 467–473 (2014).
48. Eisen, D. P. Mannose-binding lectin deficiency and respiratory tract infection. J. Innate Immun. 2, 114–122 (2010).
49. Collado-Torres, L. et al. Reproducible RNA-seq analysis using recount2. Nat. Biotechnol. 35, 319–321 (2017).
50. Moretto, M., Sonego, P., Villaseñor-Altamirano, A. B. & Engelen, K. First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_. BMC Bioinformatics 20, 54 (2019).
51. Irizarry, R. A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003).
52. Fujino, N. et al. Gene expression profiles of alveolar type II cells of chronic obstructive pulmonary disease: a case-control study. BMJ Open 2 (2012).
53. Golpon, H. A. et al. Emphysema lung tissue gene expression profiling. Am. J. Respir. Cell Mol. Biol. 31, 595–600 (2004).

Acknowledgements

We are thankful to the colleagues who helped us install and maintain the PulmonDB server, particularly Luis Alberto Aguilar-Bautista and members of Laboratorio Nacional de Visualización Científica Avanzada, México. We are grateful to Miguel Negreros for discussing the concepts for curation and Orlando Santillán for his insights into parsing GEO data. We thank Alejandra Castillo and Carina Uribe for technical assistance. We thank Mauricio Guzmán and Centro Cultural Cine y Arte, particularly Renata Campuzano and Diego Morales, for graphical and design assistance. Y.I.B.-M. acknowledges the Cátedras CONACyT program. We also acknowledge Catalina Frank, José Antonio Alonso and the Manuscript Writing Training Team (CEMAI for its Spanish acronym) of CONACyT for their help with the structure, reviews and constructive criticism of this research paper. A.M.-R.’s laboratory is supported by a CONACYT grant [269449], Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica – Universidad Nacional Autónoma de México (PAPIIT-UNAM) grant [IA206517-IA201119] and Estímulos a Investigaciones Médicas “Miguel Alemán Valdés”; J.C.-V., A.M.-R., Y.I.B.-M. and M.S. further acknowledge CONACYT “Fronteras de la Ciencia” support [Project 15]. A.B.V.-A. is a doctoral student from the Programa de Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México (UNAM) and has received CONACyT fellowship CVU 557690.

Author information

Authors and Affiliations

Laboratorio Internacional de Investigación sobre el Genoma Humano, UNAM, Juriquilla, Mexico
Ana B. Villaseñor-Altamirano, Jair. S. García-Sotelo, Oscar Aldana-Assad & Alejandra Medina-Rivera
Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, 38010, San Michele all’Adige, Italy
Marco Moretto & Kristof Engelen
Instituto Nacional de Enfermedades Respiratorias Ismael Cosío Villegas, Mexico City, Mexico
Mariel Maldonado, Adrián Munguía-Reyes, Moisés Selman & Yalbi I. Balderas-Martínez
Center for Genomic Sciences, UNAM, Cuernavaca, Mexico
Alejandra Zayas-Del Moral & Julio Collado-Vides
Facultad de Ciencias, UNAM, Mexico City, Mexico
Yair Romero
Laboratorio Nacional de Visualización Científica Avanzada, LAVIS, UNAM, Juriquilla, Mexico
Luis A. Aguilar
Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
Julio Collado-Vides
CONACYT-Instituto Nacional de Enfermedades Respiratorias Ismael Cosío Villegas, Mexico City, Mexico
Yalbi I. Balderas-Martínez

Contributions

A.B.V.-A. prepared and created all figures. A.B.V.-A., Y.I.B.-M. and A.M.-R. wrote the main manuscript text. Mariel M., A.M.-R., Y.R. and Y.I.B.-M. manually curated the data. A.B.V.-A., Marco M., K.E. and A.Z.-D.M. downloaded, processed and analysed data, and created the database. A.B.V.-A. and O.A.-A. created the R package. A.B.V.-A., J.S.G.-S., O.A.-A. and L.A.A. built, modified and created the web interface. M.S., J.C.-V., Y.I.B.-M. and A.M.-R. jointly supervised this work. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Julio Collado-Vides, Yalbi I. Balderas-Martínez or Alejandra Medina-Rivera.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Villaseñor-Altamirano, A.B., Moretto, M., Maldonado, M. et al. PulmonDB: a curated lung disease gene expression database. Sci Rep 10, 514 (2020). https://doi.org/10.1038/s41598-019-56339-5

Received: 12 September 2019
Accepted: 05 December 2019
Published: 16 January 2020
DOI: https://doi.org/10.1038/s41598-019-56339-5
