ComPath: an ecosystem for exploring, analyzing, and curating mappings across pathway databases

Domingo-Fernández, Daniel; Hoyt, Charles Tapley; Bobis-Álvarez, Carlos; Marín-Llaó, Josep; Hofmann-Apitius, Martin

doi:10.1038/s41540-018-0078-8


Technology Feature
Open access
Published: 13 December 2018


npj Systems Biology and Applications volume 4, Article number: 43 (2018)


A Publisher Correction to this article was published on 07 March 2019

This article has been updated

Abstract

Although pathways are widely used for the analysis and representation of biological systems, their lack of clear boundaries, their dispersion across numerous databases, and the lack of interoperability impede the evaluation of the coverage, agreements, and discrepancies between them. Here, we present ComPath, an ecosystem that supports curation of pathway mappings between databases and fosters the exploration of pathway knowledge through several novel visualizations. We have curated mappings between three of the major pathway databases and present a case study focusing on Parkinson’s disease that illustrates how ComPath can generate new biological insights by identifying pathway modules, clusters, and cross-talks with these mappings. The ComPath source code and resources are available at https://github.com/ComPath and the web application can be accessed at https://compath.scai.fraunhofer.de/.

Introduction

The notion of pathways enables the representation, formalization, and interpretation of biological events or series of interactions. Cataloging biological knowledge into pathways reduces complexity from all possible interacting molecular entities to a set of well-studied and validated functional relationships between molecular entities culminating in biological processes. Several efforts have generated databases of pathways with varying specificity and granularity that comprise signaling cascades, metabolic routes, and regulatory networks from precise signatures with no more than a couple of acting players to general pathways involving thousands of molecular players.^1,2,3,4

Simplifying biology into pathways and representation as network models or mathematical models inevitably results in a loss of information such as spatiotemporal information or even entire biological entity types. The network abstraction facilitates pathway visualization and interpretation thanks to the harmony between biological networks and systems: nodes correspond to molecular entities and edges to types of interactions occurring between them (e.g., inhibition, phosphorylation, etc.). Although networks can comprise a broad range of molecular types (e.g., proteins, chemicals, small molecules, etc.), they are generally reduced to the most direct outcome of our genetic makeup - the genetic and protein levels - so that we can mechanistically understand their functionality. Thus, they are frequently viewed and simplified to “gene sets”, the collection of all genes/proteins that constitute the pathway, due to the major challenges of incorporating network topology and translating the variety of relationships into pathway analysis methods.

While dedicated research groups and commercial entities with experienced curators have led a majority of the efforts to compile, delineate, and store biological knowledge into pathway databases,^2,5 community and crowdsourced efforts have recently gained traction.^3,6 Further, the variability in curation team composition, database scope (e.g., signaling pathways, gene regulatory networks, and metabolic processes), and curation guidelines led to the adoption of different (and in many ways incompatible) schemata and formalisms such as Biological Pathway Exchange (BioPAX;⁷) and Systems Biology Markup Language (SBML;⁸). These incompatibilities motivated the integration and harmonization of resources into pathway meta-databases such as Pathway Commons⁹ and PathCards,¹⁰ which focus on integrating databases; iPath,¹¹ which focuses on pathway visualization; and SIGNOR, which focuses on signaling pathways.¹²

Even after integrating multiple pathway databases into a pathway meta-database, it is difficult to assess the agreements, discrepancies, redundancy, and the complementarity of their contents because of the lack of availability of pathway mappings (e.g., pathway A from resource X is equivalent to pathway B from resource Y) in the original databases. These mappings are difficult to establish because of the arbitrary and overlapping nature of pathway boundaries as well as the absence of a common pathway nomenclature. Several controlled vocabularies have been generated as initial attempts to standardize pathway nomenclature,^13,14 but most pathway databases had already been established by the time these ontologies were published. Therefore, consolidating pathway knowledge is a persisting issue and it is still required to map pathways from different resources together to improve database interoperability.

Hierarchical clustering approaches have been presented as a way of grouping similar pathways based on their corresponding gene sets in order to propose pathway mappings.^10,15 Though these approaches can systematically cluster pathways from multiple resources, there are some limitations to consider: first, the usual tradeoff between over- and under-clustering,¹⁶ and second, pathway nomenclature and biological context are not considered by the clustering algorithm, which often leaves out equivalent pathways with low similarity and ignores the context of the pathway (e.g., cell/disease specificity). Nevertheless, these limitations can be overcome by following clustering and prioritization methods with the manual curation required to interpret the abstract concepts that are inherent to pathway definitions (e.g., biological process, cellular location, condition, etc.).

Though numerous algorithms¹⁷ and tools^4,18 have been successfully applied to interpret experimental data through the context of pathway databases,^19,20 there has not yet been a systematic comparison between the contents of various pathway databases, an assessment of their overlaps and gaps, or an establishment of mappings. Previous studies have only focused on comparing a single or small set of well-established pathways across multiple resources.^21,22 For example, a comparison focused on metabolic pathways revealed how a set of five databases only agreed in a minimum core of the biochemistry knowledge.²³

These studies demonstrate the need to connect insights provided by each pathway database to foster a greater understanding of the underlying biology. Here, we present ComPath, a web application that integrates content from publicly accessible pathway databases, generates comparisons, enables exploration, and facilitates curation of inter-database mappings.

Results

We developed an interactive web application that enables users to explore, analyze, and curate pathway knowledge. Below, we present three case studies illustrating how it can be used for each of these purposes. The figures for each were generated by interactive, dynamic views in the ComPath web application based on three major public pathway databases: KEGG, Reactome, and WikiPathways (Fig. 1).

Case study I: comparison of pathway databases

Assessment of gene coverage

Analysis of the overlaps between Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and WikiPathways revealed that there are ~3800 common human genes shared between the three databases (Fig. 2a). While at least one common human gene was present in almost every pathway across each database, the number of pathways with more common human genes diminishes much more quickly in WikiPathways and Reactome (Supplementary Figure S1). This may be due to database properties such as pathway size (e.g., on average, pathways contain 90 genes in KEGG, 50 in Reactome, and 42 in WikiPathways) or gene promiscuity (i.e., genes functionally linked to many pathways) that might influence the results of analyses using pathway resources (Supplementary Table 2). For further investigation, the ComPath web application generates summary tables and several visualizations of the distributions of pathway size and gene membership for each database. These views present an overview of the database properties and help identify effects such as gene promiscuity or differences in the distribution of gene set sizes (Fig. 2b).

Exploration of pathways

While the previous views produced gene-centric summaries of the contents of pathway databases, ComPath also enables the exploration of the pathway similarity landscape using Clustergrammer.js.²⁴ Figure 2c illustrates how this view can identify clusters of pathways based on their similarity and then elucidate the hierarchical relationships between the Metabolic pathway, the largest KEGG pathway, and other more granular KEGG metabolic pathways (e.g., alpha-Linolenic acid metabolism, Lipoic acid metabolism, and ether lipid metabolism).

Case study II: identification of pathway modules, overlaps, and interplays using pathway enrichment

ComPath couples classic pathway enrichment analysis^18,25,26,27 with pathway-centric visualizations to identify modules, investigate overlaps, and cluster pathways. This case study demonstrates their use to investigate the roles of the pathways related to established genetic associations in the context of Parkinson's disease (PD).

Pathway enrichment with Fisher's exact test using a gene panel associated with PD reviewed by Brás et al.²⁸ (the gene set will be referenced as PDgset) yielded over 300 pathways containing at least one of the panel's genes (Fig. 3a). We discarded pathways with fewer than two genes from PDgset, that were larger than 300 genes, or that were not found to be statistically significant (false discovery rate >5%) after applying multiple hypothesis testing correction with the Benjamini–Yekutieli method under dependency.²⁹
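The enrichment and filtering steps described above can be sketched in plain Python. This is a minimal illustration rather than ComPath's implementation: the function names, the one-sided form of Fisher's exact test, the size of the gene universe, and the choice to correct all tested pathways before filtering are assumptions made for the example.

```python
from math import comb


def fisher_enrichment_p(k, K, n, N):
    """One-sided Fisher's exact test (upper hypergeometric tail).

    k: query genes found in the pathway, K: pathway size,
    n: query size, N: size of the gene universe (assumed known).
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p-values (q-values)."""
    m = len(p_values)
    c_m = sum(1 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        q_values[idx] = running_min
    return q_values


def enrich(query, pathways, universe_size, min_hits=2, max_size=300, fdr=0.05):
    """Keep pathways passing the hit, size, and FDR filters from the text."""
    names = list(pathways)
    p_values = [
        fisher_enrichment_p(
            len(query & pathways[name]), len(pathways[name]), len(query), universe_size
        )
        for name in names
    ]
    q_values = benjamini_yekutieli(p_values)
    return {
        name: q
        for name, q in zip(names, q_values)
        if len(query & pathways[name]) >= min_hits
        and len(pathways[name]) <= max_size
        and q <= fdr
    }


# Toy data; gene and pathway names are placeholders, not PDgset.
toy_query = {"A", "B"}
toy_pathways = {"P1": {"A", "B", "C"}, "P2": {"A", "D", "E"}}
toy_result = enrich(toy_query, toy_pathways, universe_size=1000)
```

In this toy run only `P1` survives, because `P2` contains a single query gene and is dropped by the minimum-hit filter regardless of its q-value.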

Three views were used to assist in the interpretation of the remaining 29 enriched pathways: a pathway network view was used to identify pathway modules, a pathway overlap view was used to explore the intersections and cross-talks between pathways, and a pathway dendrogram view was used for clustering.

The pathway network view renders a pathway-to-pathway network in which nodes represent pathways and weighted edges represent their corresponding gene set similarities in a similar fashion to PathwayConnector.³⁰ For the PDgset, this visualization helped us to define six different modules (i.e., groups of pathways) by removing edges with a weight lower than 0.2 (Fig. 3b). The largest module (labeled as M₁) contained pathways related to the processes of endocytosis and vesicle transport, both of which are putatively disrupted in PD.³¹ M₂ comprised pathways related to PTK6 signaling such as the Reactome pathway, PTK6 promotes HIF1A stabilization, whose high pathway enrichment significance (q-value = 0.0005), as well as its role in regulating another PDgset gene, ATP13A2,³² suggests that it may be linked to PD. ATP13A2 is directly responsible for Kufor-Rakeb syndrome,³³ a rare juvenile form of PD, and participates in two other PD mechanisms: lysosomal iron storage and mitochondrial stress. Because pathways related to these two mechanisms (i.e., Lysosome pathway from KEGG, Pink/Parkin mediated mitophagy from Reactome, and Mitophagy pathway from both KEGG and Reactome; M₄) were also enriched by pathway enrichment analysis, we investigated the role of ATP13A2 in PD further.
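The thresholding step behind the modules can be illustrated with a short sketch. It assumes that edge weights are the Szymkiewicz-Simpson coefficient described in the Methods and that a module corresponds to a connected component of the pruned network; both are assumptions for illustration, as are the function names and the toy data.

```python
from itertools import combinations


def overlap_coefficient(x, y):
    """Szymkiewicz-Simpson coefficient between two gene sets."""
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


def pathway_modules(pathways, threshold=0.2):
    """Group pathways into connected components after pruning weak edges."""
    adjacency = {name: set() for name in pathways}
    for a, b in combinations(pathways, 2):
        # Edges with a weight lower than the threshold are removed.
        if overlap_coefficient(pathways[a], pathways[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    modules, seen = [], set()
    for start in pathways:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(component)
    return modules


# Toy gene sets; identifiers are placeholders.
toy = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {9, 10},
    "d": set(range(10, 20)),
    "e": {100},
}
toy_modules = pathway_modules(toy)
```

Here `a` and `b` form one module, `c` and `d` another, and `e` remains unassigned as a singleton, which mirrors the pathways without module membership mentioned later in this case study.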

ATP13A2 is activated by phosphatidylinositol(3,5)bisphosphate, a particular phosphatidylinositol involved in M₃ pathways (phosphatidylinositol metabolism and signaling pathways). Because this activation leads to a reduction in mitochondrial stress and α-synuclein toxicity, two hallmarks of PD, ATP13A2 has been proposed as a therapeutic target.³⁴ Ultimately, the exploration of the similarities and cross-talks between these three modules suggests further investigation of the candidate PD gene ATP13A2. Thus, this view complements pathway enrichment in the identification of pathway modules, exploration of pathway cross-talks, and prioritization of genes for further study.

While the pathway network viewer provides an overview of the different modules and their cross-talks, it does not reveal information about their contained pathways' boundaries and intersections. Therefore, we implemented the pathway overlap view, an interactive Euler diagram that allows exploration of pathway demarcations (Fig. 3c). We employed this view to identify the set of genes common to all pathways in M₅, a module comprising the two Alzheimer's disease (AD) and two PD pathways from KEGG and WikiPathways. Subsequently, we used the ComPath pathway enrichment wizard to investigate in which pathways the five common genes identified (APAF1, CASP3, CASP9, CYCS, and SNCA) participate. The analysis revealed that they are predominantly involved in apoptosis, an important process in both AD and PD pathophysiology.^35,36

The third visualization renders the results of the hierarchical clustering approach described in Chen et al. in the form of a dendrogram, enabling deterministic pathway grouping based on gene set similarity. We used this view in the PDgset example to assign the pathways without module membership to the closest module (Supplementary Figure S2). The dendrogram proposed merging three previously unassigned pathways into M₂ (i.e., Allograft Rejection, MAPK Signaling pathway, and Rasp1 signaling pathway). Additionally, the resulting dendrogram from clustering revealed hierarchical relationships between pathways (e.g., Pink/Parkin Mediated Mitophagy is a subset of the Reactome Mitophagy pathway), information that can be used to establish pathway mappings, as we show in the following case study.

Case study III: establishing mappings between pathway databases

ComPath, as well as other tools, has demonstrated the benefits of integrating pathway knowledge from diverse resources to improve biological functional analysis.^9,10,18 However, even after overcoming the technical hurdle of harmonizing different formats used by different databases, these integrative approaches must be complemented by mappings at a pathway level in order to have cross references between databases, thus improving their interoperability. Such information could then be used to first link related pathways and then investigate their interplays, explore the consistency of their boundaries, calculate their discrepancies and agreements, or simply contextualize the knowledge around a certain biological process.

In order to address this, ComPath introduces a curation environment in which users from the scientific community can propose and maintain a collection of established mappings between pathways from various databases. This laborious task is facilitated by the interactive visualizations (i.e., a dendrogram view and a similarity landscape heatmap) presented in the previous case studies as well as dedicated pathway pages where the content, descriptions, references, and the established mappings can be examined (Fig. 4a). Furthermore, ComPath suggests the most similar pathways based on this information so users can propose new mappings. These new mappings are included in the mapping catalog, which serves as a search interface as well as a distribution platform for mappings (Fig. 4b). In addition, the mapping catalog promotes community engagement by incorporating a voting system where authenticated users can agree or disagree on mappings; proposed mappings with a net sum of votes >3 are automatically registered as accepted.

After an exhaustive investigation of all possible mappings between pathways in KEGG, Reactome, and WikiPathways (see Methods), we identified 58 equivalencies between KEGG and Reactome, 64 between Reactome and WikiPathways, and 55 between KEGG and WikiPathways. Of these equivalent pathways, 21 are shared between the three resources (Fig. 5 and Supplementary Table 4). We also identified 247 hierarchical relationships between KEGG and Reactome, 597 between KEGG and WikiPathways, and 564 between Reactome and WikiPathways. After considering these, approximately 26% of KEGG, 70% of Reactome, and 35% of WikiPathways did not share any mappings with any other database (Supplementary Figure S4). The high uniqueness observed in Reactome could be attributed to several factors: its small pathway sizes, its high granularity, and its high coverage of HGNC (Fig. 2a).

The results of this curation effort are distributed at https://github.com/ComPath/resources and https://compath.scai.fraunhofer.de/ so they can be revised, updated, and exploited by the research community. We hope that this work serves as a first endeavor towards unifying pathway knowledge.

Discussion

The lack of a lingua franca in systems biology hampers the harmonization that would enable the exploration of the coverage, agreements, or discrepancies in the pathway knowledge. Harmonizing this information is an important step to better comprehend and model biology as well as improve the bioinformatics pipelines that utilize this knowledge to elucidate biological insights. As a first step towards closing this gap, we have implemented an environment capable of accommodating the pathway knowledge from multiple databases in order to facilitate its exploration and analysis through a web application. The flexibility of ComPath enables the incorporation of additional databases as well as dynamic update of its resources; the latter is often neglected, but can have a significant effect on derived analyses.³⁷ Additionally, an embedded curation interface allows users to curate and establish mappings between pathways. Accordingly, we used ComPath to conduct extensive curation work to link the pathways from three major pathway databases in order to evaluate their similarities and differences. This mapping catalog serves as a first effort towards unifying and linking pathway information across databases that can later be adopted by the original databases or used to create ontologies that store these mappings. Because databases regularly add new pathways and update gene identifiers, we plan to update ComPath biannually as well as curate mappings for these newly added pathways. Current mappings do not have to be updated since the focus of the pathway does not change.

The common genes between KEGG, Reactome, and WikiPathways covered the majority of pathways, indicating that their pathway knowledge is partially biased towards this shared gene set, even while there are still thousands of genes that have not yet been functionally annotated to pathways. Furthermore, our curation effort revealed that a surprisingly low number of pathways (21) were equivalent between KEGG, Reactome, and WikiPathways. On the other hand, the number of mapped pathways increased significantly when the hierarchical mappings were considered, revealing the inconsistent granularity employed to delineate pathway boundaries.

Although the absence of topological pathway information in ComPath is an irrefutable limitation in this study, gene-centric approaches enable a reduction of complexity in pathway comparison as well as integration of resources which do not provide topology information.¹⁰ Furthermore, recent studies revealed significant differences across a large sample of topology-based pathway analysis methods,³⁸ and highlighted that gene sets alone might be sufficient to detect an enriched pathway under realistic circumstances.³⁹ Hence, even if the abstraction of pathways as gene sets might not exploit all the existing pathway information, it is sufficient to drive an investigation of the pathway knowledge.

The established inter-database mappings allowed us to link pathways from three major databases, opening the door towards a better integration of the pathway knowledge. In the future, these links can be used to complement and fill pathway knowledge as well as to conduct a precise evaluation of equivalent or related pathways by exploiting the available format converters such as the converter from Reactome to WikiPathways.⁴⁰ Furthermore, ComPath has been designed to accommodate multiple types of molecular entities participating in pathways (i.e., Reactome chemical information), thus enabling replication of the presented analyses with lipid or metabolite databases such as LIPEA⁴¹ or HMDB.⁴²

In summary, we demonstrated that ComPath serves as an exploratory, analytic, and curation framework for pathway databases. Furthermore, we showed how the ComPath web application can complement enrichment approaches to elucidate and prioritize pathways and genes related to interesting biological phenomena. Finally, we hope that the implementation of a curation ecosystem and the first mapping efforts conducted in this work pave the way towards unifying the pathway knowledge.

Methods

ComPath framework

At its core, ComPath is a framework for integrating pathway and gene set databases. We defined a set of guidelines for implementing wrappers around the processes of downloading data, transforming it into a common data model, and making queries. These guidelines are encoded in an abstract class with the Python programming language such that new plugins can be quickly implemented for new resources. Each implementation must have a mapping between genes and pathways as well as functions for exporting pathways as gene sets, performing pathway enrichment analysis, and performing reasoning/inference over pathway hierarchies.
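To make the idea of an abstract plugin class concrete, the following sketch shows one way such a contract could look. The class and method names are hypothetical and do not mirror ComPath's actual API; enrichment and hierarchy reasoning are omitted for brevity, and the toy data is invented.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class PathwayPlugin(ABC):
    """Hypothetical base class that every database wrapper would extend."""

    @abstractmethod
    def populate(self) -> None:
        """Download the source data and load it into the common data model."""

    @abstractmethod
    def export_gene_sets(self) -> Dict[str, Set[str]]:
        """Return a mapping from pathway name to its set of gene symbols."""

    def query_gene(self, gene: str) -> List[str]:
        """Return the names of all pathways that contain the given gene."""
        return sorted(
            name for name, genes in self.export_gene_sets().items() if gene in genes
        )


class ToyPlugin(PathwayPlugin):
    """Stand-in for a real resource; the data below is made up."""

    def __init__(self) -> None:
        self._gene_sets: Dict[str, Set[str]] = {}

    def populate(self) -> None:
        self._gene_sets = {
            "toy pathway 1": {"GENE1", "GENE2"},
            "toy pathway 2": {"GENE2", "GENE3"},
        }

    def export_gene_sets(self) -> Dict[str, Set[str]]:
        return self._gene_sets


plugin = ToyPlugin()
plugin.populate()
shared = plugin.query_gene("GENE2")
```

Because the shared query logic lives in the base class, a new resource only has to implement the download and export steps, which is the property the guidelines above aim for.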

ComPath plugins

We implemented plugins for four major public pathway databases: KEGG, Reactome, WikiPathways, and MSigDB.^1,2,3,4 They can be used individually as a way of extracting, updating, and exploring the pathways contained within the database. Additionally, they can be used jointly in the ComPath web application where the pathways from multiple databases are integrated for their exploration, analysis, and curation.

ComPath web application

The web application was implemented in the Python programming language using the Flask microframework and a suite of its extensions. The compatibility between Flask and the data models defined in all pathway plugins allows the integration and harmonization of the pathway knowledge in an extensible manner. To illustrate the flexibility of ComPath, we have included plugins for the Alzheimer’s disease and Parkinson’s disease gene sets associated with disease-specific mechanisms from NeuroMMSig⁴³ in the public version of the ComPath web application (https://compath.scai.fraunhofer.de/).

ComPath leverages a variety of state-of-the-art libraries for visualization and exploration of pathway knowledge. We chose Bootstrap for the design of the website since its responsive design retains full compatibility across all devices. Interactive visualizations are generated using several JavaScript libraries, including D3.js, Clustergrammer.js,²⁴ and Cytoscape.js.⁴⁴

We implemented a RESTful API documented with an OpenAPI specification that can be accessed through the ComPath instance released at https://compath.scai.fraunhofer.de/apidocs. The API enables users to programmatically extract mapping information and perform queries using different gene or pathway identifiers.

Code availability

The source code for ComPath and its plugins can be found on GitHub (https://github.com/ComPath and https://github.com/Bio2BEL) under the MIT license. Both the plugins and the web application can be installed with PyPI (https://pypi.org), the main packaging system for Python. Furthermore, we have included a Dockerfile to enable reproducing the ComPath environment with Docker (https://www.docker.com/). Finally, documentation is included in each GitHub repository and it is also accessible at Read the Docs (https://readthedocs.org).

Estimating pathway similarity

While a variety of indices (e.g., Jaccard, Sørensen–Dice, Tversky) have been used to assess the similarity between sets, the Szymkiewicz-Simpson coefficient (Eq. 1) is most appropriate for comparing sets widely varying in size. Similarly to previous studies, we have chosen this index to not only calculate pathway similarity but also reveal contained pathways (i.e., when most of the nodes from a small pathway are in a larger pathway) to indicate potential hierarchical relationships.^10,45,46,47

$$S_{(X,Y)} = \frac{\left| X \cap Y \right|}{\min\left( \left| X \right|, \left| Y \right| \right)}$$

Equation 1. The Szymkiewicz-Simpson coefficient calculates the similarity between two sets (X and Y) where 0 ≤ S ≤ 1. The similarity is the size of the intersection of the two sets divided by the size of the smaller.
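A short worked example shows why this coefficient flags contained pathways where the Jaccard index does not. The function names and the toy gene sets are illustrative assumptions; returning 0 for an empty set is a convention chosen here to avoid division by zero.

```python
def szymkiewicz_simpson(x, y):
    """Size of the intersection divided by the size of the smaller set."""
    if not x or not y:
        return 0.0  # convention for empty sets, not defined by Eq. 1
    return len(x & y) / min(len(x), len(y))


def jaccard(x, y):
    """Size of the intersection divided by the size of the union."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


# A small pathway fully contained in a larger one (placeholder genes).
large = {"A", "B", "C", "D"}
small = {"C", "D"}

contained_score = szymkiewicz_simpson(large, small)
jaccard_score = jaccard(large, small)
```

For these sets the Szymkiewicz-Simpson coefficient is 2/2 = 1, signalling full containment, while the Jaccard index is 2/4 = 0.5 because it penalizes the size difference.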

Curation of pathway mappings

Here, we describe the semi-automatic curation procedure we used to systematically generate equivalency and hierarchical mappings between the human pathways originating from KEGG, Reactome, and WikiPathways. Note that we only generated mappings for the pathways originating from each of the three resources, not for pathways they imported from other databases (e.g., WikiPathways imported Reactome pathways that are evidently equivalent to the ones in Reactome). First, we define two types of mappings:

1. equivalentTo. An undirected relationship denoting that both pathways refer to the same biological process. The requirements for this relationship are:
- Scope: both pathways represent the same biological pathway information.
- Similarity: both pathways must share at least one gene.
- Context: both pathways should take place in the same context (e.g., cell line, physiology).

2. isPartOf. A directed relationship denoting the hierarchical relationship between pathway 1 (child) and pathway 2 (parent). The requirements are:
- Subset scope: the subject (pathway 1) is a subset of pathway 2 (e.g., Reactome pathway hierarchy).
- Similarity: same as above.
- Context: same as above.
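The difference between the undirected and directed relationship can be captured in a small data model. The sketch below is hypothetical: the class names, the `normalized` helper, and the placeholder pathway names are assumptions for illustration and do not reflect how ComPath stores mappings.

```python
from dataclasses import dataclass
from enum import Enum


class MappingType(Enum):
    EQUIVALENT_TO = "equivalentTo"  # undirected
    IS_PART_OF = "isPartOf"         # directed: subject is the child


@dataclass(frozen=True)
class PathwayRef:
    database: str
    name: str


@dataclass(frozen=True)
class Mapping:
    subject: PathwayRef
    relation: MappingType
    target: PathwayRef

    def normalized(self) -> "Mapping":
        """Canonical form so that undirected mappings compare equal."""
        if self.relation is MappingType.EQUIVALENT_TO:
            first, second = sorted(
                [self.subject, self.target], key=lambda p: (p.database, p.name)
            )
            return Mapping(first, self.relation, second)
        return self  # direction matters for isPartOf


# Placeholder pathways, not curated mappings.
a = PathwayRef("kegg", "Pathway A")
b = PathwayRef("reactome", "Pathway B")

same = (
    Mapping(a, MappingType.EQUIVALENT_TO, b).normalized()
    == Mapping(b, MappingType.EQUIVALENT_TO, a).normalized()
)
different = (
    Mapping(a, MappingType.IS_PART_OF, b).normalized()
    != Mapping(b, MappingType.IS_PART_OF, a).normalized()
)
```

Normalizing before storage means an equivalence proposed in either direction is treated as the same record, while swapping the ends of an isPartOf mapping yields a distinct (and contradictory) claim.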

We generated all possible mappings between pathways in each database (KEGG-WikiPathways, KEGG-Reactome, and WikiPathways-Reactome) and prioritized them based on the following two independent metrics that have been proposed to calculate pathway similarity:¹⁰

1. Lexical similarity between each pair of pathways' names was calculated using the Levenshtein distance.⁴⁸
2. Content similarity between each pair of pathways' genes was calculated using the previously described Szymkiewicz-Simpson coefficient.
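The lexical metric can be sketched with a standard dynamic-programming implementation of the Levenshtein distance. Lowercasing the names and rescaling the distance to a similarity between 0 and 1 are assumptions made for this example; the text does not state how the distance was normalized.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1 means identical names."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def rank_candidates(name, candidates):
    """Order candidate pathway names from most to least similar."""
    return sorted(candidates, key=lambda c: name_similarity(name, c), reverse=True)


ranked = rank_candidates("Apoptosis", ["Mitophagy", "apoptosis", "Apoptosis pathway"])
```

A ranking like this only prioritizes candidates for review; as described in the next paragraph, curators still judge scope and context before a mapping is accepted.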

After prioritization, our three curators from different areas of expertise (neuroscience, medicine, and biology) independently evaluated both similarities and the scope and context included in the pathway descriptions to assign the mapping types and to remove false positives. Furthermore, we investigated possible intra-database mappings within KEGG and WikiPathways since these resources do not yet contain hierarchical relationships. Finally, our curators combined the results and re-evaluated them to generate a consensus mapping file. It is available at https://github.com/ComPath/resources under the MIT License.

Change history

07 March 2019
The original version of this Article had an incorrect Article number of 3, an incorrect Volume of 5 and an incorrect Publication year of 2019. These errors have now been corrected in the PDF and HTML versions of the Article.

References

Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45(D1), D353–D361 (2016).
Article Google Scholar
Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46(D1), D649–D655 (2017).
Article Google Scholar
Slenter, D. N. et al. WikiPathways: A multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 46(D1), D661–D667 (2017).
Article Google Scholar
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Article CAS Google Scholar
Krämer, A., Green, J., Pollard, J. Jr & Tugendreich, S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics 30, 523–530 (2013).
Article Google Scholar
Kutmon, M. et al. WikiPathways: Capturing the full diversity of pathway knowledge. Nucleic Acids Res. 44(D1), D488–D494 (2015).
Article Google Scholar
Demir, E. et al. The BioPAX community standard for pathway data sharing. Nat. Biotechnol. 28, 935 (2010).
Article CAS Google Scholar
Hucka, M. et al. The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531 (2003).
Article CAS Google Scholar
Cerami, E. G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 39, D685–D690 (2010).
Article Google Scholar
Belinky, F., et al. PathCards: Multi-source consolidation of human biological pathways. Database, 2015, bav006 (2015).
Yamada, T. et al. iPath2. 0: Interactive pathway explorer. Nucleic Acids Res. 39(suppl_2), W412–W415 (2011).
Article CAS Google Scholar
Perfetto, L. et al. SIGNOR: A database of causal relationships between biological entities. Nucleic Acids Res. 44(D1), D548–D554 (2015).
Article Google Scholar
Petri, V. et al. The pathway ontology–updates and applications. J. Biomed. Semantics. 5, 7 (2014).
Article Google Scholar
Iyappan, A. et al. Towards a pathway inventory of the human brain for modeling disease mechanisms underlying neurodegeneration. J. Alzheimer's. Dis. 52, 1343–1360 (2016).
Article Google Scholar
Doderer, M. S. et al. Pathway Distiller-multisource biological pathway consolidation. BMC Genomics 13, S18 (2012).
Article Google Scholar
Daniels, K., and Giraud-Carrier, C. Learning the threshold in hierarchical agglomerative clustering. In 5th International Conference on Machine Learning and Applications, 2006. ICMLA'06. (pp. 270–278). IEEE (2006).
Khatri, P., Sirota, M. & Butte, A. J. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol. 8, e1002375 (2012).
Article CAS Google Scholar
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44(W1), W90–W97 (2016).
Article CAS Google Scholar
Cary, M. P., Bader, G. D. & Sander, C. Pathway information for systems biology. FEBS Lett. 579, 1815–1820 (2005).
Article CAS Google Scholar
Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Article CAS Google Scholar
Bauer-Mehren, A., Furlong, L. I. & Sanz, F. Pathway databases and tools for their exploitation: Benefits, current limitations and challenges. Mol. Syst. Biol. 5, 290 (2009).
Article Google Scholar
Chowdhury, S. & Sarkar, R. R. Comparison of human cell signaling pathway databases—evolution, drawbacks and challenges. Database 2015, bau126 (2015).
Article Google Scholar
Stobbe, M. D., Houten, S. M., Jansen, G. A., van Kampen, A. H. & Moerland, P. D. Critical assessment of human metabolic pathway databases: a stepping stone for future integration. BMC Syst. Biol. 5, 165 (2011).
Article Google Scholar
Fernández, N. F. et al. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci. Data 4, 170151 (2017).
Article Google Scholar
Reimand, J. et al. g: Profiler—a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 44(W1), W83–W89 (2016).
Article CAS Google Scholar
Pathan, M. et al. FunRich: an open access standalone functional enrichment and interaction network analysis tool. Proteomics 15.15, 2597–2601 (2015).
Article Google Scholar
Huang, W. et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 8, R183 (2007).
Brás, J., Guerreiro, R. & Hardy, J. SnapShot: genetics of Parkinson’s disease. Cell 160, 570–570 (2015).
Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001).
Minadakis, G. et al. PathwayConnector: finding complementary pathways to enhance functional analysis. Bioinformatics https://doi.org/10.1093/bioinformatics/bty693 (2018).
Perrett, R. M., Alexopoulou, Z. & Tofaris, G. K. The endosomal pathway in Parkinson's disease. Mol. Cell. Neurosci. 66, 21–28 (2015).
Rajagopalan, S., Rane, A., Chinta, S. J. & Andersen, J. K. Regulation of ATP13A2 via PHD2-HIF1α signaling is critical for cellular iron homeostasis: implications for Parkinson's disease. J. Neurosci. 36, 1086–1095 (2016).
Gusdon, A. M., Zhu, J., Van Houten, B. & Chu, C. T. ATP13A2 regulates mitochondrial bioenergetics through macroautophagy. Neurobiol. Dis. 45, 962–972 (2012).
Holemans, T. et al. A lipid switch unlocks Parkinson’s disease-associated ATP13A2. Proc. Natl Acad. Sci. USA 112, 9040–9045 (2015).
Obulesu, M. & Lakshmi, M. J. Apoptosis in Alzheimer’s disease: An understanding of the physiology, pathology and therapeutic avenues. Neurochem. Res. 39, 2301–2312 (2014).
Tatton, W. G., Chalmers-Redman, R., Brown, D. & Tatton, N. Apoptosis in Parkinson's disease: signals for neuronal degradation. Ann. Neurol. 53(S3), S61–70, https://doi.org/10.1002/(ISSN)1531-8249 (2003).
Wadi, L. et al. Impact of outdated gene annotations on pathway enrichment analysis. Nat. Methods 13, 705 (2016).
Ihnatova, I., Popovici, V. & Budinska, E. A critical comparison of topology-based pathway analysis methods. PLoS One 13, e0191154 (2018).
Bayerlová, M. et al. Comparative study on gene set and pathway topology-based enrichment methods. BMC Bioinformatics 16, 334 (2015).
Bohler, A. et al. Reactome from a WikiPathways perspective. PLoS Comput. Biol. 12, e1004941 (2016).
Acevedo, A., Duran, C., Ciucci, S., Gerl, M. & Cannistraci, C. V. LIPEA: Lipid Pathway Enrichment Analysis. bioRxiv https://doi.org/10.1101/274969 (2018).
Wishart, D. S. et al. HMDB 4.0: The human metabolome database for 2018. Nucleic Acids Res. 46(D1), D608–D617 (2017).
Domingo-Fernández, D. et al. Multimodal mechanistic signatures for neurodegenerative diseases (NeuroMMSig): a web server for mechanism enrichment. Bioinformatics 33, 3679–3681 (2017).
Franz, M. et al. Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics 32, 309–311 (2015).
Chen, Y. A. et al. Integrated pathway clusters with coherent biological themes for target prioritisation. PLoS One 9, e99030 (2014).
Pita-Juarez, Y. et al. The pathway coexpression network: Revealing pathway relationships. PLoS Comput. Biol. 14, e1006042 (2018).
Katiyar, A., Sharma, S., Singh, T. P. & Kaur, P. Identification of shared molecular signatures indicate the susceptibility of endometriosis to multiple sclerosis. Front. Genet. 9, 42 (2018).
Levenshtein, V. I. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10, 707–710 (1966).

Acknowledgements

This work was supported by the EU/EFPIA Innovative Medicines Initiative Joint Undertaking under AETIONOMY [grant number 115568], the resources of which are composed of a financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and in-kind contributions from EFPIA companies.

Author information

Authors and Affiliations

Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53754, Sankt Augustin, Germany
Daniel Domingo-Fernández, Charles Tapley Hoyt, Josep Marín-Llaó & Martin Hofmann-Apitius
Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, 53115, Bonn, Germany
Daniel Domingo-Fernández, Charles Tapley Hoyt & Martin Hofmann-Apitius
Faculty of Medicine and Health Sciences, University of Oviedo, 33006, Oviedo, Spain
Carlos Bobis-Álvarez
Rovira i Virgili University, 43003, Tarragona, Spain
Josep Marín-Llaó

Authors

Daniel Domingo-Fernández
Charles Tapley Hoyt
Carlos Bobis-Álvarez
Josep Marín-Llaó
Martin Hofmann-Apitius

Contributions

M.H.A. and D.D.F. conceived and designed the study. D.D.F. implemented ComPath and the pathway database plugins with help from C.T.H. D.D.F., C.B.A., and J.M.L. curated the pathway mappings. D.D.F. and C.T.H. wrote the paper. M.H.A. reviewed the content.

Corresponding author

Correspondence to Daniel Domingo-Fernández.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Text

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Domingo-Fernández, D., Hoyt, C.T., Bobis-Álvarez, C. et al. ComPath: an ecosystem for exploring, analyzing, and curating mappings across pathway databases. npj Syst Biol Appl 4, 43 (2018). https://doi.org/10.1038/s41540-018-0078-8

Received: 05 July 2018
Revised: 31 October 2018
Accepted: 02 November 2018
Published: 13 December 2018
DOI: https://doi.org/10.1038/s41540-018-0078-8
