DrugMechDB: A Curated Database of Drug Mechanisms

Gonzalez-Cavazos, Adriana Carolina; Tanska, Anna; Mayers, Michael; Carvalho-Silva, Denise; Sridharan, Brindha; Rewers, Patrick A.; Sankarlal, Umasri; Jagannathan, Lakshmanan; Su, Andrew I.

doi:10.1038/s41597-023-02534-z


Data Descriptor
Open access
Published: 16 September 2023


Adriana Carolina Gonzalez-Cavazos¹ (ORCID: orcid.org/0000-0001-9615-7157; equal contribution),
Anna Tanska¹ (equal contribution),
Michael Mayers¹,
Denise Carvalho-Silva¹,
Brindha Sridharan¹,
Patrick A. Rewers¹,
Umasri Sankarlal¹,
Lakshmanan Jagannathan¹ &
Andrew I. Su¹ (ORCID: orcid.org/0000-0002-9859-4104)

Scientific Data volume 10, Article number: 632 (2023)


Abstract

Computational drug repositioning methods have emerged as an attractive and effective solution to find new candidates for existing therapies, reducing the time and cost of drug development. Repositioning methods based on biomedical knowledge graphs typically offer useful supporting biological evidence. This evidence is based on reasoning chains or subgraphs that connect a drug to a disease prediction. However, there are no databases of drug mechanisms that can be used to train and evaluate such methods. Here, we introduce the Drug Mechanism Database (DrugMechDB), a manually curated database that describes drug mechanisms as paths through a knowledge graph. DrugMechDB integrates a diverse range of authoritative free-text resources to describe 4,583 drug indications with 32,249 relationships, representing 14 major biological scales. DrugMechDB can be employed as a benchmark dataset for assessing computational drug repositioning models or as a valuable resource for training such models.


Background & Summary

Drug repositioning, the identification of novel uses for existing therapies, has become an increasingly attractive strategy to accelerate drug development¹. By leveraging the wealth of available genomic and biomedical data, computational drug repositioning models offer an unprecedented opportunity to analyze information at scale, reducing the time and effort required to identify repositioning candidates.

Computational repositioning models frequently rely on drug-drug and/or disease-disease similarity²,³. However, the complex, context-dependent biological associations that underlie the relationship between a drug and a disease often require a more sophisticated explanation. To address this, biomedical knowledge graphs have emerged as a powerful tool for capturing biological associations, providing a more comprehensive understanding of the link between a drug and a disease⁴.

Biomedical knowledge graphs consist of nodes representing biological concepts (such as genes, drugs, diseases, and pathways) and edges describing their relationships (such as drugs treating diseases, or diseases being associated with genes)⁴. Repositioning methods based on knowledge graphs leverage the biological associations captured in the network to provide supporting evidence for their predictions. This is typically achieved by identifying reasoning chains or subgraphs within the larger network that provide a mechanistic rationale for why a particular drug might be effective against a particular disease, even in the absence of pre-existing evidence for the association⁵.

However, a major challenge in judging the plausibility of the supporting evidence provided by biomedical knowledge graphs is the absence of a well-defined, gold-standard collection of drug mechanisms. Such a reference is necessary to evaluate the mechanistic accuracy of predictions made by repositioning models. Validation by domain experts is an alternative, but it is a laborious, resource-intensive process that demands significant expertise.

Current efforts to construct biomedical networks integrate diverse knowledge bases⁵,⁶,⁷,⁸ or extract knowledge from the literature using natural language processing techniques⁹,¹⁰,¹¹. However, creating an accurate and comprehensive knowledge graph that can serve as a benchmark for repositioning discoveries poses several challenges. Such networks often lack contextual information and therefore do not describe the relationship between a drug and a disease in enough detail. Moreover, they often lack high-quality semantic interoperability, leaving the concepts and terminologies within the network ambiguous.

To fill this gap, we created the Drug Mechanism Database (DrugMechDB), a manually curated database of drug mechanisms expressed as paths through a biomedical knowledge graph. In this work, we present the first complete version of DrugMechDB, comprising 5,666 mechanistic paths that explain 4,583 indications. Each record is derived from free-text descriptions, and each captured concept is normalized to a concept type and mapped to an identifier. We provide a detailed description of the information captured by mechanistic paths, illustrating the expressiveness of the database, and we assess the quality of the curated associations by comparing them against an external biomedical knowledge graph. The detailed information contained within DrugMechDB serves as a community reference for the development and evaluation of machine learning drug repositioning models. Researchers can leverage the mechanistic paths in DrugMechDB to improve the accuracy and effectiveness of their algorithms, leading to more informed decisions.

Methods

In DrugMechDB, each curated indication is depicted as a directed graph (Fig. 1). Here, we provide a detailed explanation of the data resources utilized and the curation process undertaken to build DrugMechDB.

Data sources

DrugMechDB was constructed from drug-disease indications in the DrugCentral database, using the version downloaded on September 18, 2020¹². The main source for curation was either the Mechanism of Action section of DrugBank¹³ or the Description section of Inxight Drugs¹⁴. Other resources included review articles, Gene Ontology¹⁵,¹⁶, UniProt¹⁷, Reactome¹⁸, and well-sourced Wikipedia articles¹⁹, whose references were verified by curators. Primary literature containing experimental results was excluded, ensuring that only highly curated, high-confidence information was included.

Data model

DrugMechDB provides researchers with a consistent and structured information source on drug mechanisms. To achieve this, we adopted the Biolink Model (version 1.3.0)²⁰. The Biolink Model is a standardized hierarchy of biomedical entity classes that serves as a universal framework for biomedical data representation and linkage²¹. It encompasses a wide range of entity types such as genes, proteins, diseases, drugs, and biological processes, and defines the predicates that describe the relationships between these entity types.

Standardizing DrugMechDB to the Biolink Model maps its concepts and relationships to a common vocabulary, allowing interoperability with other data sources. Researchers can therefore combine DrugMechDB with other biomedical resources that use the same data model to perform comprehensive analyses and gain new insights into drug mechanisms of action. A list of the DrugMechDB concepts and corresponding relationships is given in Table 1.
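As a small illustration of this mapping, the sketch below turns DrugMechDB concept types and predicates into Biolink-style CURIEs. It relies only on the Biolink naming convention (CamelCase class names, snake_case predicate names, `biolink:` prefix); the function names are hypothetical, and the conversion is purely syntactic, so whether a given predicate actually exists in Biolink 1.3.0 still has to be checked against the model itself.

```python
import re


def to_biolink_class(concept_type: str) -> str:
    """Prefix a CamelCase concept type such as 'BiologicalProcess' with 'biolink:'."""
    return f"biolink:{concept_type.strip()}"


def to_biolink_predicate(predicate: str) -> str:
    """Convert a space-separated predicate such as 'positively regulates' to snake_case."""
    words = re.split(r"\s+", predicate.strip().lower())
    return "biolink:" + "_".join(words)


print(to_biolink_class("BiologicalProcess"))         # biolink:BiologicalProcess
print(to_biolink_predicate("positively regulates"))  # biolink:positively_regulates
```

Records expressed with these CURIEs could then be joined with other Biolink-conformant graphs by simple identifier matching, which is the kind of interoperability described above.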

Table 1 DrugMechDB concept types.


Path curation

While free-text descriptions offer a comprehensive narrative of a drug’s mechanism, they sometimes include information that is not directly relevant to the mechanism of action. Consequently, deciding which relationships best describe a drug’s action can be subjective, resulting in inconsistent annotations. To ensure consistency, accuracy, and clarity across the path representations of DrugMechDB records, we established a formal curation guide. Briefly, we preserved the order of interactions so that each edge reflects cause and effect between two concepts, representing the sequence of events or influences. To streamline the paths and eliminate unnecessary complexity, we removed any information that did not contribute significantly to the overall understanding of the drug’s action. Additionally, when multiple related concepts were involved in a sequence of interactions, we summarized them into a single all-encompassing concept, giving a more concise and cohesive representation of the drug’s mechanism, reducing redundancy, and improving the clarity of the path.

Lastly, to enhance standardization and minimize inconsistencies in vocabulary conventions, we relied on the Node Normalization service (version 2.1.1)²². Each node recorded in DrugMechDB was mapped to the preferred CURIE prefix and label, along with the semantic type defined by the Biolink Model.
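The normalization step can be sketched as follows. The endpoint URL and the response layout (a JSON object keyed by each submitted CURIE, whose entry holds an `id` with `identifier` and `label` plus a list of Biolink `type`s, or null when the CURIE is unknown) are assumptions based on the public TranslatorSRI service and should be checked against its current documentation; the helper names and the hand-written sample response are illustrative only, and the code makes no network call.

```python
import urllib.parse

# Assumed public endpoint of the TranslatorSRI Node Normalization service.
NODE_NORM_URL = "https://nodenormalization-sri.renci.org/get_normalized_nodes"


def build_query_url(curies):
    """Build a GET URL asking the service to normalize several CURIEs at once."""
    query = urllib.parse.urlencode([("curie", c) for c in curies])
    return f"{NODE_NORM_URL}?{query}"


def to_drugmechdb_node(curie, response):
    """Map one entry of an (assumed) normalization response to DrugMechDB node fields.

    Returns None when the CURIE is missing from the response or was not recognized.
    """
    entry = response.get(curie)
    if not entry:
        return None
    preferred = entry["id"]
    types = entry.get("type", [])
    # Assumption: the first listed type is the most specific Biolink class.
    concept_type = types[0].split(":", 1)[-1] if types else None
    return {
        "id": preferred["identifier"],      # preferred CURIE
        "name": preferred.get("label", ""),  # preferred label
        "label": concept_type,               # Biolink concept type without prefix
    }


# Hand-written, illustrative response; a real one would come from fetching build_query_url(...).
sample = {
    "MESH:D014867": {
        "id": {"identifier": "CHEBI:15377", "label": "Water"},
        "type": ["biolink:SmallMolecule", "biolink:MolecularEntity"],
    },
    "EX:0000": None,
}
print(build_query_url(["MESH:D014867"]))
print(to_drugmechdb_node("MESH:D014867", sample))
```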

Data Records

The first complete DrugMechDB version (2.0.1)²³ captures 4,583 curated indications involving 1,580 drugs and 744 diseases. DrugMechDB is a knowledge graph with 14 node types and 71 directed edge types. It currently contains 32,588 nodes and 32,249 edges. A breakdown of the number of edges by concept type is provided in Table 1.

The number of nodes in DrugMechDB by concept type is shown in Fig. 2a; ‘BiologicalProcess’ is the most frequent concept type, comprising 24.55% of all nodes. Among the 725 meta-edges (combinations of source concept type, edge type, and target concept type), the most common is a ‘Protein’ node linked to a ‘BiologicalProcess’ node by a ‘positively regulates’ edge, accounting for 11.29% of the total meta-edges (Fig. 2b). Each indication is explained through one or more mechanistic paths, each a sequence of nodes and relationships. The current version of DrugMechDB contains 5,666 curated mechanistic paths, which fall into 297 distinct types based on the sequence of concept types they contain (Fig. 2c).
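The meta-edge and path-type statistics can be reproduced in outline with the sketch below. It assumes each record follows the ‘nodes’/‘links’ layout described in this section, contains exactly one node labelled ‘Drug’ and one labelled ‘Disease’, and that a path type is the tuple of concept types along one drug-to-disease route; the function names and the sample record are hypothetical, not taken from DrugMechDB.

```python
from collections import Counter


def meta_edges(records):
    """Count (source type, predicate, target type) triples across records."""
    counts = Counter()
    for rec in records:
        type_of = {node["id"]: node["label"] for node in rec["nodes"]}
        for link in rec["links"]:
            counts[(type_of[link["source"]], link["key"], type_of[link["target"]])] += 1
    return counts


def path_types(record, start_type="Drug", end_type="Disease"):
    """List the concept-type sequence of every simple path from drug to disease."""
    type_of = {node["id"]: node["label"] for node in record["nodes"]}
    children = {}
    for link in record["links"]:
        children.setdefault(link["source"], []).append(link["target"])
    start = next(n for n, t in type_of.items() if t == start_type)
    goal = next(n for n, t in type_of.items() if t == end_type)
    found = []

    def walk(node, path):
        if node == goal:
            found.append(tuple(type_of[n] for n in path))
            return
        for nxt in children.get(node, []):
            if nxt not in path:  # skip cycles
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found


# Hypothetical record, not taken from DrugMechDB.
record = {
    "nodes": [
        {"id": "EX:drug", "name": "example drug", "label": "Drug"},
        {"id": "EX:prot", "name": "example protein", "label": "Protein"},
        {"id": "EX:proc", "name": "example process", "label": "BiologicalProcess"},
        {"id": "EX:dis", "name": "example disease", "label": "Disease"},
    ],
    "links": [
        {"source": "EX:drug", "target": "EX:prot", "key": "decreases activity of"},
        {"source": "EX:prot", "target": "EX:proc", "key": "positively regulates"},
        {"source": "EX:proc", "target": "EX:dis", "key": "causes"},
    ],
}
path_type_counts = Counter(pt for rec in [record] for pt in path_types(rec))
print(meta_edges([record]).most_common(1))
print(path_type_counts)
```

Run over all records, `len(path_type_counts)` would give a count of distinct path types; a branched record contributes one path per drug-to-disease route under this reading, which may differ from how the curated paths were actually counted.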

The complexity of the interactions underlying drug-disease associations leads to wide variation in the number of nodes and edges per record. Figure 3a,b show the distributions of the number of nodes and edges, respectively, across DrugMechDB indications. Some records are relatively simple, with only a few nodes and edges, while others are much more complex, with many interconnected nodes and edges, reflecting the complex nature of the underlying biology. Certain drugs exert their therapeutic effects through multiple simultaneous interactions, such as blocking multiple targets or influencing multiple pathways. In DrugMechDB, such situations are represented by branching paths (Fig. 3c).
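Record size and branching can be measured per record as in the sketch below. Treating a branch point as any node with more than one outgoing link is an assumption made here for illustration (the text does not define branching formally), and the helper names and the minimal sample record, stripped to the fields these functions need, are hypothetical.

```python
from collections import Counter


def record_size(record):
    """Return (number of nodes, number of edges) for one record."""
    return len(record["nodes"]), len(record["links"])


def branch_points(record):
    """Return the ids of nodes with more than one outgoing link, where a path splits."""
    out_degree = Counter(link["source"] for link in record["links"])
    return sorted(node for node, degree in out_degree.items() if degree > 1)


def node_count_histogram(records):
    """Map node count to the number of records with that many nodes (cf. Fig. 3a)."""
    return Counter(len(rec["nodes"]) for rec in records)


# Hypothetical record: one drug acts on two proteins that both affect the disease.
branched = {
    "nodes": [{"id": i} for i in ("EX:drug", "EX:p1", "EX:p2", "EX:dis")],
    "links": [
        {"source": "EX:drug", "target": "EX:p1"},
        {"source": "EX:drug", "target": "EX:p2"},
        {"source": "EX:p1", "target": "EX:dis"},
        {"source": "EX:p2", "target": "EX:dis"},
    ],
}
print(record_size(branched), branch_points(branched))
```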

All curated records in DrugMechDB are stored in a standardized format in the file indication_paths.json. Each record is represented as a directed graph with the keys ‘graph’, ‘links’, ‘nodes’, and ‘reference’ (Fig. 1). Indication information, including the drug and disease names and their external identifiers, is captured within the ‘graph’ key, which also provides an ‘_id’ value that uniquely identifies each record. The relationships and concepts associated with the mechanistic paths of each record are defined within the ‘links’ key, which gives the ‘source’ and ‘target’ identifiers of the concepts, along with a ‘key’ field indicating the specific type of relationship between the two nodes. Further information about the concepts in each record’s graph is described within the ‘nodes’ key, where each node contains the fields ‘id’, ‘name’, and ‘label’, corresponding to the external identifier, the concept’s name, and the concept type, respectively. Lastly, the ‘reference’ key provides a hyperlink to the data source(s) from which the record was curated.
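A minimal reader for this format might look like the sketch below. It assumes that indication_paths.json holds a top-level JSON list of records and that the drug and disease names sit under ‘drug’ and ‘disease’ inside ‘graph’ (the text only says that ‘graph’ holds the names and identifiers); the helper names and the inline sample record are hypothetical stand-ins for the real file.

```python
import json
from collections import defaultdict


def load_records(path="indication_paths.json"):
    """Read all curated records; the file is assumed to hold a top-level JSON list."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def adjacency(record):
    """Map each source id to its outgoing (predicate, target id) pairs."""
    adj = defaultdict(list)
    for link in record["links"]:
        adj[link["source"]].append((link["key"], link["target"]))
    return dict(adj)


def summarize(records):
    """Count indications, distinct drugs/diseases, and per-record node and edge totals."""
    return {
        "indications": len(records),
        "drugs": len({rec["graph"]["drug"] for rec in records}),
        "diseases": len({rec["graph"]["disease"] for rec in records}),
        # Nodes shared between records are counted once per record here.
        "nodes": sum(len(rec["nodes"]) for rec in records),
        "edges": sum(len(rec["links"]) for rec in records),
    }


SAMPLE = """
[
  {
    "graph": {"_id": "EX_1", "drug": "example drug", "disease": "example disease"},
    "nodes": [
      {"id": "EX:drug", "name": "example drug", "label": "Drug"},
      {"id": "EX:dis", "name": "example disease", "label": "Disease"}
    ],
    "links": [{"source": "EX:drug", "target": "EX:dis", "key": "treats"}],
    "reference": ["https://example.org/source"]
  }
]
"""
records = json.loads(SAMPLE)  # stands in for load_records()
print(summarize(records))
print(adjacency(records[0]))
```

Whether totals computed this way match the node and edge counts reported above depends on how nodes shared across records were counted, which this sketch does not settle.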

Technical Validation

Systematic validation of DrugMechDB associations

Validating the reliability of a knowledge graph is a crucial step that ensures the correctness of the captured information. In this work, we assessed the accuracy of DrugMechDB associations by comparing them to existing data sources. For this, we leveraged an external biomedical knowledge graph, the Mechanistic Repositioning Network (MechRepoNet)²⁴.

Briefly, MechRepoNet is a comprehensive biomedical knowledge graph constructed by integrating 18 data sources and standardized with the Biolink Model. Because MechRepoNet is a broader network spanning various domains, we used it as an external benchmark to verify the plausibility of the associations recorded in DrugMechDB.

Evaluating associations between concept types (ignoring edge predicates), we found that 2,924 (28.71%) of the 10,184 unique associations captured in DrugMechDB are also contained within MechRepoNet. To assess whether DrugMechDB associations are broadly consistent with the knowledge captured in MechRepoNet, we conducted a bootstrapping analysis. For each DrugMechDB association type, nonparametric bootstrapping was applied to sample simulated associations of that type (with replacement) and calculate the percentage that match MechRepoNet. This procedure was repeated 1,000 times to construct a distribution of percentages, from which the mean and 99% CI were calculated. The p-value was calculated as the fraction of the distribution in which the simulated percentage was greater than or equal to the observed percentage. The results in Table 2 show that the average p-value of the ten most frequent association types is less than 0.001, indicating that the observed overlap between DrugMechDB and the broader knowledge captured by MechRepoNet is unlikely to occur by chance.
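The bootstrap procedure can be outlined for a single association type as below. The text does not specify how simulated associations are generated, so the null model here (re-pairing sources and targets drawn independently, with replacement, from the observed endpoints) is an assumption, as are the function names and the toy data; predicates are ignored by comparing plain (source, target) pairs. The overall overlap quoted above is simple arithmetic (2,924 / 10,184 ≈ 0.2871).

```python
import random


def overlap_pct(pairs, reference):
    """Percentage of (source, target) pairs present in the reference set."""
    return 100.0 * sum(pair in reference for pair in pairs) / len(pairs)


def bootstrap_overlap(observed, reference, n_iter=1000, seed=0):
    """Return observed overlap, bootstrap mean, 99% CI, and empirical p-value.

    Assumed null model: every simulated association pairs a source and a target
    drawn independently, with replacement, from the observed endpoints.
    """
    rng = random.Random(seed)
    reference = set(reference)
    sources = [s for s, _ in observed]
    targets = [t for _, t in observed]
    obs = overlap_pct(observed, reference)
    sims = sorted(
        overlap_pct([(rng.choice(sources), rng.choice(targets)) for _ in observed], reference)
        for _ in range(n_iter)
    )
    mean = sum(sims) / n_iter
    # Simple order-statistic percentiles for the 0.5% and 99.5% bounds.
    ci99 = (sims[n_iter * 5 // 1000], sims[n_iter * 995 // 1000 - 1])
    p_value = sum(sim >= obs for sim in sims) / n_iter
    return obs, mean, ci99, p_value


# Hypothetical data: ten curated pairs of one association type, all in the reference.
observed = [(f"EX:P{i}", f"EX:B{i}") for i in range(10)]
reference = set(observed) | {("EX:P0", "EX:B9")}
obs, mean, ci99, p = bootstrap_overlap(observed, reference)
print(obs, round(mean, 1), ci99, p)
```

With 1,000 replicates the smallest nonzero p-value this procedure can return is 0.001, so a run in which no simulated percentage reaches the observed one is naturally reported as p < 0.001.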

Table 2 Validation of the ten most frequent DrugMechDB association types.


The association type ‘BiologicalProcess’-‘BiologicalProcess’ has the least overlap among the most frequent DrugMechDB association types, highlighting that MechRepoNet does not cover all curated association types of DrugMechDB. To incorporate the missing information in MechRepoNet, we propose using DrugMechDB as a roadmap, helping to prioritize the most significant relationships involved in drug mechanisms and facilitating the integration of biomedical sources.

In summary, DrugMechDB is a comprehensive resource that provides human-interpretable explanations for computational repositioning predictions, and it has the potential to help domain experts assess whether a model’s candidate is supported by enough biological evidence. We believe that DrugMechDB offers several advantages. First, it serves as a useful resource for researchers seeking to understand drug pharmacodynamics. Second, it is a valuable training data set for drug repositioning models that aim to provide plausible supporting reasoning chains. Lastly, as described above, DrugMechDB functions as a roadmap for knowledge graph expansion, helping to prioritize the biological associations that most commonly appear in curated drug mechanisms.

Usage Notes

DrugMechDB provides structured information about drug mechanisms based on a wide range of primary and secondary sources. We believe that DrugMechDB will be a valuable resource for a wide range of computational analyses, including, for example, the identification of drug repositioning candidates. While we are confident in the overall accuracy of DrugMechDB as a data set for training and/or evaluating machine learning models, we encourage users to critically assess any individual records or assertions used in downstream analyses. Discrepancies could arise from a wide variety of factors, including (but not limited to) differences in data modeling, multiple possible mechanisms described in the literature, and/or errors in structuring knowledge during our curation process.

Code availability

The DrugMechDB project website is at https://sulab.github.io/DrugMechDB/. The code to reproduce the results, along with the curation guidelines, is available in the DrugMechDB GitHub repository at https://github.com/SuLab/DrugMechDB/tree/2.0.1. All relevant files are hosted at https://doi.org/10.5281/zenodo.8139357²³. Additionally, curated mechanistic paths can be contributed by pull request to the file submission.yaml, following the guide at SuLab/DrugMechDB/blob/main/SubmissionGuide.md.

References

1. Pushpakom, S. et al. Drug repurposing: progress, challenges and recommendations. Nature Reviews Drug Discovery 18, 41–58 (2019).
2. Li, J. et al. A survey of current trends in computational drug repositioning. Briefings in Bioinformatics 17, 2–12 (2016).
3. Li, J. & Lu, Z. A new method for computational drug repositioning using drug pairwise similarity. In 2012 IEEE International Conference on Bioinformatics and Biomedicine, 1–4 (IEEE, 2012).
4. Nicholson, D. N. & Greene, C. S. Constructing knowledge graphs and their biomedical applications. Computational and Structural Biotechnology Journal 18, 1414–1428 (2020).
5. Himmelstein, D. S. et al. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife 6, e26726 (2017).
6. Santos, A. et al. A knowledge graph to interpret clinical proteomics data. Nature Biotechnology 40, 692–702 (2022).
7. Yu, Y. et al. PreMedKB: an integrated precision medicine knowledgebase for interpreting relationships between diseases, genes, variants and drugs. Nucleic Acids Research 47, D1090–D1101 (2019).
8. Zhu, Y. et al. Knowledge-driven drug repurposing using a comprehensive drug knowledge graph. Health Informatics Journal 26, 2737–2750 (2020).
9. Ernst, P., Siu, A. & Weikum, G. KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences. BMC Bioinformatics 16, 1–13 (2015).
10. Percha, B. & Altman, R. B. A global network of biomedical relationships derived from text. Bioinformatics 34, 2614–2624 (2018).
11. Yuan, J. et al. Constructing biomedical domain-specific knowledge graph with minimum supervision. Knowledge and Information Systems 62, 317–336 (2020).
12. Ursu, O. et al. DrugCentral 2018: an update. Nucleic Acids Research 47, D963–D970 (2019).
13. Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Research 46, D1074–D1082 (2018).
14. Siramshetty, V. B. et al. NCATS Inxight Drugs: a comprehensive and curated portal for translational research. Nucleic Acids Research 50, D1307–D1316 (2022).
15. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nature Genetics 25, 25–29 (2000).
16. Aleksander, S. A. et al. The Gene Ontology knowledgebase in 2023. Genetics 224, iyad031 (2023).
17. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Research 51, D523–D531 (2023).
18. Jassal, B. et al. The Reactome pathway knowledgebase. Nucleic Acids Research 48, D498–D503 (2020).
19. Vrandečić, D. Wikidata: A new platform for collaborative data collection. In Proceedings of the 21st International Conference on World Wide Web, 1063–1064 (2012).
20. Chris, M. et al. biolink-model: 1.3.0 release (v1.3.0). Zenodo, https://doi.org/10.5281/zenodo.3700190 (2020).
21. Unni, D. R. et al. Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clinical and Translational Science 15, 1848–1855 (2022).
22. Node Normalization. https://github.com/TranslatorSRI/NodeNormalization (2023).
23. Adriana, G-C. et al. Drug Mechanism Database (DrugMechDB) (2.0.1). Zenodo, https://doi.org/10.5281/zenodo.8139357 (2023).
24. Mayers, M. et al. Design and application of a knowledge network for automatic prioritization of drug mechanisms. Bioinformatics 38, 2880–2891 (2022).
25. Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E. & Haendel, M. A. Uberon, an integrative multi-species anatomy ontology. Genome Biology 13, 1–20 (2012).
26. Diehl, A. D. et al. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. Journal of Biomedical Semantics 7, 1–10 (2016).
27. Hastings, J. et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research 44, D1214–D1219 (2016).
28. Paysan-Lafosse, T. et al. InterPro in 2022. Nucleic Acids Research 51, D418–D427 (2023).
29. Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Research 49, D412–D419 (2021).
30. Natale, D. A. et al. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Research 45, D339–D346 (2017).
31. Köhler, S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Research 49, D1207–D1217 (2021).
32. Schoch, C. L. et al. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database 2020 (2020).


Acknowledgements

This work was supported by funding from the National Center for Advancing Translational Sciences (NCATS) under awards OT2TR003427 and UL1TR002550, and from the National Institute on Aging (NIA) under award R01AG066750. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

These authors contributed equally: Adriana Carolina Gonzalez-Cavazos, Anna Tanska.

Authors and Affiliations

The Scripps Research Institute, Department of Integrative Structural and Computational Biology, 10550 N Torrey Pines Rd, La Jolla, CA, 92037, USA
Adriana Carolina Gonzalez-Cavazos, Anna Tanska, Michael Mayers, Denise Carvalho-Silva, Brindha Sridharan, Patrick A. Rewers, Umasri Sankarlal, Lakshmanan Jagannathan & Andrew I. Su


Contributions

A.T., M.M., B.S., D.C.-S., U.S. and L.J. retrieved and curated indications. A.G.-C., M.M., and P.R. wrote analysis tools and performed the analysis. A.G.-C. and A.S. wrote the manuscript. M.M. and A.S. conceptualized and designed the study. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Andrew I. Su.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Gonzalez-Cavazos, A.C., Tanska, A., Mayers, M. et al. DrugMechDB: A Curated Database of Drug Mechanisms. Sci Data 10, 632 (2023). https://doi.org/10.1038/s41597-023-02534-z


Received: 02 May 2023
Accepted: 01 September 2023
Published: 16 September 2023
DOI: https://doi.org/10.1038/s41597-023-02534-z