Abstract
The biomedical research community has recognized the value of broadening searches for data to span multiple repositories. As part of the US National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative, we are working with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine built on metadata extracted from data sets held in a range of repositories. The resulting search engine, DataMed, is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. Together with interoperability and reusability, these characteristics make up the four FAIR principles (findable, accessible, interoperable and reusable), which aim to facilitate knowledge discovery in today's big-data-intensive science landscape.
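The abstract describes one index and search engine built over metadata pooled from many repositories. As a rough illustration only, the Python sketch below shows the general idea with a toy inverted index. It assumes that each repository's records have already been mapped to a common schema. That schema (`DatasetRecord` with title, description and keywords) is a hypothetical stand-in, and a query returns every data set whose metadata contains all of the query terms, whichever repository the data set came from. This is not DataMed's implementation. All class, field and identifier names are illustrative assumptions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical normalized metadata record harvested from one repository."""
    dataset_id: str
    repository: str
    title: str
    description: str
    keywords: list[str] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it on any run of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class MetadataIndex:
    """Toy inverted index over metadata pooled from several repositories (assumed design)."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)  # term -> dataset ids
        self._records: dict[str, DatasetRecord] = {}            # dataset id -> record

    def add(self, record: DatasetRecord) -> None:
        # Index the free-text metadata fields of one record.
        self._records[record.dataset_id] = record
        text = " ".join([record.title, record.description, *record.keywords])
        for term in tokenize(text):
            self._postings[term].add(record.dataset_id)

    def search(self, query: str) -> list[DatasetRecord]:
        # Boolean AND: keep only data sets whose metadata contains every query term.
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return [self._records[i] for i in sorted(hits)]


if __name__ == "__main__":
    index = MetadataIndex()
    index.add(DatasetRecord("repoA:1", "RepoA", "Breast cancer RNA-seq",
                            "Tumor expression profiles", ["transcriptomics"]))
    index.add(DatasetRecord("repoB:7", "RepoB", "Cancer cohort proteomics",
                            "Mass spectrometry of tumors", ["proteomics"]))
    for rec in index.search("cancer"):
        print(rec.repository, rec.dataset_id, rec.title)
```

A production engine would add relevance ranking, stemming and synonym expansion (the sketch treats "tumor" and "tumors" as different terms). It would also need a harvesting step that maps each repository's native metadata to the shared schema. All of these are left out here to keep the cross-repository lookup itself visible.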
Acknowledgements
This project is funded by grant U24AI117966 from NIAID, NIH, as part of the BD2K program. The co-authors, who are the lead investigators and chairs/co-chairs of the core activities, thank all contributors to the bioCADDIE consortium and list them in the Supplementary Note in alphabetical order within each activity group (each name appears only once even though many people participated in different activities).
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Table 1 and Supplementary Note
About this article
Cite this article
Ohno-Machado, L., Sansone, SA., Alter, G. et al. Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet 49, 816–819 (2017). https://doi.org/10.1038/ng.3864
DOI: https://doi.org/10.1038/ng.3864
This article is cited by
- FAIRness through automation: development of an automated medical data integration infrastructure for FAIR health data in a maximum care university hospital. BMC Medical Informatics and Decision Making (2023)
- Addressing barriers in FAIR data practices for biomedical data. Scientific Data (2023)
- The Translational Data Catalog - discoverable biomedical datasets. Scientific Data (2023)
- Developing a standardized but extendable framework to increase the findability of infectious disease datasets. Scientific Data (2023)
- A model of service-oriented architecture of e-certification system to support boat registration and site visit inspection to support maritime safety and crew health inspection. Marine Systems & Ocean Technology (2023)