Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Finding useful data across multiple biomedical data repositories using DataMed


The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. These characteristics—along with interoperability and reusability—compose the four FAIR principles to facilitate knowledge discovery in today's big data–intensive science landscape.

Your institute does not have access to this article

Relevant articles

Open Access articles citing this article.

Access options

Buy article

Get time limited or full article access on ReadCube.


All prices are NET prices.

Figure 1: Data sources have various metadata specifications, which undergo ingestion into the common DATS model, whose metadata elements are used for indexing and DataMed searches.
Figure 2: Community input to the Data Discovery Index Consortium.


  1. Wilkinson, M.D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).

    Article  Google Scholar 

  2. Collins, F.S. & Tabak, L.A. Policy: NIH plans to enhance reproducibility. Nature 505, 612–613 (2014).

    Article  Google Scholar 

  3. Bourne, P.E. et al. The NIH Big Data to Knowledge (BD2K) initiative. J. Am. Med. Inform. Assoc. 22, 1114 (2015).

    Article  Google Scholar 

  4. Lu, Z. PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford) 2011, baq036 (2011).

    Article  Google Scholar 

  5. Sansone, S.-A. et al. DATS: the data tag suite to enable discoverability of datasets. Sci. Data 4, 170059 (2017).

    Article  Google Scholar 

  6. Noruzi, A. Google Scholar: the new generation of citation indexes. Libri 55, 170–180 (2005).

    Article  Google Scholar 

  7. Hands, A. Microsoft Academic Search— Tech. Serv. Q. 29, 251–252 (2012).

    Article  Google Scholar 

  8. Kejariwal, D. & Mahawar, K.K. Is your journal indexed in PubMed? Relevance of PubMed in biomedical scientific literature today. WebmedCentral MISCELLANEOUS 3, WMC003159 (2012).

    Google Scholar 

  9. Huh, S. Journal Article Tag Suite 1.0: National Information Standards Organization standard of journal extensible markup language. Sci. Ed. 1, 99–104 (2014).

    Article  Google Scholar 

  10. Perez-Riverol, Y. et al. Nat. Biotechnol. 35, 406–409 (2017)

    CAS  Article  Google Scholar 

  11. Brase, J. in 2009 Fourth International Conference on Cooperation and Promotion of Information Resources in Science and Technology (COINFO 2009) 257–261 (IEEE, 2009).

    Google Scholar 

  12. Chodorow, K. MongoDB: The Definitive Guide (O'Reilly Media, 2013).

    Google Scholar 

  13. Kuć, R. & Rogozinski, M. ElasticSearch Server (Packt Publishing, 2016).

    Google Scholar 

  14. Coll, I.S. & Cruz, J.M.B. Open archives initiative. Protocol for metadata harvesting (OAI-PMH): descripción, funciones y aplicaciones de un protocolo. Prof. Inf. 12, 99–106 (2003).

    Google Scholar 

  15. Richardson, L. & Ruby, S. RESTful Web Services (O'Reilly Media, 2008).

    Google Scholar 

  16. Westbrook, J., Ito, N., Nakamura, H., Henrick, K. & Berman, H.M. PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics 21, 988–992 (2005).

    CAS  Article  Google Scholar 

  17. Kiryakov, A., Popov, B., Terziev, I. Manov, D. & Ognyanoff, D. Semantic annotation, indexing, and retrieval. Web Semantics 2, 49–79 (2004).

    Article  Google Scholar 

  18. Haustein, S., Peters, I., Sugimoto, C.R., Thelwall, M. & Larivière, V. Tweeting biomedicine: an analysis of tweets and citations in the biomedical literature. J. Assoc. Inf. Sci. Technol. 65, 656–669 (2014).

    Article  Google Scholar 

Download references


This project is funded by grant U24AI117966 from NIAID, NIH, as part of the BD2K program. The co-authors, who are the lead investigators and chairs/co-chairs of the core activities, thank all contributors to the bioCADDIE consortium and list them in the Supplementary Note in alphabetical order within each activity group (each name appears only once even though many people participated in different activities).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Lucila Ohno-Machado.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Table 1 and Supplementary Note (DOCX 39 kb)

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Ohno-Machado, L., Sansone, SA., Alter, G. et al. Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet 49, 816–819 (2017).

Download citation

  • Published:

  • Issue Date:

  • DOI:

Further reading


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing