Finding useful data across multiple biomedical data repositories using DataMed

Abstract

The value of broadening searches for data across multiple repositories has been identified by the biomedical research community. As part of the US National Institutes of Health (NIH) Big Data to Knowledge initiative, we work with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine, which are based on metadata extracted from various data sets in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. DataMed supports the findability and accessibility of data sets. These characteristics—along with interoperability and reusability—compose the four FAIR principles to facilitate knowledge discovery in today's big data–intensive science landscape.

Figure 1: Data sources have various metadata specifications, which undergo ingestion into the common DATS model, whose metadata elements are used for indexing and DataMed searches.
Figure 2: Community input to the Data Discovery Index Consortium.
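The pipeline summarized in the Figure 1 caption (repository-specific metadata ingested into a common model, whose elements are then indexed and searched) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `CommonRecord` is a simplified stand-in and not the actual DATS schema, the two repositories and their field names are hypothetical, and the in-memory inverted index only gestures at what a production search engine would do.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CommonRecord:
    """Simplified common metadata model (hypothetical; NOT the real DATS schema)."""
    identifier: str
    title: str
    repository: str
    keywords: list = field(default_factory=list)


def from_repo_a(raw):
    # Hypothetical repository A uses "acc", "name" and a list under "tags".
    return CommonRecord(raw["acc"], raw["name"], "repo_a", list(raw.get("tags", [])))


def from_repo_b(raw):
    # Hypothetical repository B uses "id", "dataset_title" and a
    # semicolon-separated string under "subjects".
    subjects = [s.strip() for s in raw.get("subjects", "").split(";") if s.strip()]
    return CommonRecord(raw["id"], raw["dataset_title"], "repo_b", subjects)


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    # Inverted index: token -> set of record identifiers containing it.
    index = defaultdict(set)
    for rec in records:
        for token in tokenize(rec.title + " " + " ".join(rec.keywords)):
            index[token].add(rec.identifier)
    return index


def search(index, query):
    # Return identifiers of records that contain every query token.
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


# Illustrative input records in two different source formats.
raw_a = [{"acc": "A-001", "name": "Breast cancer expression profiles", "tags": ["RNA-seq"]}]
raw_b = [{"id": "B-42", "dataset_title": "Lung cancer imaging cohort", "subjects": "CT; imaging"}]

records = [from_repo_a(r) for r in raw_a] + [from_repo_b(r) for r in raw_b]
index = build_index(records)
```

With this setup, `search(index, "cancer")` matches the records from both hypothetical repositories, while `search(index, "cancer imaging")` narrows the result to the one from repository B. The point of the common model is that the index and the query logic never need to know which source format a record came from.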

Acknowledgements

This project is funded by grant U24AI117966 from NIAID, NIH, as part of the BD2K program. The co-authors, who are the lead investigators and chairs/co-chairs of the core activities, thank all contributors to the bioCADDIE consortium and list them in the Supplementary Note in alphabetical order within each activity group (each name appears only once even though many people participated in different activities).

Author information

Corresponding author

Correspondence to Lucila Ohno-Machado.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Table 1 and Supplementary Note (DOCX 39 kb)

Cite this article

Ohno-Machado, L., Sansone, SA., Alter, G. et al. Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet 49, 816–819 (2017). https://doi.org/10.1038/ng.3864
