  • News & Views
Drug discovery

A multidimensional dataset for structure-based machine learning

MISATO, a dataset for structure-based drug discovery, combines quantum mechanics property data and molecular dynamics simulations on ~20,000 protein–ligand structures. It substantially extends the amount of data available to the community and holds potential for advancing work in drug discovery.

Fig. 1: Non-exhaustive comparison of MISATO to existing datasets, and pipeline for MISATO construction.


Author information

Corresponding authors

Correspondence to Matthew Holcomb or Stefano Forli.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Holcomb, M., Forli, S. A multidimensional dataset for structure-based machine learning. Nat Comput Sci 4, 318–319 (2024).
