  • Correspondence

The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’

This article has been updated

Change history

  • 30 April 2024

    In the version of the article initially published, in Box 1, the text now reading “300 billion web pages” originally read “3 billion web pages” and has now been amended in the HTML and PDF versions of the article.



Thanks to Tyler Bonnen, James Bowden, Jennifer Doudna, Lisa Dunlap, Alyosha Efros, Nicolo Fusi, Aaron Hertzmann, Hanlun Jiang, Aditi Krishnapriyan, Jitendra Malik, Sara Mostafavi, Hunter Nisonoff and Ben Recht for helpful comments on this piece as it was taking shape.

Author information


Corresponding author

Correspondence to Jennifer Listgarten.

Ethics declarations

Competing interests

The author declares no competing interests.

About this article

Cite this article

Listgarten, J. The perpetual motion machine of AI-generated data and the distraction of ChatGPT as a ‘scientist’. Nat Biotechnol 42, 371–373 (2024).
