  • Comment

Multimodal large language models for bioimage analysis

Multimodal large language models have been recognized as a historic milestone in artificial intelligence and have demonstrated revolutionary potential not only in commercial applications but also in many scientific fields. Here we give a brief overview of multimodal large language models through the lens of bioimage analysis and discuss how we, as a community, could build these models to facilitate biology research.

Fig. 1: Constructing MLLMs for bioimage analysis.
Fig. 2: An example of applying MLLMs to bioimage analysis.

Acknowledgements

S.Z. is supported by the National Science and Technology Major Project of China (No. 2022ZD0117801). J.C. is funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF) in Germany under the funding reference 161L0272, and also supported by the Ministry of Culture and Science of the State of North Rhine-Westphalia (Ministerium für Kultur und Wissenschaft des Landes Nordrhein-Westfalen, MKW NRW).

Author information

Contributions

S.Z. proposed the idea and paper framework, and G.D., T.H. & J.C. joined the discussion. All the authors wrote, edited and gave final approval to the manuscript.

Corresponding authors

Correspondence to Shanghang Zhang or Jianxu Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Zhang, S., Dai, G., Huang, T. et al. Multimodal large language models for bioimage analysis. Nat Methods 21, 1390–1393 (2024). https://doi.org/10.1038/s41592-024-02334-2

  • DOI: https://doi.org/10.1038/s41592-024-02334-2
