Large language models (LLMs) are deep learning models with very large numbers of parameters, trained in a self-supervised way on large volumes of text. LLMs started to emerge around 2018, and since then there has been a sharp increase in both their number of parameters and their capabilities (for example, GPT-4, whose parameter count has not been disclosed, can process both text and images). Discussions about the use and misuse of this technology in science erupted in late 2022, prompted by sudden widespread access to LLM tools that can generate and edit scientific text or answer scientific questions. Some of the open questions fuelling these conversations are summarized in Box 1.

AK: Until more robust and reliable safeguards are in place, the scientific community should take a timely and firm stance against overreliance on LLMs and foster practices of responsible science in the age of LLMs. Otherwise, the credibility of scientific knowledge is at risk. An initial step is to design LLM policies realistically; for example, to identify and ban papers that rely primarily on LLMs, a policy already adopted at the International Conference on Machine Learning (ICML) 2023 and likely to be enforced more widely. However, identifying LLM-generated text is challenging, and the development of accurate detection tools remains an active area of research. Recent studies have raised concerns about the reliability of these methods in distinguishing LLM-generated from human-written text12.
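To illustrate why reliable detection is non-trivial, the minimal sketch below implements one commonly discussed heuristic: scoring a passage's perplexity under a public language model and flagging unusually low values as possibly machine-generated. This is an illustrative assumption-laden example, not a method endorsed by the detection literature cited above; the choice of GPT-2 (via the Hugging Face transformers library), the threshold value, and the function names are all hypothetical.

```python
# Illustrative sketch of a perplexity-based heuristic for flagging text that
# *may* be LLM-generated. It is easy to evade and prone to false positives,
# which is part of why detection remains an open research problem.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2 (lower = more 'predictable' text)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

# Hypothetical cut-off chosen purely for illustration; real detectors use
# calibrated scores, paraphrase-robust statistics or trained classifiers.
THRESHOLD = 20.0

def flag_possible_llm_text(text: str) -> bool:
    """Return True if the text's perplexity falls below the (arbitrary) threshold."""
    return perplexity(text) < THRESHOLD
```

Even under these simplifying assumptions, such a score separates human-written from LLM-generated text only weakly, which is consistent with the concerns raised about current detection tools.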

In addition, scientists must be more vocal about the potential negative impacts of this technology on the scientific community. By raising awareness and demanding further research into and development of safeguards, the scientific community can actively contribute to the responsible and ethical use of LLMs. This includes promoting interdisciplinary collaboration and sharing knowledge about the potential risks and benefits of LLMs across fields.