In 2022 we saw the launch of breakthrough artificial intelligence (AI) tools such as DALL-E 2 and ChatGPT, both from OpenAI. For the average user, these tools seem fun and exciting to play with. DALL-E 2 generates images from text descriptions, whereas ChatGPT is a natural language model that can generate text and answer questions. Who wouldn’t want to generate an image of their favorite cartoon character or pet in the style of their favorite artist, or ask ChatGPT to tell a new joke or write a song? For scientists and biotech companies, though, the use of AI technologies has risen steadily over the last few years, and these new generative AI tools hold much potential as they become more powerful and mature.

Generative AI is the term given to any technology that can generate images, text or other media in response to a short prompt. Some of these models are less biased than other machine learning models because they don’t require manual input of training data, and they can produce high-quality output because they learn from multiple, large data sources. While OpenAI has gotten much of the recent press, several other startups have added to the growth of generative AI in the last few years, including Jasper and Stability AI; the latter has its own text-to-image generation tool called Stable Diffusion, also released in 2022. The technology is being added to countless new apps, from image and music generation to the development of new machine learning algorithms, including in the biotech space.

As an example, traditional methods for protein engineering involve either iterative mutagenesis and selection of protein sequences or de novo rational design to create proteins with desired structural and sequence-specific properties. Generative AI has the potential to change this process, making it easier to generate artificial protein sequences from scratch. Madani and colleagues developed ProGen, a language model that, after being trained on 280 million sequences from known protein families, can generate protein sequences with a predictable function.
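
To give a feel for what "a language model that generates protein sequences" means, here is a deliberately tiny sketch in Python. It is not ProGen and makes no claim about how ProGen is built: it is a simple bigram model that only looks at the previous residue, and the training strings, function names and parameters are all made up for illustration. The shared idea is that the model learns which residue tends to follow which, then samples a new sequence one residue at a time.

```python
import random
from collections import defaultdict

# Hypothetical toy training set: invented peptide strings, NOT real protein families.
TRAINING_SEQUENCES = ["MKTAYIAKQR", "MKVLAAGIVG", "MKTLLLTLVV", "MKAIFVLKGS"]

START, END = "^", "$"  # markers for the beginning and end of a sequence


def train_bigram_model(sequences):
    """Count how often each residue follows another across the training set."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        tokens = [START] + list(seq) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate_sequence(counts, rng, max_length=30):
    """Sample one residue at a time, conditioned only on the previous residue."""
    residues = []
    prev = START
    while len(residues) < max_length:
        choices = list(counts[prev].keys())
        weights = list(counts[prev].values())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == END:
            break
        residues.append(nxt)
        prev = nxt
    return "".join(residues)


model = train_bigram_model(TRAINING_SEQUENCES)
print(generate_sequence(model, random.Random(0)))
```

A real protein language model conditions on the whole preceding sequence rather than a single residue, and is trained on hundreds of millions of sequences rather than four, but the generate-one-token-at-a-time loop is the same basic pattern.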

Biotech companies are making similar strides, particularly in small molecule drug discovery, a field that stands to benefit from these generative AI models. Few of the drug candidates that make it to the FDA approval process are ultimately approved for clinical use, but AI models have the potential to find more biologically relevant compounds. Generate Biomedicines is a company launched in 2020 that uses generative AI to create proteins that could be used as novel therapeutics, better tailored to specific conditions and more easily generated. Likewise, Standigm, launched in 2015, has already used generative AI tools to create hundreds of novel molecules, in as little as 2 months, by querying large biomedical databases. There are many others.

For researchers and students, there is OpenBioML, an ‘open research laboratory’ that works at the intersection of AI and biology. OpenBioML is backed by Stability AI with the aim of democratizing the technology by providing large-scale computational resources to collaborative research projects. Two of its first projects are BioLM, which applies natural language processing to computational biology and chemistry, and DNA-Diffusion, which is developing AI that can generate DNA sequences from text prompts.

While machine learning has in some cases diagnosed diseases more accurately, these algorithms are not immune to biases, which can lead to worse treatment for certain patients. Generative AI requires large, accurate datasets to generate high-quality predictions, and biases can occur when these data are incomplete or contain errors. Such systems may also be prone to overfitting. These technologies are expensive, and they require specialized hardware and software to implement. They take significant time to train and use, and they can leave a large carbon footprint, although the startup Evozyne hopes to use a generative AI model to tackle climate change directly.

Overall in 2022, over $1.37 billion was invested in generative AI companies, and as these tools gain more traction in the biomedical space, this amount is likely to increase. There have been predictions that generative AI could result in $1 trillion in value for the healthcare industry by 2040. We still need improvements in the training data to avoid bias, and we need to make the models more user- and planet-friendly, but the technology has the potential to influence cancer detection, predict disease variants and mitigate climate change. Certainly there are other applications that we haven’t even considered. Maybe we should ask ChatGPT what it thinks.