From radiological and histological simulations to deep genomic and proteomic analyses, biomedical research has become so data-rich that it can overwhelm even specialists’ capacity to process the information.
But if their accuracy can be improved, generative pre-trained transformers (GPTs), of which ChatGPT is a prominent example, could help scientists identify patterns and insights in complex data.
Generative AI allows systems to learn from massive amounts of existing data and create new data based on users’ instructions. Multimodal generative AIs, which can handle many types of data, such as text, images and audio, could help with a range of tasks, from suggesting chemical combinations for new materials and drug discovery to diagnosing and treating diseases using medical records and pathology images, says Kyunghoon Bae, president of LG AI Research at LG Group in Seoul, South Korea. But reliable, high-quality training data is crucial for generative AI, he says.
In March 2024, LG AI Research announced a partnership with The Jackson Laboratory (JAX), an independent, nonprofit biomedical research organization headquartered in Bar Harbor, Maine, in the United States. JAX, which was originally founded in 1929 to uncover the genetic basis of cancer, has pioneered the use of laboratory mice as models for human disease. Today, it works at the interface of mouse genetics and human genomics, using cell-based tools and computational modeling to explore disease biology.
The LG and JAX teams will work together to create AI models that can rapidly diagnose cancer and help predict treatment outcomes from pathology images, as well as genomic and clinical data. The researchers will also use one of JAX’s mouse datasets that looks at genomic, behavioural and metabolic data across the lifespan of mice to examine mechanisms that may indicate an individual’s Alzheimer’s disease trajectory, so that they can predict responses to therapeutic interventions.
AI hallucination
LG AI Research’s generative AI model, EXAONE, has been designed specifically to address the problems that businesses and industrial users grapple with when using popular generative AIs, says Bae.
“When we evaluated other generative AIs, we quickly identified an issue that’s called ‘hallucination’,” he says. Hallucination is when an AI presents false information as if it were fact. It can arise from algorithmic limitations or from training the AI on poor-quality data.
LG AI Research has a natural advantage here. The company has obtained permission to use high-quality data from several LG affiliates in the fields of electronics, chemistry, and information and communication technology. To extend EXAONE’s expertise in chemistry, LG AI Research also trained it on licensed data from more than 45 million research papers and patents.
EXAONE’s additional layers of defence against hallucination include an AI framework known as retrieval-augmented generation (RAG).
RAG retrieves factual information from external sources, which can help an AI model provide more accurate responses. Rather than relying purely on inferences drawn from its training data, the model also cross-references curated external data sources when constructing answers. The team’s early adoption of RAG has given it a head start, says Bae. “We have seen a lot of companies using RAG in 2024, but we have been using it since early last year,” he explains.
However, because AI hallucination cannot be ruled out entirely, EXAONE’s answers also clearly cite the data source or the model used to generate a prediction. That way, users can see the reasoning or the factual information from which an answer is derived, and make their own judgement.
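To make the retrieve-then-cite idea more concrete, here is a minimal Python sketch. It is not EXAONE’s pipeline, which is not described here. The corpus, the source labels and the function names (`retrieve`, `build_grounded_prompt`, `answer`) are hypothetical. The simple word-overlap retriever also stands in for the learned embeddings a production RAG system would typically use, and the final call to a language model is deliberately omitted. The sketch illustrates two ideas from the passage above: answers are grounded in retrieved passages, and each answer carries the labels of the sources it drew on.

```python
import math
import re
from collections import Counter

# Hypothetical curated knowledge base: each passage carries a source label
# so that a generated answer can cite where its facts came from.
CORPUS = [
    {"source": "doc-A", "text": "Boiling point rises with stronger intermolecular attraction."},
    {"source": "doc-B", "text": "Surface tension reflects attraction between molecules at a liquid surface."},
    {"source": "doc-C", "text": "Laboratory mice are widely used as models for human disease."},
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k=2):
    """Return up to k passages most similar to the query, best first,
    dropping passages that share no words with the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d["text"]))), d) for d in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_grounded_prompt(query, passages):
    """Assemble a prompt asking a generator to answer only from the
    retrieved passages and to cite their source labels."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the passages below and cite sources in brackets.\n"
        f"{context}\n\nQuestion: {query}"
    )


def answer(query, corpus=CORPUS, k=2):
    """Retrieve supporting passages and return the grounded prompt together
    with the cited sources. The language-model call is intentionally omitted."""
    passages = retrieve(query, corpus, k)
    return build_grounded_prompt(query, passages), [p["source"] for p in passages]


if __name__ == "__main__":
    prompt, sources = answer("Why does attraction between molecules affect boiling point?")
    print(prompt)
    print("Cited sources:", sources)
```

For the example question, the two chemistry passages are retrieved and the unrelated mouse-model passage is not, so the returned source list is what a user would see as the answer's references. In a real system, the prompt would be passed to the model, and the source list would be shown alongside its reply.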
Molecular structures
Chemistry is often a highly visual science. EXAONE, which was trained on 350 million licensed image-text data pairs, can learn from both text and images, including depictions of molecular structures, charts and graphs.
EXAONE’s performance in this area is underpinned by the team’s development of advanced ‘transfer learning’, in which AI models share useful information learned on one task to adapt better to new but related predictive tasks. In chemistry, a technique developed by LG AI Research, called a geometrically aligned transfer encoder, helps EXAONE predict molecular properties using fundamental chemical principles.
“Molecules have many different properties, but they are unified by underlying principles,” explains Daewoong Jeong, a research scientist at LG AI Research. “For example, the boiling point and the surface tension of a liquid are molecular properties used in different contexts, but the common principle is the attraction between the molecules in the liquid.”
As it is trained on experimental data, EXAONE can learn these underlying principles to enhance its predictive capabilities. LG AI Research is now exploring this with internal partners at LG Chem, one of its affiliates. The hope is that the AI could soon boost the productivity of experimental chemists by predicting the most promising candidate materials, such as those for a new battery, or by suggesting a more efficient way to make a new medicine.
Medical insights
Rather than being a single, general-purpose, large-scale generative AI, EXAONE is designed to be used as multiple AI models, each specialized to support experts in a specific research field. These streamlined models are more energy efficient and cheaper to operate.
Biomedicine is one area where EXAONE’s multimodal capabilities are well suited to processing diverse biomedical data, including medical records, genomic datasets, histology slides and spatial transcriptomics, says Rodrigo Hormazabal, another research scientist at LG AI Research.
The JAX and LG AI Research collaboration, for example, will focus on developing an AI that can identify new biomarkers or biomarker combinations that could be used to stratify patients with different subtypes of Alzheimer’s disease.
“The Jackson Laboratory has a very large research dataset on Alzheimer’s, including data from mouse models that carry genetic variants representing human risk alleles and gene mutations associated with the disease,” explains Yongmin Park, who leads the AI business team at LG AI Research.
“We then plan to follow disease progression for each subtype in mouse models,” he explains. The goal is to use this data to explore and identify new therapeutic targets and more effective treatment strategies for people with each of these subtypes.
In addition, the team is developing interactive AI tools to support oncologists, automatically keeping them apprised of the latest research in the field while supporting treatment decisions based on each patient’s genomic data.
“EXAONE is different from other generative AI models because our goal is to make it possible for real domain experts to use it,” Bae says. “We are now at a level where we can create AI models that can accomplish this.”