Traditionally, companies test a drug candidate against dozens of disease types hoping that one will show benefits for patients. “That takes a huge investment and a lot of time,” says Samer El Bawab, global head of quantitative pharmacology at Servier, a pharmaceutical company based in France.
This type of approach doesn’t have a great clinical success rate. “Many immune-oncology compounds have failed,” El Bawab continues. “That’s partly due to not fully understanding the biology being treated.”
Because Servier and other biopharma companies are under pressure to maximize the chances of clinical success while keeping development costs down, drug positioning is becoming increasingly important. It involves making informed decisions about the specific form of disease that a novel therapy is most likely to treat effectively, then selecting the patients most likely to benefit, all before trials begin.
Optimal drug positioning requires meticulous mapping of biomedical information, interrogation of the underlying relationships and analysis of extensive patient data. This is the kind of work at which AI excels, and it is the remit of DrugMatch, an AI drug-positioning engine from the AI biotech company Owkin.
Prioritizing the possibilities
DrugMatch starts by using knowledge graphs (KGs) to prioritize the most promising indication for a novel therapy from a shortlist of possibilities.
“A knowledge graph is a complex data structure that captures relationships between entities,” says Anna Gogleva, lead research scientist and graph expert at Owkin. Imagine a spider’s web: in knowledge-graph terminology, each point where strands of the web meet is called a node, and the strands themselves are edges.
“In a biomedical context, the nodes can be a disease, a gene, a protein, a drug target,” says Gogleva.
The edges represent the relationships between the nodes — connecting, say, a disease to a gene that plays a role in that illness, or to a drug that treats it. Edges can carry additional information, such as the type of relationship. “The importance of certain edges to various downstream tasks (such as prediction of a new link, or classification of a node) can be learned during model training,” Gogleva says.
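To make the node-and-edge picture concrete, here is a toy Python sketch of a typed knowledge graph. It is not Owkin’s implementation: the class `KnowledgeGraph`, its methods and the placeholder entities (`DiseaseX`, `GeneA`, `DrugY`) are hypothetical names chosen for illustration. Each node carries an entity type and each edge carries a relation type, mirroring the extra information on edges that Gogleva describes.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy (hypothetical) knowledge graph: typed nodes joined by typed edges."""

    # Node name -> entity type, e.g. "disease", "gene", "protein", "drug".
    node_types: dict[str, str] = field(default_factory=dict)
    # Node name -> list of (relation type, neighbouring node).
    adjacency: dict[str, list[tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add_node(self, name: str, kind: str) -> None:
        self.node_types[name] = kind

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Both endpoints must already exist as nodes.
        if src not in self.node_types or dst not in self.node_types:
            raise KeyError(f"unknown node in edge {src!r} -> {dst!r}")
        # Record the edge at both ends so either node can find the other.
        # (For simplicity, this toy version does not keep edge direction.)
        self.adjacency[src].append((relation, dst))
        self.adjacency[dst].append((relation, src))

    def neighbours(self, name: str, relation: str | None = None) -> list[str]:
        """Nodes linked to `name`, optionally restricted to one relation type."""
        return [
            n for r, n in self.adjacency.get(name, [])
            if relation is None or r == relation
        ]


# Hypothetical placeholder entities; no real biology is implied.
kg = KnowledgeGraph()
kg.add_node("DiseaseX", "disease")
kg.add_node("GeneA", "gene")
kg.add_node("DrugY", "drug")
kg.add_edge("GeneA", "associated_with", "DiseaseX")
kg.add_edge("DrugY", "targets", "GeneA")
print(kg.neighbours("GeneA"))             # ['DiseaseX', 'DrugY']
print(kg.neighbours("GeneA", "targets"))  # ['DrugY']
```

A production graph would typically keep edge direction and let a model learn how much each relation type matters; this sketch omits both for brevity.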
In essence, a KG can mimic the organization of biomedical systems. “Many things in biology operate in network-like structures,” Gogleva explains, “whether that’s protein–protein interactions, metabolic networks or regulatory networks.”
To build a knowledge graph, says Jonas Béal, director of drug positioning R&D at Owkin, the company uses publicly available data from the scientific literature and biomedical databases. “With graph methods, we stack together different layers of data from different entities — genes, proteins, diseases,” he explains. “Each layer of data adds perspective.”
Using graph machine-learning methods it has developed, Owkin analyses the KG for drugs or targets that have proved effective against particular indications in previous clinical trials. Based on that information, Owkin applies AI to predict which indications a novel therapy might treat.
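The article does not describe Owkin’s graph machine-learning methods, so the Python sketch below swaps in a much simpler heuristic, a guilt-by-association score: a candidate indication gains one point for each past successful trial linking one of the novel therapy’s targets to that indication. The function `rank_indications` and all of the data are hypothetical.

```python
from collections import Counter


def rank_indications(new_targets, past_successes, candidates):
    """Rank candidate indications for a novel therapy by guilt-by-association.

    new_targets: set of targets the novel therapy acts on.
    past_successes: (target, indication) pairs, one per (hypothetical) past trial
        in which a drug acting on that target was effective in that indication.
    candidates: shortlist of indications to prioritize.
    """
    # Edge weight = number of past trials supporting a target-indication link.
    evidence = Counter(past_successes)
    scores = {
        indication: sum(evidence[(t, indication)] for t in new_targets)
        for indication in candidates
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented data for illustration only.
past = [
    ("TargetA", "Indication1"),
    ("TargetA", "Indication1"),
    ("TargetB", "Indication2"),
    ("TargetC", "Indication3"),
]
ranking = rank_indications(
    {"TargetA", "TargetB"}, past, ["Indication1", "Indication2", "Indication3"]
)
print(ranking)  # [('Indication1', 2), ('Indication2', 1), ('Indication3', 0)]
```

A learned model would replace these raw counts with scores that draw on the whole graph, but the interface is the same: a shortlist goes in and a ranked list comes out.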
Picking the best population of patients
Once a drug–disease pair has been selected, Owkin determines the specific group of patients who are most likely to benefit. To select the best subpopulation, scientists at Owkin delve into real-world multimodal patient data that are not available through public databases and that reflect current treatments and the most up-to-date biology.
To collect the data, Owkin works with a network of partners, such as the leading cancer centres Gustave Roussy in France, The Royal Marsden in the United Kingdom and Memorial Sloan Kettering in the United States. If the required data are not available, Owkin works with its partners to generate the missing modalities and enrich the dataset.
“Patient data are critical,” Béal says, “because that’s where the fine-grained differences become especially important. When we are defining smaller subpopulations and their associated biomarkers, we must ensure their relevance and applicability in real-world settings.”
These cohorts number only a few hundred patients but are extremely rich. Data include clinical, ‘omics’, spatial information (such as from Owkin’s MOSAIC initiative), medical images and more. All these data need to be summarized for each patient. “The first step is to have the AI represent all 20,000 genes, involved in hundreds of biological functions, and thousands of pixels of tumour biopsy images, as condensed variables,” says Béal.
This machine-learning approach, called representation learning, provides a compact description of the patient population. Owkin’s experts have developed algorithms that analyse pathology slides and extract meaningful information, which can be melded with molecular descriptions1,2.
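As a rough illustration of condensing thousands of gene measurements into a handful of variables per patient, the sketch below uses principal component analysis (PCA), computed with NumPy’s singular value decomposition. PCA is a linear stand-in chosen for simplicity; Owkin’s representation-learning algorithms are not specified here, the helper name `condense` is hypothetical, and the data are random numbers rather than patient measurements.

```python
import numpy as np


def condense(expression: np.ndarray, n_components: int) -> np.ndarray:
    """Project a (patients x genes) matrix onto its top principal components."""
    # Centre each gene so the components capture variation between patients.
    centred = expression - expression.mean(axis=0)
    # Rows of vt are principal axes in gene space, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


rng = np.random.default_rng(0)
# Synthetic stand-in: a few hundred patients x 20,000 genes, the scale in the text.
expression = rng.normal(size=(300, 20_000))
embedding = condense(expression, n_components=16)
print(embedding.shape)  # (300, 16)
```

Each patient ends up described by 16 numbers instead of 20,000; deep-learning encoders play the same role for inputs such as pathology images, where a linear method like PCA is usually too limited.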
Combining all these patient data and analyses creates an aggregated view that allows the AI to predict the ideal patient subgroups that should be tested in a clinical trial of a novel drug candidate.
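One simple way to sketch an aggregated multimodal view, and to propose candidate subgroups from it, is to standardize each modality’s condensed features, concatenate them per patient and cluster the result. The Python example below does that with plain k-means. It assumes the modalities have already been condensed, uses synthetic data, and is not how DrugMatch selects subgroups, which the article does not detail; `fuse` and `kmeans` are hypothetical helper names.

```python
import numpy as np


def fuse(*blocks: np.ndarray) -> np.ndarray:
    """Z-score each modality's features, then concatenate them per patient."""
    scaled = [(b - b.mean(axis=0)) / (b.std(axis=0) + 1e-8) for b in blocks]
    return np.hstack(scaled)


def kmeans(x: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster label per patient."""
    rng = np.random.default_rng(seed)
    # Start from k distinct patients chosen at random.
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every patient to every centre: shape (n_patients, k).
        dists = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its members; keep it if it has none.
        centres = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
    return labels


rng = np.random.default_rng(1)
molecular = rng.normal(size=(300, 16))  # e.g. condensed gene-expression features
imaging = rng.normal(size=(300, 8))     # e.g. condensed pathology-slide features
clinical = rng.normal(size=(300, 4))    # e.g. age, stage, prior treatment lines
patients = fuse(molecular, imaging, clinical)
labels = kmeans(patients, k=4)
print(patients.shape)                   # (300, 28)
print(np.bincount(labels, minlength=4))
```

Standardizing before concatenation keeps a modality with larger numeric ranges from dominating the distance calculation. Clusters found this way are only candidates; they would still need to be linked to outcomes on the drug before they could inform trial design.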
Although AI can often be something of a black box, Owkin’s approach can reveal some of the logic behind the selection of those patients. For instance, the AI-based model might reveal “patterns that increase the odds for a given drug and disease to be a successful or an unsuccessful pair”, Béal says. Consequently, he adds, “A drug developer can select a clinical trial’s patients based on the analysis of an extensive dataset, rather than guessing.”
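Béal’s “patterns that increase the odds” can be made concrete with an odds ratio, one common interpretable measure of association. The Python sketch below computes it from a 2×2 table of hypothetical historical drug-disease pairs, split by whether a pattern is present. The counts are invented, and `odds_ratio` is an illustrative helper, not part of Owkin’s model.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds of success with a pattern divided by odds of success without it.

    a: successful pairs with the pattern     b: unsuccessful pairs with it
    c: successful pairs without the pattern  d: unsuccessful pairs without it
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid dividing by zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a / b) / (c / d)


# Hypothetical counts: odds of success are 30:20 with the pattern, 15:60 without.
print(odds_ratio(30, 20, 15, 60))  # 6.0
```

A value above 1 means pairs showing the pattern had higher odds of success in this (invented) history; a value below 1 flags a pattern associated with failure.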
An immune-oncology exploration
El Bawab and his colleagues at Servier are working with Owkin to identify a patient population that may respond better to the treatment they are developing. “We hope this will increase the probability of success of our clinical trial on an immune-oncology drug,” he says.
“Owkin has really good experts, and we trust that working with them can increase our chances of success,” El Bawab adds.
For Servier, Owkin’s tools offer even greater value: they can reveal more about the molecular mechanisms of cancer. “Now, with the emerging amount of data generation from individual patients, and AI and machine-learning approaches, you can better understand the disease,” El Bawab says. “Instead of just testing without knowing if a patient is likely to respond or not, we can state clear hypotheses that we can test in clinics.”