It is critical that we learn from the HCQ experience as we look for immediate medical solutions to this crisis. The COVID-19 pandemic has resulted in an unprecedented number of other therapeutic proposals, spread across peer-reviewed and preprint publications, blog posts, tweets, television and other communication channels. No matter how rational or sound they are, this many proposals cannot be systematically evaluated and prioritized by any single group, institution or regulator. There is a clear need to understand the ever-growing stream of data, information and knowledge being published in order to collate, process and structure it in real time.

In this respect, dictionary-based text mining [21], coupled with specialized artificial intelligence (AI) or machine learning (ML, originally called statistical pattern recognition) models such as BioBERT, a bidirectional biomedical language representation model [22], can help achieve that potential. Rigorous, evidence-based peer review coupled with open data and computer-aided technologies offers a way out of the dilemma and provides the opportunity to make reasoned scientific breakthroughs in a crisis (Box 1). Although preprint services allow rapid dissemination of new findings, preprints are not peer reviewed and can easily mislead non-experts and encourage sensationalist and erroneous public coverage. At present, there are several explainable AI/ML systems designed to predict the outcomes of clinical trials on the basis of multiomics data, chemistry and textual information in prospective validation studies. These systems may be used to caution regulators and encourage more careful review in cases where predictions disagree with the published results. Comprehensive end-to-end AI-powered drug discovery systems such as pharma.ai (http://insilico.com/platform/), which integrate preclinical and clinical datasets and provide target hypotheses, small-molecule screening, and analysis of grant, publication, patent and clinical trial data, may be used for both repurposing and prediction of clinical trial outcomes.
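As a rough illustration of the dictionary-based text mining step mentioned above, the following minimal sketch counts mentions of drugs in free text by matching a synonym dictionary. The dictionary entries, the example sentences and the function names are hypothetical placeholders chosen for illustration; a real pipeline would rely on a curated vocabulary and would typically pass the tagged text on to a language model such as BioBERT.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: canonical drug name -> known synonyms.
DRUG_DICTIONARY = {
    "hydroxychloroquine": ["hydroxychloroquine", "HCQ"],
    "remdesivir": ["remdesivir"],
    "dexamethasone": ["dexamethasone"],
}


def build_pattern(dictionary):
    """Compile one case-insensitive regex that matches any synonym as a whole word."""
    synonym_to_canonical = {}
    for canonical, synonyms in dictionary.items():
        for synonym in synonyms:
            synonym_to_canonical[synonym.lower()] = canonical
    # Try longer synonyms first so they win over shorter overlapping ones.
    alternatives = sorted(synonym_to_canonical, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, synonym_to_canonical


def count_mentions(texts, dictionary):
    """Return a Counter of canonical drug names mentioned across all texts."""
    pattern, lookup = build_pattern(dictionary)
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            counts[lookup[match.group(1).lower()]] += 1
    return counts


# Made-up sentences standing in for abstracts, preprints or posts.
documents = [
    "HCQ was proposed early; hydroxychloroquine is named twice here.",
    "Remdesivir appears once in this made-up sentence.",
]
mention_counts = count_mentions(documents, DRUG_DICTIONARY)
print(mention_counts)
```

Mapping every synonym back to one canonical name is the design choice that matters here: it lets mentions of "HCQ" and "hydroxychloroquine" accumulate under a single entry, which is what makes proposals from different channels comparable.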

Clinical drug development requires distinct areas of expertise. Unbiased clinical data, compiled in real time, ought to be made accessible without restrictions, and intellectual property rights should be waived for this and future pandemics. Although coordinated efforts are ongoing, this pandemic offers a unique opportunity to lay the foundation for synchronized global workflows that will ensure data veracity, provide an unbiased and multi-viewpoint assessment of therapeutic alternatives, and allow efficient allocation of computational, human and experimental resources. This must occur in a context that allows for peer review, fact checking and incorporation of relevant domain expertise.

Traditionally, such activities would take place in laboratories and be followed by human clinical trials. Given the almost complete shutdown of animal research facilities and the need for focused, rational clinical trials, it is worthwhile to explore how many of these activities can be supplemented or even replaced by the capabilities that are now available through in silico technologies, such as ML, systems biology and computer-aided drug repurposing. These technologies have matured in recent years and are ready to become an integral part of the global workflow, to prioritize novel drug targets and new chemical entities as well as to evaluate off-label or drug repositioning proposals [22]. Such workflows, based for example on the Drug Repositioning Evidence Level (DREL) [23,24], could be used to evaluate drug repositioning candidates. By integrating multiple layers of data, information and knowledge and processing the massive stream of repositioning proposals, validated machine-intelligence-based methods could serve, in the near future, as a decision support system for policymakers, healthcare providers and society at large [25]. If properly resourced and implemented, such a synchronized workflow could assist in assembling disparate evidence and hypotheses into actionable healthcare solutions to tackle the current pandemic and the inevitable future ones.

In practice, the deployment of AI/ML methods requires a comprehensive understanding of their advantages and weaknesses. AI/ML is powerful for identifying relevant patterns within large sets of nonlinear data without the need for manual feature engineering, as systems can learn implicit rules from the data provided. While the amount of data needed to train such algorithms might be an issue, the ability of AI/ML to make sense of large amounts of data is an advantage in many circumstances. For instance, the Smith–Waterman algorithm and Pfam are standard methods for the prediction of protein functions, but they are not fast enough to handle large numbers of protein sequences. AI/ML offers alternatives that address these issues. For instance, DeepFam [26] is an alignment-free method that extracts functional information from sequences without requiring multiple sequence alignments. When compared to state-of-the-art methods, DeepFam performed better in terms of accuracy and runtime for predicting protein functions. In this context, the emergence of AI/ML approaches is incremental and can build on classical sequence similarity and genome analysis with tools such as BLAST.
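To make the scaling concern concrete, here is a minimal sketch of Smith–Waterman local alignment scoring. The scoring values (match, mismatch and a linear gap penalty) and the function name are illustrative assumptions; real protein searches normally use a substitution matrix such as BLOSUM62 and affine gap penalties. The nested loops fill one cell for every pair of residues, so the cost of a single comparison grows with the product of the two sequence lengths, which is why exhaustive pairwise alignment becomes slow on large sequence collections.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix, because the score alone (not the traceback) is needed.
    The default scores are illustrative assumptions.
    """
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                 # start a new local alignment
                previous_row[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous_row[j] + gap,             # gap in b
                current_row[j - 1] + gap,          # gap in a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


# Toy sequences: a shared core "ACGT" flanked by unrelated characters.
score = smith_waterman_score("XXACGTXX", "YYACGTYY")
print(score)
```

Flooring every cell at zero is what makes the alignment local: a poorly matching region resets the score instead of dragging down a good match elsewhere in the sequence.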

ML is already starting to be used to identify biological targets for therapeutic intervention in heterogeneous disease [27] and to find suitable drug candidates that bind those targets (Table 2). Similarly, if empirical clinical observations in a paper are used to propose a drug as a potential treatment approach, ML could and should be used to rapidly simulate efficacy and side effects in (preferably stratified) populations. A synchronized workflow using ML methods could be based on resources available for analyzing targets, drugs and related potential side effects, such as the Side Effect Resource (SIDER), which combines data on drugs, targets and side effects recorded during clinical trials, and the FDA Adverse Event Reporting System (FAERS), which gives access to adverse event reports and medication error reports previously submitted to the FDA.

Table 2 Examples of data science and AI/ML techniques for drug repositioning for COVID-19

There are still several unknowns about the biology and mode of action of SARS-CoV-2. However, information about the sequence of the viral genome, the discovery of receptors used by the virus to infect cells and knowledge of the structure of the virus allow the identification of potential targets for direct-acting antivirals. As data have emerged on population-scale pathology, it has become clear that an overactive host immune response is a driver of more serious disease. Naturally, the data gathering, analytics strategies and focus for therapy discovery have responded to these data.

Furthermore, ML algorithms can accelerate the design of clinical trials by automatically identifying suitable subjects, ensuring the correct distribution to groups of study participants and providing an early warning system for a clinical trial that is not producing meaningful results. Computational drug repurposing accelerates the drug development process and reduces the associated costs. To identify the right repurposing candidates, it is important to identify known molecular targets, to predict novel molecular targets for known drugs, and to consider dosing, pharmacokinetic and safety-related parameters. With its ability to analyze millions of examples of drug and patient data to generate hypotheses and then provide evidence supporting or challenging them, ML can be used to identify new indications for known drugs and to combine existing drugs in ways that give them therapeutic powers that each lacks in isolation.
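One way to picture what "ensuring the correct distribution to groups of study participants" involves is the sketch below. It uses stratified permuted-block randomization, a classical trial-design technique rather than an ML method, and it is swapped in here only as an illustration, not as the approach the text refers to. The participant identifiers, the age strata, the block size and the function name are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_block_randomization(participants, arms=("treatment", "control"),
                                   block_size=4, seed=0):
    """Assign participants to arms using permuted blocks within each stratum.

    participants: iterable of (participant_id, stratum) tuples.
    Within every completed block, each arm receives the same number of
    participants, which keeps the arms balanced inside each stratum.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    open_blocks = defaultdict(list)    # stratum -> unassigned slots of its current block
    assignment = {}
    for participant_id, stratum in participants:
        if not open_blocks[stratum]:
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            open_blocks[stratum] = block
        assignment[participant_id] = open_blocks[stratum].pop()
    return assignment


# Hypothetical cohort: 16 participants split evenly across two age strata.
participants = [(f"P{i:02d}", "age>=65" if i % 2 else "age<65") for i in range(16)]
assignment = stratified_block_randomization(participants)

stratum_of = dict(participants)
arm_counts = Counter((stratum_of[pid], arm) for pid, arm in assignment.items())
print(arm_counts)
```

An ML component would most plausibly sit upstream of a step like this, for example by proposing eligible participants or the strata themselves, while the assignment itself stays random and auditable.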

Within healthcare and drug discovery, AI/ML should be implemented as an adjunct to human workflows rather than as an alternative. Integration of AI/ML offers powerful options, but can only be successful within multidisciplinary teams that can ensure AI/ML solutions are adapted to each particular situation. From this viewpoint, human expertise and final decision-making will remain essential in drug discovery and development, as well as in clinical practice. There are currently few examples of large-scale integration of AI/ML technologies in drug discovery or clinical practice. In drug discovery, where the timeframe for a drug to undergo preclinical testing and clinical trials is especially long, more time is needed to assess their real impact.

AI/ML algorithms already deployed within the drug development pipeline have greatly improved. For instance, the synthetic tractability that was a weakness of the first AI/ML de novo design methods can now be evaluated using synthetic accessibility scores. When properly designed in collaboration with expert medicinal chemists, platforms for de novo design can prioritize synthetically tractable molecular structures with the desired biological activity. Moreover, state-of-the-art AI-based methods for de novo design can generate molecular structures from limited information. The binding-site amino acid environment and a cocrystallized fragment, for instance, provide the pocket and ligand features needed to perform either ligand-based or pocket-based generation. Nevertheless, challenges encountered when developing AI/ML solutions for de novo molecular generation or for medical imaging prognosis [28] also demonstrate that there is a need to develop and improve reporting standards and metrics, as well as best practices for data sharing and a requirement for algorithm availability, which should be adapted to the strict requirements and expectations of medical sciences and healthcare.