Main

Various concepts of ‘artificial intelligence’ (AI) have been successfully adopted for computer-assisted drug discovery in the past few years1,2,3. This advance is mostly owed to the ability of deep learning algorithms, that is, artificial neural networks with multiple processing layers, to model complex nonlinear input–output relationships, and perform pattern recognition and feature extraction from low-level data representations. Certain deep learning models have been shown to match or even exceed the performance of established machine learning and quantitative structure–activity relationship (QSAR) methods for drug discovery4,5,6. Moreover, deep learning has boosted the potential and broadened the applicability of computer-assisted discovery, for example, in molecular design7,8, chemical synthesis planning9,10, protein structure prediction11 and macromolecular target identification12,13.

The ability to capture intricate nonlinear relationships between input data (for example, chemical structure representations) and the associated output (for example, assay readout) often comes at the price of limited comprehensibility of the resulting model. While there have been efforts to explain QSARs in terms of algorithmic insights and molecular descriptor analysis14,15,16,17,18,19, deep neural network models notoriously elude immediate accessibility by the human mind20. In medicinal chemistry in particular, the availability of ‘rules of thumb’ correlating biological effects with physicochemical properties underscores the willingness, in certain situations, to sacrifice accuracy in favour of models that better fit human intuition21,22,23. Thus, blurring the lines between the ‘two QSARs’24 (that is, mechanistically interpretable versus highly accurate models) may be key to accelerated drug discovery with AI25.

Automated analysis of medical and chemical knowledge to extract and represent features in a human-intelligible format dates back to the 1990s26,27, but has been receiving increasing attention due to the re-emergence of neural networks in chemistry and healthcare. Given the current pace of AI in drug discovery and related fields, there will be an increased demand for methods that help us understand and interpret the underlying models. In an effort to mitigate the lack of interpretability of certain machine learning models, and to augment human reasoning and decision-making28, attention has been drawn to explainable AI (XAI) approaches29,30.

Providing informative explanations alongside the mathematical models aims to (1) render the underlying decision-making process transparent (‘understandable’)31, (2) avoid correct predictions for the wrong reasons (the so-called clever Hans effect)32, (3) avert unfair biases or unethical discrimination33 and (4) bridge the gap between the machine learning community and other scientific disciplines. In addition, effective XAI can help scientists navigate ‘cognitive valleys’28, allowing them to hone their knowledge and beliefs on the investigated process34.

While the exact definition of XAI is still under debate35, in the authors’ opinion, several aspects of XAI are certainly desirable in drug design applications29:

  • Transparency—knowing how the system reached a particular answer.

  • Justification—elucidating why the answer provided by the model is acceptable.

  • Informativeness—providing new information to human decision-makers.

  • Uncertainty estimation—quantifying how reliable a prediction is.

In general, XAI-generated explanations can be categorized as global (that is, summarizing the relevance of input features in the model) or local (that is, based on individual predictions)36. Moreover, XAI methods can be dependent on the underlying model or model-agnostic, which in turn affects the potential applicability of each method. In this framework, there is no one-size-fits-all XAI approach.

There are many domain-specific challenges for future AI-assisted drug discovery, such as the data representation fed to such approaches. In contrast to many other areas in which deep learning has been shown to excel, such as natural language processing and image recognition, there is no naturally applicable, complete, ‘raw’ molecular representation. After all, molecules—as scientists conceive them—are models themselves. Such an ‘inductive’ approach, which builds higher-order (for example, deep learning) models from lower-order ones (for example, molecular representations or descriptors based on observational statements), is therefore philosophically debatable37. The choice of the molecular ‘representation model’ becomes a limiting factor of the explainability and performance of the resulting AI model, as it determines the content, type and interpretability of the chemical information retained (for example, pharmacophores, physicochemical properties, functional groups).

Drug design is not straightforward. It distinguishes itself from clear-cut engineering by the presence of error, nonlinearity and seemingly random events38. We have to concede our incomplete understanding of molecular pathology and our inability to formulate infallible mathematical models of drug action and corresponding explanations. In this context, XAI bears the potential to augment human intuition and skills for designing novel bioactive compounds with desired properties.

The design of new drugs is epitomized by the question of whether pharmacological activity (‘function’) can be deduced from the molecular structure, and which elements of such structure are relevant. Multi-objective design poses additional challenges and sometimes ill-posed problems, resulting in molecular structures that too often represent compromise solutions. The practical approach aims to limit the number of syntheses and assays needed to find and optimize new hit and lead compounds, especially when elaborate and expensive tests are performed. XAI-assisted drug design is expected to help overcome some of these issues, by allowing researchers to take informed action while simultaneously considering medicinal chemistry knowledge, model logic and awareness of the system’s limitations39. XAI will foster the collaboration between medicinal chemists, chemoinformaticians and data scientists40,41. In fact, XAI already enables the mechanistic interpretation of drug action42,43, and contributes to drug safety enhancement, as well as organic synthesis planning9,44. If successful in the long run, XAI will provide fundamental support in the analysis and interpretation of increasingly complex chemical data, as well as in the formulation of new pharmacological hypotheses, while avoiding human bias45,46. Pressing drug discovery challenges such as the coronavirus pandemic might boost the development of application-tailored XAI approaches, to promptly respond to specific scientific questions related to human biology and pathophysiology.

The field of XAI is still in its infancy but moving forward at a fast pace, and we expect an increase in its relevance in the years to come. In this Review, we aim to provide a comprehensive overview of recent XAI research, highlighting its benefits, limitations and future opportunities for drug discovery. In what follows, we first introduce the most relevant XAI methods, structured into conceptual categories, and then present existing and some of the potential applications to drug discovery. Finally, we discuss the limitations of contemporary XAI and point to the potential methodological improvements needed to foster practical applicability of these techniques to pharmaceutical research.

A glossary of selected terms is provided in Box 1.

State of the art and future directions

This section aims to provide a concise overview of modern XAI approaches and exemplify their use in computer vision, natural language processing and discrete mathematics. We then highlight selected case studies in drug discovery and propose potential future areas and research directions of XAI in drug discovery. A summary of the methodologies and their goals, along with reported applications, is provided in Table 1. In what follows, without loss of generality, f will denote a model (in most cases a neural network); \(x \in {\cal{X}}\) will denote the set of features describing a given instance, which are used by f to make a prediction \(y \in {\cal{Y}}\).

Table 1 Computational approaches towards explainable AI in drug discovery and related disciplines, categorized according to the respective methodological concept

Feature attribution methods

Given a regression or classification model \(f:{\boldsymbol{x}} \in {\Bbb R}^K \to {\Bbb R}\) (where \({\Bbb R}\) refers to the set of real numbers and the superscript K to the dimensionality of the input, that is, \({\Bbb R}^K\) is the set of K-dimensional real vectors), a feature attribution method is a function \({\cal{E}}:{\boldsymbol{x}} \in {\Bbb R}^K \to {\Bbb R}^K\) that takes the model input and produces an output whose values denote the relevance of every input feature for the final prediction computed with f. Feature attribution methods can be grouped into the following three categories (Fig. 1).

  • Gradient-based feature attribution. These approaches measure how much a change around a local neighbourhood of the input x corresponds to a change in the output f(x). A common approach among deep-learning practitioners relies on the use of the derivative of the output of the neural network with respect to the input (that is, ∂f(x)/∂x) to determine feature importance47,48 (see the code sketch after this list). Its popularity arises partially from the fact that this computation can be performed via back-propagation49, the main way of computing partial first-order derivatives in neural network models. While the use of gradient-based feature attribution may seem straightforward, several methods relying on this principle have been shown to lead to only partial reconstruction of the original features50, which is prone to misinterpretation.

  • Surrogate-model feature attribution. Given a model f, these methods aim to develop a surrogate explanatory model g, which is constructed in such a way that: (1) g is interpretable and (2) g approximates the original function f. A prominent example of this concept is the family of additive feature attribution methods, where the approximation is achieved through a linear combination of binary variables zi:

$$g\left( {z^\prime } \right) = \phi _0 + \mathop {\sum}\nolimits_{i = 1}^M {\phi _iz^\prime _i} ,$$
(1)

where \(z^\prime \in \left\{ {0,1} \right\}^M\) is a vector of binary variables \(z^\prime _i\), M is the number of original input features, \(\phi _i \in {\Bbb R}\) are coefficients representing the importance assigned to the ith binary variable and ϕ0 is an intercept. Several notable feature attribution methods belong to this family51,52, such as local interpretable model-agnostic explanations (LIME)53, Deep Learning Important FeaTures (DeepLIFT)54, Shapley additive explanations (SHAP)52 and layer-wise relevance propagation55. Both gradient-based methods and the additive subfamily of surrogate attribution methods provide local explanations (that is, each prediction needs to be examined individually), but they do not offer a general understanding of the underlying model f. Global surrogate explanation models aim to fill this gap by generically describing f via a decision tree or decision set56 model. If such an approximation is precise enough, these models mirror the computation logic of the original model. While early attempts limited f to the family of tree-based ensemble methods (for example, random forests57), more recent approaches are readily applicable to arbitrary deep learning models58. A minimal sketch of a local additive surrogate is given after this list.

  • Perturbation-based methods modify or remove parts of the input to measure the corresponding change in the model output; this information is then used to assess feature importance. Alongside the well-established step-wise approaches59,60, methods such as feature masking61, perturbation analysis62, response randomization63 and conditional multivariate models64 belong to this category. While perturbation-based methods have the advantage of directly estimating feature importance, they become computationally slow as the number of input features increases64, and the final result tends to be strongly influenced by the number of features that are perturbed at once65.
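
As an illustration of the gradient-based family, the following minimal sketch computes input gradients for a PyTorch model operating on a molecular fingerprint. The model architecture, fingerprint length and (untrained) parameters are placeholder assumptions chosen purely for illustration; the sketch is not a reference implementation of any specific published method.

```python
import torch

def gradient_attribution(f, x):
    """Return df(x)/dx as a per-feature relevance estimate.

    f: a trained torch.nn.Module (or callable) mapping a feature vector to a scalar
    x: a 1D tensor of input features (for example, a molecular fingerprint)
    """
    x = x.clone().detach().requires_grad_(True)  # track gradients with respect to the input
    y = f(x)                                     # forward pass, scalar prediction
    y.backward()                                 # back-propagation yields df/dx
    return x.grad                                # one relevance value per input feature

# Hypothetical usage with a toy model and a random 2048-bit fingerprint
torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(2048, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
fingerprint = torch.randint(0, 2, (2048,)).float()
relevance = gradient_attribution(lambda t: model(t).squeeze(), fingerprint)
print(relevance.shape)  # torch.Size([2048]); sign and magnitude indicate local feature influence
```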
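The additive surrogate model of equation (1) can likewise be sketched in a few lines. The example below fits a weighted linear model to predictions obtained on randomly masked copies of the input, in the spirit of LIME; the zeroing-out perturbation and the exponential similarity weighting are simplifying assumptions rather than the exact LIME or SHAP weighting schemes.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_attribution(f, x, num_samples=1000, random_state=0):
    """Fit a local additive surrogate g (cf. equation (1)) around the instance x.

    f: black-box prediction function taking a 1D feature vector and returning a scalar
    x: 1D numpy array of input features (for example, a binary fingerprint)
    Returns one coefficient phi_i per feature, interpreted as its local importance.
    """
    rng = np.random.default_rng(random_state)
    M = x.shape[0]
    Z = rng.integers(0, 2, size=(num_samples, M))     # binary 'presence' vectors z'
    predictions = np.array([f(x * z) for z in Z])     # model output on masked copies of x
    weights = np.exp(-np.sum(Z == 0, axis=1) / M)     # favour samples similar to x
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, predictions, sample_weight=weights)
    return surrogate.coef_  # phi_1 ... phi_M; surrogate.intercept_ plays the role of phi_0
```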

Fig. 1: Feature attribution methods.

Given a neural network model f, which computes the prediction y = f(x) for input sample x, a feature attribution method \({\cal{E}}\) outputs the relevance of every input feature of x for the prediction. There are three basic approaches to determine feature relevance: (1) gradient-based methods, computing the gradient of the network f with respect to the input x, (2) surrogate methods, which approximate f with a human-interpretable model g, and (3) perturbation-based methods, which modify the original input to measure the respective changes in the output.

Feature attribution methods have been the most widely used family of XAI techniques for ligand- and structure-based drug discovery in the past few years. For instance, McCloskey et al.66 employed gradient-based attribution47 to detect ligand pharmacophores relevant for binding. The study showed that, despite good performance on held-out data, the models can still learn spurious correlations66. Pope et al.67 adapted gradient-based feature attribution68,69 for the identification of relevant functional groups in adverse effect prediction70. Recently, SHAP52 was used to interpret relevant features for compound potency and multitarget activity prediction71. Hochuli et al.72 compared several feature attribution methodologies, showing how the visualization of attributions assists in the parsing and interpretation of protein–ligand scoring with three-dimensional convolutional neural networks73,74.

It should be noted that the interpretability of feature attribution methods is limited by the original set of features (model input). Particularly in drug discovery, interpretability is often hampered by the use of complex or ‘opaque’ input molecular descriptors75. When making use of feature attribution approaches, it is advisable to choose comprehensible molecular descriptors or representations for model construction (Box 2). Recently, architectures borrowed from the natural language processing field, such as long short-term memory networks76 and transformers77, have been used as feature attribution techniques to identify portions of simplified molecular input line entry system (SMILES)78 strings that are relevant for bioactivity or physicochemical properties79,80. These approaches constitute a first attempt to bridge the gap between the deep learning and medicinal chemistry communities, by relying on representations (atom and bond types, and molecular connectivity78) that bear direct chemical meaning and require no subsequent descriptor-to-molecule decoding.

Instance-based approaches

Instance-based approaches compute a subset of relevant features (instances) that must be present to retain (or absent to change) the prediction of a given model (Fig. 2). An instance can be real (that is, drawn from the set of data) or generated for the purposes of the method. Instance-based approaches have been argued to provide ‘natural’ model interpretations for humans, because they resemble counterfactual reasoning (that is, producing alternative sets of action to achieve a similar or different result)81.

  • Anchor algorithms82 offer model-agnostic interpretable explanations of classifier models. They compute a subset of if-then rules based on one or more features that represent conditions to sufficiently guarantee a certain class prediction. In contrast to many other local explanation methods53, anchors therefore explicitly model the ‘coverage’ of an explanation. Formally, an anchor A is defined as a set of rules such that, given a set of features x from a sample, they return A(x) = 1 if said rules are met, while guaranteeing the desired predicted class from f with a certain probability τ:

$${\Bbb E}_{{\cal{D}}\left( {z|A} \right)}\left[ {1_{f\left( x \right) = f\left( z \right)}} \right] \ge \tau ,$$
(2)

where \({\cal{D}}\left( {z|A} \right)\) is defined as the conditional distribution on samples where anchor A applies. This methodology has been successfully applied in several tasks such as image recognition, text classification and visual question answering82. A sketch of the empirical precision estimate in equation (2) is given after this list.

  • Counterfactual instance search. Given a classifier model f and an original data point x, counterfactual instance search83 aims to find examples x′ (1) that are as close to x as possible and (2) for which the classifier produces a different class label from the label assigned to x. In other words, a counterfactual describes small feature changes in sample x such that it is classified differently by f. The search for the set of instances x′ may be cast into an optimization problem:

    $$\mathop {{{\mathrm{min}}}}\limits_{x^\prime } \mathop {{{\mathrm{max}}}}\limits_\lambda \left( {f_t - p_t} \right)^2 + \lambda L_1\left( {x^\prime ,x} \right),$$
    (3)

    where ft is the prediction of the model for the tth class, pt is a user-defined target probability for the same class, L1 is the Manhattan distance between the proposed x′ and the original sample x, and λ is an optimizable parameter that controls the contribution of each term in the loss. The first term in this loss encourages the search towards points that change the prediction of the model, while the second ensures that both x and x′ lie close to each other in their input manifold. While the original paper showed that this approach successfully obtains counterfactuals in several datasets, the resulting instances tended to look artificial. A recent methodology84 mitigates this problem by adding extra terms to the loss function with an autoencoder architecture, to better capture the original data distribution. Importantly, counterfactual instances can be evaluated using trust scores (cf. the section on uncertainty estimation). One can interpret a high trust score as the counterfactual being far from the initially predicted class of x compared with the class assigned to the counterfactual x′. A minimal code sketch of the optimization in equation (3) is given after this list.

  • Contrastive explanation methods85 provide instance-based interpretability of classifiers by generating ‘pertinent positive’ and ‘pertinent negative’ sets. This methodology is related to both anchors and counterfactual search approaches. Pertinent positives are defined as the smallest set of features that should be present in an instance for the model to predict a ‘positive’ result (similar to anchors). Conversely, pertinent negatives constitute the smallest set of features that should be absent for the model to be able to sufficiently differentiate from the other classes (similar to a counterfactual instance). This method generates explanations of the form ‘An input x is classified as class y because a subset of features {x1, ... xk} is present, and because a subset of features {xm, ... xp} is absent’81 (where k, m and p are arbitrary integer subscripts for x, with k ≤ m ≤ p). Contrastive explanation methods find such sets by solving two separate optimization problems, namely by (1) perturbing the original instance until it is predicted differently from its original class and (2) searching for critical features in the original input (that is, those features that guarantee a prediction with a high degree of certainty). The proposed approach uses an elastic net regularizer86, and optionally a conditional autoencoder model87, so that the found explanations are more likely to lie closer to the original data manifold.
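
To make the precision condition of equation (2) concrete, the following sketch estimates the empirical precision of a candidate anchor from a pool of perturbation samples. Conditioning on exact matches of the anchored features is a simplifying assumption; the published anchor algorithm constructs and searches candidate rules with a multi-armed bandit procedure, which is omitted here.

```python
import numpy as np

def anchor_precision(f, x, anchor_idx, samples):
    """Empirical estimate of the precision term in equation (2).

    f          : classifier returning a class label for a 1D feature vector
    x          : the instance being explained (1D numpy array)
    anchor_idx : indices of the features fixed by the candidate anchor A
    samples    : 2D array of perturbation samples z
    Returns the fraction of samples that satisfy the anchor and keep the prediction of x.
    """
    satisfies_anchor = np.all(samples[:, anchor_idx] == x[anchor_idx], axis=1)
    z = samples[satisfies_anchor]
    if len(z) == 0:
        return 0.0
    return float(np.mean([f(zi) == f(x) for zi in z]))

# A candidate anchor is accepted if anchor_precision(...) >= tau (for example, tau = 0.95)
```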
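The counterfactual search of equation (3) can be approximated with plain gradient descent when f is differentiable. In the sketch below, λ is treated as a fixed hyperparameter instead of being optimized, and the autoencoder terms of the more recent variant84 are omitted; both are simplifying assumptions.

```python
import torch

def counterfactual_search(f, x, target_class, p_target=0.9, lam=0.1, steps=500, lr=0.05):
    """Gradient-descent approximation of counterfactual instance search (cf. equation (3)).

    f            : differentiable model returning class probabilities for a 1D input
    x            : original input instance (1D tensor)
    target_class : index t of the class the counterfactual x' should be assigned to
    p_target     : desired probability p_t for that class
    lam          : fixed weight of the L1 proximity term (optimized in the original method)
    """
    x_cf = x.clone().detach().requires_grad_(True)   # counterfactual candidate x'
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        f_t = f(x_cf)[target_class]                  # model probability for class t
        prediction_loss = (f_t - p_target) ** 2      # push the prediction towards p_t
        proximity_loss = torch.norm(x_cf - x, p=1)   # L1 distance keeps x' close to x
        loss = prediction_loss + lam * proximity_loss
        loss.backward()
        optimizer.step()
    return x_cf.detach()
```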

Fig. 2: Instance-based model interpretation.

Given a model f, input instance x and the respective predicted class c, so-called anchor algorithms identify a minimal subset of features of x that are sufficient to preserve the predicted class assignment c. Counterfactual search generates a new instance x′ that lies close in feature space to x but is classified differently by the model, as belonging to class c′.

In drug discovery, instance-based approaches can be valuable to enhance model transparency, by highlighting what molecular features need to be either present or absent to guarantee or change the model prediction. In addition, counterfactual reasoning further promotes informativeness, by exposing potential new information about both the model and the underlying training data for human decision-makers (for example, organic and medicinal chemists).

To the best of our knowledge, instance-based approaches have yet to be applied to drug discovery. In the authors’ opinion, they bear promise in several areas of de novo molecular design, such as (1) activity cliff prediction, as they can help identify small structural variations in molecules that cause large bioactivity changes, (2) fragment-based virtual screening, by highlighting a minimal subset of atoms responsible for a given observed activity, and (3) hit-to-lead optimization, by helping identify the minimal set of structural changes required to improve one or more biological or physicochemical properties.

Graph-convolution-based methods

Molecular graphs are a natural mathematical representation of molecular topology, with nodes and edges representing atoms and chemical bonds, respectively (Fig. 3a)75. Their usage has been commonplace in chemoinformatics and mathematical chemistry since the late 1970s88,89. It therefore does not come as a surprise that these fields are witnessing an increasing application of novel graph convolutional neural networks90, which formally fall under the umbrella of neural message-passing algorithms91. Generally speaking, convolution refers to a mathematical operation on two functions that produces a third function expressing how the shape of one is modified by the other. This concept is widely used in convolutional neural networks for image analysis. Graph convolutions naturally extend the convolution operation typically used in computer vision92 or natural language processing93 applications to arbitrarily sized graphs. In the context of drug discovery, graph convolutions have been applied to molecular property prediction94,95 and in generative models for de novo drug design96.

Fig. 3: Graph-based model interpretation.

a, Kekulé structure of adrenaline and its respective molecular graph; atoms and bonds constitute nodes and edges, respectively. b, Given an input graph, approaches such as GNNExplainer98 aim to identify a connected, compact subgraph, as well as node-level features that are relevant for a particular prediction y of a graph neural network model f. c, Attention mechanisms can be used in conjunction with message-passing algorithms to learn coefficients \(\alpha _{ij}^{\left( l \right)}\) for the lth layer, which assign ‘importance’ to the set of neighbours \({\cal{N}}\left( i \right)\) (for example, adjacent atoms) of a node i. These coefficients are an explicit component in the computation of new hidden-node representations \(h_i^{\left( {l + 1} \right)}\) (Eq. (5)) in attention-based graph convolutional architectures. Such learned attention coefficients can then be used to highlight the predictive relevance of certain edges and nodes.

Exploring the interpretability of models trained with graph convolution architectures is currently a particularly active research topic97. For the purposes of this Review, XAI methods based on graph convolution are grouped into the following two categories.

  • Subgraph identification approaches aim to identify one or more parts of a graph that are responsible for a given prediction (Fig. 3b). GNNExplainer98 is a model-agnostic example of this category, and provides explanations for any graph-based machine learning task. Given an individual input graph, GNNExplainer identifies a connected subgraph structure, as well as a set of node-level features, that are relevant for a particular prediction. The method can also provide such explanations for a group of data points belonging to the same class. GNNExplainer is formulated as an optimization problem, in which a mutual information objective between the prediction of a graph neural network and the distribution of feasible subgraphs is maximized. Mathematically, given a node v, the goal is to identify a subgraph \(G_{\mathrm{S}} \subseteq G\) with associated features \(X_{\mathrm{S}} = \left\{ {x_j|v_j \in G_{\mathrm{S}}} \right\}\) that are relevant in explaining a target prediction \(\hat y \in Y\) via a mutual information measure MI:

    $$\begin{array}{l}\mathop {{{\mathrm{max}}}}\limits_{G_{\mathrm{S}}} {\mathrm{MI}}\left( {Y,\left( {G_{\mathrm{S}},X_{\mathrm{S}}} \right)} \right) = \\{{H}\left( Y \right)} - {H}\left( {Y|G = G_{\mathrm{S}},X = X_{\mathrm{S}}} \right), \end{array}$$
    (4)

    where H is an entropy term. In practice, however, this objective is not mathematically tractable, and several continuity and convexity assumptions have to be made.

  • Attention-based approaches. The interpretation of graph-convolutional neural networks can benefit from attention mechanisms99, which borrow from the natural language processing field, where their usage has become standard. The idea is to stack several message-passing layers to obtain hidden node-level representations, by first computing attention coefficients associated with each of the edges connected to the neighbours of a particular node in the graph (Fig. 3c). Mathematically, for a given node, an attention-based graph convolution operation obtains its updated hidden representation via a normalized sum of the node-level hidden features of the topological neighbours:

$$h_i^{\left( {l + 1} \right)} = \sigma \left( {\mathop {\sum}\nolimits_{j \in {\cal{N}}\left( i \right)} {\alpha _{ij}^lW^{\left( l \right)}h_j^{\left( l \right)}} } \right),$$
(5)

where \({\cal{N}}\left( i \right)\) is the set of topological neighbours of node i within a one-edge distance, \(\alpha _{ij}^l\) are learned attention coefficients over those neighbours, σ is a nonlinear activation function and W(l) is a learnable feature matrix for layer l. The main difference between this approach and a standard graph convolution update is that, in the latter, the attention coefficients are replaced by a fixed normalization constant \(c_{ij} = \sqrt {\left| {{\cal{N}}\left( i \right)} \right|} \sqrt {\left| {{\cal{N}}\left( j \right)} \right|}\). A minimal implementation of this attention-based update is sketched below.
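
The update of equation (5) can be written as a compact, single-head attention layer. The following sketch uses a concatenation-based attention score with a leaky ReLU, in the spirit of graph attention networks99; the exact scoring function, multi-head aggregation and edge features of published architectures are omitted, and the adjacency matrix is assumed to contain self-loops.

```python
import torch
import torch.nn.functional as F

class SimpleGraphAttentionLayer(torch.nn.Module):
    """Minimal single-head attention-based graph convolution, following equation (5)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.W = torch.nn.Linear(in_features, out_features, bias=False)  # W^(l)
        self.a = torch.nn.Linear(2 * out_features, 1, bias=False)        # attention scorer

    def forward(self, h, adj):
        """h: (num_nodes, in_features) node features; adj: (num_nodes, num_nodes) adjacency
        matrix with self-loops, so that every node has at least one neighbour."""
        Wh = self.W(h)                               # transformed node features W^(l) h^(l)
        n = Wh.size(0)
        pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                           Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))  # raw attention scores e_ij
        e = e.masked_fill(adj == 0, float('-inf'))   # restrict to the neighbourhood N(i)
        alpha = torch.softmax(e, dim=-1)             # learned coefficients alpha_ij
        h_new = torch.relu(alpha @ Wh)               # sigma( sum_j alpha_ij W^(l) h_j^(l) )
        return h_new, alpha                          # alpha can be mapped back onto the graph
```

The returned attention matrix assigns one coefficient per pair of adjacent nodes and can be mapped back onto the atoms and bonds of the molecular graph to highlight which parts of the molecule contributed most to a prediction.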

Methods based on graph convolution represent a powerful tool in drug discovery due to their immediate and natural connection with representations that are intuitive to chemists (that is, molecular graphs and subgraphs). In addition, the possibility to highlight atoms that are relevant for a particular prediction, when combined with mechanistic knowledge, can improve both the justification of a model (that is, elucidating whether a provided answer is acceptable) and its informativeness on the underlying biological and chemical processes.

In particular, GNNExplainer was tested on a set of molecules labelled for their mutagenic effect on Salmonella typhimurium100, and identified several known mutagenic functional groups (that is, certain aromatic and heteroaromatic rings and amino/nitro groups100) as relevant. A recent study41 describes how the interpretation of filters in message-passing networks can lead to the identification of relevant pharmacophore- and toxicophore-like substructures, with findings consistent with literature reports. Gradient-based feature attribution techniques, such as integrated gradients47, were used in conjunction with graph convolutional networks to analyse retrosynthetic reaction predictions and highlight the atoms involved in each reaction step101. Attention-based graph convolutional neural networks have also been used for the prediction of solubility, polarity, synthetic accessibility and photovoltaic efficiency, among other properties102,103, leading to the identification of relevant molecular substructures for the target properties. Finally, attention-based graph architectures have also been used in chemical reactivity prediction104, pointing to structural motifs that are consistent with a chemist’s intuition in the identification of suitable reaction partners and activating reagents.

Due to their intuitive connection with the two-dimensional representation of molecules, graph-convolution-based XAI bears the potential of being applicable to several other common modelling tasks in drug discovery. In the authors’ opinion, XAI for graph convolution might be most beneficial for applications aimed at finding relevant molecular motifs, for example, structural alert identification and site of reactivity or metabolism prediction.

Self-explaining approaches

The XAI methods introduced so far produce a posteriori explanations of deep learning models. Although such post hoc interpretations have been shown to be useful, some argue that, ideally, XAI methods should automatically offer human-interpretable explanations alongside their predictions105. Such approaches (herein referred to as ‘self-explaining’) would promote verification and error analysis, and be directly linkable with domain knowledge106. While the term self-explaining has been coined to refer to a specific neural network architecture—self-explaining neural networks106, described below—in this Review the term is used in a broader sense, to identify methods that feature interpretability as a central part of their design. Self-explaining XAI approaches can be grouped into the following categories.

  • Prototype-based reasoning refers to the task of forecasting future events (that is, novel samples) based on particularly informative known data points. Usually, this is done by identifying prototypes, that is, representative samples, which are adapted (or used directly) to make a prediction. These methods are motivated by the fact that predictions based on individual, previously seen examples mimic human decision-making107. The Bayesian case model108 is a pre-deep-learning approach that constitutes a general framework for such prototype-based reasoning. A Bayesian case model learns to identify observations that best represent clusters in a dataset (that is, prototypes), along with a set of defining features for that cluster. Joint inference is performed on cluster labels, prototypes and extracted relevant features, thereby providing interpretability without sacrificing classification accuracy108. Recently, Li et al.109 developed a neural network architecture composed of an autoencoder and a therein named ‘prototype layer’, whose units store a learnable weight vector representing an encoded training input. Distances between the encoded latent space of new inputs and the learned prototypes are then used as part of the prediction process (a minimal code sketch of this idea is given after this list). This approach was later expanded by Chen et al.110 to convolutional neural networks for computer vision tasks.

  • Self-explaining neural networks106 aim to associate input or latent features with semantic concepts. They jointly learn a class prediction and generate explanations using a feature-to-concept mapping. Such a network model consists of (1) a subnetwork that maps raw inputs into a predefined set of explainable concepts, (2) a parameterizer that obtains coefficients for each individual explainable concept and (3) an aggregation function that combines the output of the previous two components to produce the final class prediction.

  • Human-interpretable concept learning refers to the task of learning a class of concepts, that is, high-level combinations of knowledge elements111, from data, aiming to achieve human-like generalization ability. The Bayesian programme learning approach112 was proposed with the goal of learning visual concepts in computer vision tasks. Such concepts were represented as probabilistic programmes expressed as structured procedures in an abstract description language113. The model then composes more complex programmes from the elements of previously learned ones, using a Bayesian criterion. This approach has been shown to reach human-like performance in one-shot learning tasks114,115.

  • Testing with concept activation vectors116 computes the directional derivatives of the activations of a layer with respect to its input, towards the direction of a concept. Such derivatives quantify the degree to which the latter is relevant for a particular classification (for example, how important the concept ‘stripes’ is for the prediction of the class ‘zebra’). It does so by considering the mathematical association between the internal state of a machine learning model—seen as a vector space Em spanned by basis vectors em that correspond to neural activations—and human-interpretable concepts residing in a different vector space Eh spanned by basis vectors eh. A linear function is computed that translates between these vector spaces (g: Em → Eh). The association is achieved by defining a vector in the direction of the values of a concept’s set of examples, and then training a linear classifier between those and random counterexamples, to finally take the vector orthogonal to the decision boundary. A minimal sketch of this procedure is given after this list.

  • Natural language explanation generation. Deep networks can be designed to generate human-understandable explanations in a supervised manner117. In addition to minimizing the loss of the main modelling task, several approaches synthesize a sentence using natural language that explains the decision performed by the model, by simultaneously training generators on large datasets of human-written explanations. This approach has been applied to generate explanations that are both image and class relevant118. Another prominent application is visual question answering119. To obtain meaningful explanations, however, this approach requires a substantial amount of human-curated explanations for training, and might, thus, find limited applicability in drug discovery tasks.
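
As referenced in the prototype-based reasoning item above, the following sketch shows a minimal classifier with a learnable prototype layer in the spirit of Li et al.109. The simple encoder, the squared Euclidean distance and the omission of the autoencoder reconstruction and interpretability regularizers of the published architecture are simplifying assumptions.

```python
import torch

class PrototypeClassifier(torch.nn.Module):
    """Minimal prototype-layer classifier: predictions are derived from distances
    between the encoded input and a set of learnable prototype vectors."""

    def __init__(self, in_features, latent_dim, num_prototypes, num_classes):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(in_features, latent_dim), torch.nn.ReLU()
        )
        # each row is a learnable prototype living in the latent space
        self.prototypes = torch.nn.Parameter(torch.randn(num_prototypes, latent_dim))
        self.classifier = torch.nn.Linear(num_prototypes, num_classes)

    def forward(self, x):
        z = self.encoder(x)                           # encode the input into the latent space
        dist = torch.cdist(z, self.prototypes) ** 2   # squared distance to every prototype
        logits = self.classifier(-dist)               # closer prototypes contribute more
        return logits, dist                           # distances show which prototypes drove the prediction
```

In the published approach, additional loss terms push each prototype towards an encoded training example, so that every prototype can be decoded and inspected as a concrete, representative input.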
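The concept activation vector procedure described above can likewise be sketched in a few lines: a linear classifier separates layer activations of concept examples from those of random counterexamples, and the directional derivative of a class score along the resulting vector measures concept sensitivity. The choice of logistic regression and the single-example sensitivity function are simplifying assumptions.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Concept activation vector: unit normal of a linear boundary separating
    activations of concept examples (rows of concept_acts) from random counterexamples."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)

def concept_sensitivity(head, activation, cav, target_class):
    """Directional derivative of the class score with respect to the layer activation,
    taken along the concept direction (positive values: the concept supports the class).

    head       : torch module mapping the chosen layer's activation to class logits
    activation : 1D tensor, activation of a single input at that layer
    """
    a = activation.clone().detach().requires_grad_(True)
    head(a)[target_class].backward()   # gradient of the class logit with respect to the activation
    return float(a.grad @ torch.as_tensor(cav, dtype=a.grad.dtype))
```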

Self-explaining methods possess several desirable aspects of XAI; in particular, we highlight their intrinsic transparency. By incorporating human-interpretable explanations at the core of their design, they avoid the need for a post hoc interpretation methodology. The produced human-intelligible explanations might also provide natural insights into the justification of the provided predictions.

To the best of our knowledge, self-explaining deep learning has not been applied to chemistry or drug design yet. Including interpretability by design could help bridge the gap between machine representation and the human understanding of many types of problems in drug discovery. For instance, prototype-based reasoning bears promise in the modelling of heterogeneous sets of chemicals with different modes of action, allowing the preservation of both mechanistic interpretability and predictive accuracy. Explanation generation (either concept or text based) is another potential solution to include human-like reasoning and domain knowledge in the model-building task. In particular, explanation-generation approaches might be applicable to certain decision-making processes, such as the replacement of animal testing and in vitro to in vivo extrapolation, where human-understandable generated explanations constitute a crucial element.

Uncertainty estimation

Uncertainty estimation, that is, the quantification of errors in a prediction, constitutes another approach to model interpretation. While some machine learning algorithms, such as Gaussian processes120, provide built-in uncertainty estimation, deep neural networks are known for being poor at quantifying uncertainty121. This is one of the reasons why several efforts have been devoted to specifically quantify uncertainty in neural network-based predictions. Uncertainty estimation methods can be grouped into the following categories.

  • Ensemble approaches. Model ensembles improve the overall prediction quality and have become a standard for uncertainty estimates122. Deep ensemble averaging123 is based on m identical neural network models that are trained on the same data but with different initializations. The final prediction is obtained by aggregating the predictions of all models (for example, by averaging), while an uncertainty estimate can be obtained from the respective variance (Fig. 4a); a code sketch of this scheme is given after this list. Similarly, the sets of data on which these models are trained can be generated via bootstrap re-sampling124. A disadvantage of this approach is its computational demand, as the underlying methods build on m independently trained models. Snapshot ensembling125 aims to overcome this limitation by periodically storing model states (that is, model parameters) along the training optimization path. These model ‘snapshots’ can then be used for constructing the ensemble.

    Fig. 4: Uncertainty estimation.

    a, Ensemble-based methods aggregate the output of m identical, but differently initialized, models fi. The final prediction is obtained by aggregating the predictions of all models (for example, as the average, \(\bar y\)), while an uncertainty estimate can be obtained from the respective predictive variance, for example, in the form of a standard deviation, s.d.(y). b, Bayesian probabilistic approaches consider a prior p(θ) over the learnable weights of a neural network model fθ, and make use of approximate sampling approaches to learn a posterior distribution over both the weights p(θ|x) and the prediction pθ(y|x). These distributions can then be sampled to obtain uncertainty estimates over both the weights and the predictions.

  • Probabilistic approaches aim to estimate the posterior probability of a certain model output or to perform post hoc calibration. Many of these methods treat neural networks as Bayesian models, by considering a prior distribution over their learnable weights, and then performing inference over their posterior distribution with various methods (Fig. 4b), for example, Markov chain Monte Carlo126 or variational inference127,128. Gal et al.129 suggested the usage of dropout regularization to perform approximate Bayesian inference, which was later extended130 to compute epistemic uncertainty (that is, caused by model mis-specification) and aleatoric uncertainty (inherent to the noise in the data). Similar approximations can also be made via batch normalization131. Mean variance estimation132 considers a neural network designed to output both a mean and a variance value, to then train the model using a negative Gaussian log-likelihood loss function. Another subcategory of approaches considers asymptotic approximations of a prediction by making Gaussian distributional assumptions of its error, such as the delta technique133,134.

  • Other approaches. The lower upper bound estimation (LUBE)135 approach trains a neural network with two outputs, corresponding to the upper and lower bounds of the prediction. Instead of quantifying the error of single predictions, LUBE uses simulated annealing and optimizes the model coefficients to achieve (1) maximum coverage (the probability that the real value of the ith sample falls between the upper and the lower bound) of training measurements and (2) minimum prediction interval width. Ak et al. suggested quantifying the uncertainty in neural network models by directly modelling interval-valued data136. Trust scores137 measure the agreement between a neural network and a k-nearest neighbour classifier that is trained on a filtered subset of the original data. The trust score considers both the distance from the instance of interest to the nearest class that is different from the originally predicted one and its distance to the predicted class. Union-based methods138 first train a neural network model and then feed its embeddings to a second model that handles uncertainty, such as a Gaussian process or a random forest. Distance-based approaches139 aim to estimate the prediction uncertainty of a new sample x′ by measuring the distance to the closest sample in the training set, either using input features140 or an embedding produced by the model141.
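
As referenced in the ensemble item above, the following sketch implements deep ensemble averaging (Fig. 4a): m identically structured, differently initialized networks are trained on the same data, and the mean and standard deviation of their predictions serve as the final prediction and its uncertainty estimate. The toy regression data and the small network are placeholders standing in for a real bioactivity dataset and model.

```python
import torch

def train_member(x, y, seed, epochs=200):
    """Train one ensemble member: identical architecture, different random initialization."""
    torch.manual_seed(seed)
    model = torch.nn.Sequential(torch.nn.Linear(x.size(1), 32),
                                torch.nn.ReLU(), torch.nn.Linear(32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    return model

def ensemble_predict(models, x_new):
    """Aggregate member predictions: the mean is the prediction, the s.d. the uncertainty."""
    with torch.no_grad():
        preds = torch.stack([m(x_new) for m in models])   # shape (m, n_samples, 1)
    return preds.mean(dim=0), preds.std(dim=0)

# Hypothetical usage with random data standing in for a QSAR dataset
x_train, y_train = torch.randn(100, 16), torch.randn(100, 1)
ensemble = [train_member(x_train, y_train, seed=s) for s in range(5)]   # m = 5 members
mean, std = ensemble_predict(ensemble, torch.randn(10, 16))
```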

Uncertainty is omnipresent in the natural sciences, and errors can arise from several sources. The methods described in this Review mainly address the epistemic error, that is, uncertainty in the model and hyperparameter choice. However, the aleatoric error, that is, the intrinsic randomness related to the inherent noise in experimental data, is independent of in silico modelling. It should be noted that this distinction of error types is usually not taken into consideration in practice because these two types of error are often inseparable. Accurately quantifying both types of error, however, could potentially increase the value of the information provided to medicinal chemists in active learning cycles, and facilitate decision-making during the compound optimization process74.

Uncertainty estimation approaches have been successfully implemented in drug discovery applications142, mostly in traditional QSAR modelling, either by the use of models that naturally handle uncertainty143 or post hoc methods144,145. Attention has recently been drawn towards the development of uncertainty-aware deep learning applications in the field. Snapshot ensembling was applied to model 24 bioactivity datasets146, showing that it performs on par with random forest and neural network ensembles, and also leads to narrower confidence intervals. Schwaller et al.147 proposed a transformer model77 for the task of forward chemical reaction prediction. This approach implements uncertainty estimation by computing the product of the probabilities of all predicted tokens in a SMILES sequence representing a molecule. Zhang et al.148 have recently proposed a Bayesian treatment of a semi-supervised graph neural network for uncertainty-calibrated predictions of molecular properties, such as the melting point and aqueous solubility. Their results suggest that this approach can efficiently drive an active learning cycle, particularly in the low-data regime—by choosing those molecules with the largest estimated epistemic uncertainty. Importantly, a recent comparison of several uncertainty estimation methods for physicochemical property prediction showed that none of the methods systematically outperformed all others149.

Often, uncertainty estimation methods are applied alongside models that are difficult to interpret, due to the utilized algorithms, molecular descriptors or a combination of both. Importantly, however, uncertainty estimation alone does not necessarily avert several known issues of deep learning, such as a model producing the right answer for unrelated or wrong reasons or highly reliable but wrong predictions31,32. Thus, enriching uncertainty estimation with concepts of transparency or justification remains a fundamental area of research to maximize the reliability and effectiveness of XAI in drug discovery.

Available software

Given the attention deep learning applications are currently receiving, several software tools have been developed to facilitate model interpretation. A prominent example is Captum150, an extension of the PyTorch151 deep learning and automatic differentiation package that provides support for most of the feature attribution techniques described in this work. Another popular package is Alibi152, which provides instance-specific explanations for certain models trained with the scikit-learn153 or TensorFlow154 packages. Some of the explanation methods implemented include anchors, contrastive explanations and counterfactual instances.
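
As a brief illustration, the snippet below applies Captum's integrated-gradients implementation to a placeholder PyTorch classifier over a molecular fingerprint; the model, input and all-zero baseline are assumptions chosen for the example rather than part of the Captum documentation.

```python
import torch
from captum.attr import IntegratedGradients

# Placeholder classifier over a 2048-bit molecular fingerprint (two output classes)
model = torch.nn.Sequential(torch.nn.Linear(2048, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 2))
model.eval()

fingerprint = torch.randint(0, 2, (1, 2048)).float()   # one example molecule
baseline = torch.zeros_like(fingerprint)               # all-zero reference input

ig = IntegratedGradients(model)
attributions = ig.attribute(fingerprint, baselines=baseline, target=1)
# 'attributions' has the same shape as the input; each value scores one fingerprint bit
```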

Conclusions and outlook

In the context of drug discovery, full comprehensibility of deep learning models may be hard to achieve38, although the provided predictions can still prove useful to the practitioner. When striving for interpretations that match the human intuition, it will be crucial to carefully devise a set of control experiments to validate the machine-driven hypotheses and increase their reliability and objectivity40.

Current XAI also faces technical challenges, given the multiplicity of possible explanations and methods applicable to a given task155. Most approaches do not come as readily usable, ‘out-of-the-box’ solutions, but need to be tailored to each individual application. In addition, profound knowledge of the problem domain is crucial to identify which model decisions demand further explanations, which types of answers are meaningful to the user and which are instead trivial or expected156. For human decision-making, the explanations generated with XAI have to be non-trivial, non-artificial and sufficiently informative for the respective scientific community. At least for the time being, finding such solutions will require the collaborative effort of deep-learning experts, chemoinformaticians and data scientists, chemists, biologists and other domain experts, to ensure that XAI methods serve their intended purpose and deliver reliable answers.

It will be of particular importance to further explore the opportunities and limitations of the established chemical language for representing the decision space of these models. One step forward is to build on interpretable ‘low level’ molecular representations that have direct meaning for chemists and are suited for machine learning (for example, SMILES strings157,158, amino acid sequences159,160 and spatial three-dimensional voxelized representations73,161). Many recent studies rely on well-established molecular descriptors, such as hashed binary fingerprints162,163 and topochemical and geometrical descriptors164,165, which capture structural features defined a priori. Often, molecular descriptors, while being relevant for subsequent modelling, capture intricate chemical information. Consequently, when striving for XAI, there is an understandable tendency to employ molecular representations that can be more easily rationalized in terms of the known language of chemistry. Model interpretability depends on both the chosen molecular representation and the chosen machine learning approach40. With that in mind, the development of novel interpretable molecular representations for deep learning will constitute a critical area of research for the years to come, including the development of self-explaining approaches to overcome the hurdles of non-interpretable but information-rich descriptors, by providing human-like explanations alongside sufficiently accurate predictions.

Due to the current lack of methods comprising all of the outlined desirable features of XAI (transparency, justification, informativeness and uncertainty estimation), a major role in the short and mid term will be played by consensus (jury) approaches that combine the strengths of individual (X)AI approaches and increase model reliability. In the long run, jury XAI approaches—by relying on different algorithms and molecular representations—will constitute a way to provide multifaceted vantage points of the modelled biochemical process. Most of the deep learning models in drug discovery currently do not consider applicability domain restrictions166,167, that is, the region of chemical space where statistical learning assumptions are met. These restrictions should, in the authors’ opinion, be considered an integral element of XAI, as their assessment and a rigorous evaluation of model accuracy has proven to be more relevant for decision-making than the modelling approach itself168. Knowing when to apply which particular model will probably help address the problem of high confidence of deep learning models on wrong predictions121 and avoid unnecessary extrapolations at the same time. Along those lines, in time- and cost-sensitive scenarios, such as drug discovery, deep learning practitioners have the responsibility to cautiously inspect and interpret the predictions derived from their modelling choices. Keeping in mind the current possibilities and limitations of XAI in drug discovery, it is reasonable to assume that the continued development of mixed approaches and alternative models that are more easily comprehensible and computationally affordable will not lose its importance.

At present, XAI in drug discovery lacks an open-community platform for sharing and improving software, model interpretations and the respective training data by synergistic efforts of researchers with different scientific backgrounds. Initiatives such as MELLODDY (Machine Learning Ledger Orchestration for Drug Discovery, melloddy.eu) for decentralized, federated model development and secure data handling across pharmaceutical companies constitute a first step in the right direction. Such kinds of collaboration will hopefully foster the development, validation and acceptance of XAI and the associated explanations these tools provide.