Antibodies are an essential class of therapeutics, but low breadth and off-target binding are major concerns for antibody–drug efficiency and safety. To predict which targets an antibody can neutralize, a machine learning pipeline based on an adaptive graph convolutional network architecture is proposed that learns the binding landscape of antibodies to multiple mutated viruses simultaneously.
Tools and machine learning (ML) formalizations for antibody binding prediction are scarce. Despite recent advances in protein or antibody structure modelling1,2, predicting antibody binding to an antigen remains extremely challenging, even for AlphaFold2 (refs. 3,4), which relies on the availability of evolutionary information. The performance of de novo prediction of new antibodies has been poor so far, and methods for adapting successful protein ML paradigms to small-sized antibody datasets that lack evolutionary information5 are critically needed. Furthermore, the type and size of structural, sequential or affinity-based datasets required for accurate binding prediction are still unknown6, which leads to inefficient experimental data generation. Recent hybrid sequence and structure ML formalisms, such as binding-residue prediction7 or screening of compatible modelled antibody–antigen structures8, have suggested that antibody binding prediction is feasible with current types of datasets, in a protein–protein interaction (PPI) setting where unrelated binding pairs are shuffled to create non-binders. Zhang et al.9 demonstrate a new adaptive architecture to predict the neutralization landscape of many antibodies to many antigen variants. Such new approaches and ML formalizations are needed to learn the global binding landscape of antibody and antigen variants, and to move from the restricted neighbourhood of known binders towards generalizable binding prediction.
Zhang et al. developed a graph convolutional network (GCN)-based architecture, termed DeepAAI, that can predict the neutralization capacity of antibodies that were completely unseen during training (Fig. 1). This capability differs from typical PPI prediction methods, which already include all known antibodies in both training and test datasets, albeit with different binding partners. Prediction is framed either as binary classification or as regression of the antibodies’ IC50 neutralization score (IC50, the half-maximal inhibitory concentration, is a standard quantitative measure of the potency of a molecule to inhibit a biochemical function). DeepAAI maps antibody sequences to nodes in a graph-based functional latent space. A flexible GCN takes this graph as input. A new antibody can be assigned a new node, and the GCN predicts its properties after being trained on the other nodes, akin to a transfer learning scheme. Zhang et al. therefore answer an important research question: given a latent space of the neutralization landscape of a set of antibodies to multiple antigens, is it possible to project new antibodies into this latent space and learn which targets they neutralize?
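The idea of assigning a new antibody to a new node can be illustrated with a minimal sketch. It is not the DeepAAI implementation: it applies one generic graph convolution with symmetric normalization to a toy graph, and the helper names (`add_node`, `gcn_layer`), the feature vectors, the identity weight matrix and the choice of neighbours for the new node are all hypothetical assumptions made for illustration.

```python
from math import sqrt

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]

def add_node(adj, features, new_features, neighbours):
    """Append a node linked to the given existing node indices (hypothetical rule)."""
    n = len(adj)
    new_adj = [row + [1.0 if i in neighbours else 0.0]
               for i, row in enumerate(adj)]
    new_adj.append([1.0 if j in neighbours else 0.0 for j in range(n)] + [0.0])
    return new_adj, features + [new_features]

# Toy graph of three training antibodies: 0 and 1 are linked, 2 is isolated.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, standing in for trained weights

# An unseen antibody becomes node 3, linked to node 2, and is embedded
# with the same weights, without retraining.
adj2, features2 = add_node(adj, features, [0.0, 2.0], neighbours={2})
new_embedding = gcn_layer(adj2, features2, weights)[-1]
print(new_embedding)
```

In this toy setting the embedding of the new node is an average of its own features and those of its single neighbour, which is what lets a trained network place an unseen antibody relative to the antibodies it already knows.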
Previous studies relied on sequence similarity in their loss function to build latent spaces10,11,12,13, but benchmarking on simulated datasets has shown that sequence similarity is not a good surrogate for binding6, and it potentially disregards dissimilar sequences with a shared structure or binding landscape14, which structure-based methods would have identified15. By contrast, DeepAAI9 builds a function-based internal representation, potentially clustering sequences based on binding rules. ML models that predict antibody structure1,4 would also likely develop predictive functional latent representations of antibodies, but they have not yet been fine-tuned to antibody–antigen binding or neutralization. Interestingly, the authors show that the latent space clusters antibodies that tend to recognize the same immunogenic regions, which supports a successful functional clustering. Such a latent space could enable the generation of new antibodies with desired properties, and could reveal interpretable binding rules such as predictive motifs for cross-reactivity. The latent space could also be fine-tuned to predict the binding residues at the antibody–antigen binding interface (the paratope and epitope).
It might seem obvious that ML models should be tested on unseen data. Yet the authors make a bigger leap than usual with respect to ‘data leakage’, which denotes the existence of shared information between training and test datasets. Sequence similarity is a first line of data leakage: antibodies similar to a training instance might be predicted well without the model learning generalizable rules. By keeping sequences that share 90% sequence similarity from being split across training and test datasets, and by using unseen antibodies for prediction, DeepAAI shows that binding prediction can be transferred beyond two levels of data leakage, including the leakage that arises in a PPI formalism when properties of the same antibody towards other targets are already known. However, more advanced types of data leakage might exist: if an antibody in the training set has a similar binding profile to the newly tested antibody, the model may predict correctly without necessarily learning rules, in which case it learns a multiclass problem where each explored neutralization pattern describes a class. A stress test of DeepAAI with adversarial simulated datasets6 of antibodies sharing binding rules but not the binding landscape would likely help, and would inform how far the DeepAAI latent space can represent complex binding profiles.
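A similarity-aware split of the kind described above can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: `difflib.SequenceMatcher` stands in for a proper alignment-based identity measure, the sequences are toy strings, and the function names and the single-linkage clustering rule are hypothetical choices.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in for alignment-based sequence identity (an assumption).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def cluster_sequences(seqs, threshold=0.9):
    """Single-linkage clusters: pairs at or above the threshold share a cluster."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def split_by_cluster(seqs, test_fraction=0.25, threshold=0.9):
    """Send whole clusters to the test set until the target size is reached."""
    clusters = sorted(cluster_sequences(seqs, threshold), key=len)
    target = test_fraction * len(seqs)
    train, test = [], []
    for cluster in clusters:
        (test if len(test) < target else train).extend(cluster)
    return sorted(train), sorted(test)

# Toy sequences: the first two differ by a single residue.
seqs = ["ARDYYGSSYFDY", "ARDYYGSSYFDV", "AKGGWELLPFDP", "ARHTTVVTPFDY"]
train_idx, test_idx = split_by_cluster(seqs)
print(train_idx, test_idx)
```

Because whole clusters are assigned to one side, no test sequence has a near-identical counterpart in the training set. Note that this only addresses the first line of leakage; antibodies with dissimilar sequences but similar binding profiles would still end up on both sides.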
Provocatively, one could say that sequence neutralization datasets may replace structural datasets. Interestingly, sequence positional information was not required for neutralization prediction owing to the use of a k-mer sequence representation and a convolutional neural network (CNN), while position-specific scoring matrices were needed for IC50 regression, which points to the potential role of hidden structural information. The question therefore remains where the key information comes from. Does DeepAAI really infer interpretable binding rules, or does it rely on the existence of other antibodies with a similar neutralization landscape in the training dataset? DeepAAI predictions were transferable from HIV to the influenza virus and to COVID-19 datasets, which supports the finding that neutralization rules might be predictable with smaller datasets than previously expected. Whether or not sequence datasets ultimately prevail, binding structures will remain the gold standard to check predictions, especially at the paratope–epitope level. If the DeepAAI latent space represents structural binding modes, then each functional cluster might be described by one or a few experimental structures, and DeepAAI could indicate which experimental structures are missing and which would be redundant to generate, thereby helping to prioritize expensive antibody–antigen structure measurements.
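Why a k-mer representation carries no explicit positional information can be seen in a minimal sketch. The value k = 3, the toy sequence and the function names are assumptions for illustration and are not taken from DeepAAI.

```python
from collections import Counter

def kmer_counts(sequence, k=3):
    """Count overlapping k-mers; where each k-mer occurs is discarded."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def kmer_vector(sequence, vocabulary, k=3):
    """Fixed-length count vector over a given k-mer vocabulary."""
    counts = kmer_counts(sequence, k)
    return [counts[kmer] for kmer in vocabulary]

toy_sequence = "ARDYYDYY"  # toy amino-acid string
counts = kmer_counts(toy_sequence)
vocabulary = sorted(counts)
vector = kmer_vector(toy_sequence, vocabulary)
print(dict(zip(vocabulary, vector)))
```

The motif DYY appears twice in the toy sequence, but the count vector records only that it occurred twice, not where. A position-specific scoring matrix, by contrast, keeps one column per sequence position.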
Finally, structural datasets have not yet captured many cases of antibody cross-reactivity to multiple antigens. As a result, current ML methods may have been skewed towards ‘one antibody, one target’ formulations, where improving affinity has overshadowed the hard task of reducing off-target binding16. Zhang et al.’s formulation enables the exploration of the antibody specificity landscape, which is necessary to investigate which antibodies are cross-reactive in the context of off-target antigen recognition. This capability could prove useful for developing safe immunotherapies.
1. Ruffolo, J. A., Chu, L.-S., Mahajan, S. P. & Gray, J. J. Preprint at Biophys. J. 121, 155a–156a (2022).
2. Lin, Z. et al. Preprint at bioRxiv https://doi.org/10.1101/2022.07.20.500902 (2022).
3. Yin, R., Feng, B. Y., Varshney, A. & Pierce, B. G. Protein Sci. 31, e4379 (2022).
4. Abanades, B., Georges, G., Bujotzek, A. & Deane, C. M. Bioinformatics 38, 1877–1880 (2022).
5. Chowdhury, R. et al. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01432-w (2022).
6. Robert, P. A. et al. Nat. Computat. Sci. (in the press).
7. Pittala, S. & Bailey-Kellogg, C. Bioinformatics 36, 3996–4003 (2020).
8. Schneider, C., Buchanan, A., Taddese, B. & Deane, C. M. Bioinformatics 38, 377–383 (2021).
9. Zhang, J. et al. Nat. Mach. Intell. 4, 964–976 (2022).
10. Friedensohn, S. et al. Preprint at bioRxiv https://doi.org/10.1101/2020.02.25.965673 (2020).
11. Akbar, R. et al. MAbs 14, 2031482 (2022).
12. Ruffolo, J. A., Sulam, J. & Gray, J. J. Patterns (NY) 3, 100406 (2021).
13. Leem, J., Mitchell, L. S., Farmery, J. H. R., Barton, J. & Galson, J. D. Patterns (NY) 3, 100513 (2022).
14. Mason, D. M. et al. Nat. Biomed. Eng. 5, 600–612 (2021).
15. Richardson, E. et al. MAbs 13, 1869406 (2021).
16. Cunningham, O., Scott, M., Zhou, Z. S. & Finlay, W. J. J. MAbs 13, 1999195 (2021).
P.A.R.’s current postdoctoral position at University of Basel was funded by Hoffmann-La Roche, Basel. V.G. declares advisory board positions in aiNET GmbH, Enpicom B.V, Specifica Inc, Adaptyv Biosystems, EVQLV, Omniscope, Diagonal Therapeutics, and Absci. V.G. is a consultant for Roche/Genentech, immunai, and Proteinea.
Cite this article
Robert, P.A., Greiff, V. Bridging the neutralization gap for unseen antibodies. Nat Mach Intell 5, 8–10 (2023). https://doi.org/10.1038/s42256-022-00594-1