Abstract
Host–pathogen interactions and pathogen evolution are underpinned by protein–protein interactions between viral and host proteins. An understanding of how viral variants affect protein–protein binding is important for predicting viral–host interactions, such as the emergence of new pathogenic SARS-CoV-2 variants. Here we propose an artificial intelligence-based framework called UniBind, in which proteins are represented as a graph at the residue and atom levels. UniBind integrates protein three-dimensional structure and binding affinity and is capable of multi-task learning for heterogeneous biological data integration. In systematic tests on benchmark datasets and further experimental validation, UniBind effectively and scalably predicted the effects of SARS-CoV-2 spike protein variants on their binding affinities to the human ACE2 receptor, as well as to SARS-CoV-2 neutralizing monoclonal antibodies. Furthermore, in a cross-species analysis, UniBind could be applied to predict host susceptibility to SARS-CoV-2 variants and to predict future viral variant evolutionary trends. This in silico approach has the potential to serve as an early warning system for problematic emerging SARS-CoV-2 variants, as well as to facilitate research on protein–protein interactions in general.
Data availability
All input datasets are freely available from public sources.
Code availability
The deep-learning models were developed and deployed using standard model libraries and the PyTorch framework. Custom code was specific to our development environment and was used primarily for data input/output and for parallelization across computers and graphics processors. Code is available at GitHub (https://github.com/UniBind/UniBind).
References
Menachery, V. D. et al. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat. Med. 21, 1508–1513 (2015).
Starr, T. N. et al. ACE2 binding is an ancestral and evolvable trait of sarbecoviruses. Nature 603, 913–918 (2022).
Walls, A. C. et al. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292.e6 (2020).
Korber, B. et al. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell 182, 812–827.e19 (2020).
Thomson, E. C. et al. Circulating SARS-CoV-2 spike N439K variants maintain fitness while evading antibody-mediated immunity. Cell 184, 1171–1187.e20 (2021).
Hill, V. et al. The origins and molecular evolution of SARS-CoV-2 lineage B.1.1.7 in the UK. Virus Evol. 8, veac080 (2022).
Mlcochova, P. et al. SARS-CoV-2 B.1.617.2 Delta variant replication and immune evasion. Nature 599, 114–119 (2021).
Viana, R. et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature 603, 679–686 (2022).
Martin, D. P. et al. Selection analysis identifies clusters of unusual mutational changes in Omicron lineage BA.1 that likely impact spike function. Mol. Biol. Evol. 39, msac061 (2022).
Wang, Q. et al. Alarming antibody evasion properties of rising SARS-CoV-2 BQ and XBB subvariants. Cell https://doi.org/S0092867422015318 (2022).
Yue, C. et al. ACE2 binding and antibody evasion in enhanced transmissibility of XBB.1.5. Lancet Infect. Dis. 23, 278–280 (2023).
Mannar, D. et al. SARS-CoV-2 Omicron variant: antibody evasion and cryo-EM structure of spike protein-ACE2 complex. Science 375, 760–764 (2022).
Cao, Y. et al. Omicron escapes the majority of existing SARS-CoV-2 neutralizing antibodies. Nature 602, 657–663 (2022).
Jankauskaitė, J., Jiménez-García, B., Dapkūnas, J., Fernández-Recio, J. & Moal, I. H. SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 35, 462–469 (2019).
Taft, J. M. et al. Deep mutational learning predicts ACE2 binding and antibody escape to combinatorial mutations in the SARS-CoV-2 receptor-binding domain. Cell 185, 4008–4022.e14 (2022).
Wang, B. & Gamazon, E. R. Modeling mutational effects on biochemical phenotypes using convolutional neural networks: application to SARS-CoV-2. iScience 25, 104500 (2022).
Elbe, S. & Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Chall. 1, 33–46 (2017).
Hie, B., Zhong, E. D., Berger, B. & Bryson, B. Learning the language of viral evolution and escape. Science 371, 284–288 (2021).
Maher, M. C. et al. Predicting the mutational drivers of future SARS-CoV-2 variants of concern. Sci. Transl. Med. 14, eabk3445 (2022).
Obermeyer, F. et al. Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness. Science 376, 1327–1332 (2022).
Rodriguez-Rivas, J., Croce, G., Muscat, M. & Weigt, M. Epistatic models predict mutable sites in SARS-CoV-2 proteins and epitopes. Proc. Natl Acad. Sci. USA 119, e2113118119 (2022).
Beguir, K. et al. Early computational detection of potential high risk SARS-CoV-2 variants. Comput. Biol. Med. 155, 106618 (2023).
Chan, K. K. et al. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science 369, 1261–1265 (2020).
Starr, T. N. et al. Shifting mutational constraints in the SARS-CoV-2 receptor-binding domain during viral evolution. Science 377, 420–424 (2022).
Starr, T. N. et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell 182, 1295–1310.e20 (2020).
Esposito, D. et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 20, 223 (2019).
Han, P. et al. Receptor binding and complex structures of human ACE2 to spike RBD from omicron and delta SARS-CoV-2. Cell 185, 630–640.e10 (2022).
Han, P. et al. Molecular insights into receptor binding of recent emerging SARS-CoV-2 variants. Nat. Commun. 12, 6103 (2021).
Higuchi, Y. et al. Engineered ACE2 receptor therapy overcomes mutational escape of SARS-CoV-2. Nat. Commun. 12, 3802 (2021).
Liu, L. et al. Striking antibody evasion manifested by the Omicron variant of SARS-CoV-2. Nature 602, 676–681 (2022).
Wang, R. et al. Analysis of SARS-CoV-2 variant mutations reveals neutralization escape mechanisms and the ability to use ACE2 receptors from additional species. Immunity 54, 1611–1621.e5 (2021).
Rodrigues, C. H. M., Myung, Y., Pires, D. E. V. & Ascher, D. B. mCSM-PPI2: predicting the effects of mutations on protein-protein interactions. Nucleic Acids Res. 47, W338–W344 (2019).
Pires, D. E. V. & Ascher, D. B. mCSM-AB: a web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures. Nucleic Acids Res. 44, W469–W473 (2016).
Xiong, P., Zhang, C., Zheng, W. & Zhang, Y. BindProfX: assessing mutation-induced binding affinity change by protein interface profiles with pseudo-counts. J. Mol. Biol. 429, 426–434 (2017).
Yang, B., Li, K., Zhong, X. & Zou, J. Implementation of deep learning in drug design. MedComm Future Med. 1, e18 (2022).
Bhattacharjee, M. J. et al. Identifying primate ACE2 variants that confer resistance to SARS-CoV-2. Mol. Biol. Evol. 38, 2715–2731 (2021).
Ye, F. et al. S19W, T27W, and N330Y mutations in ACE2 enhance SARS-CoV-2 S-RBD binding toward both wild-type and antibody-resistant viruses and its molecular basis. Signal Transduct. Target. Ther. 6, 343 (2021).
Damas, J. et al. Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates. Proc. Natl Acad. Sci. USA 117, 22311–22322 (2020).
Shi, J. et al. Susceptibility of ferrets, cats, dogs, and other domesticated animals to SARS-coronavirus 2. Science 368, 1016–1020 (2020).
Oude Munnink, B. B. et al. Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans. Science 371, 172–177 (2021).
Cameroni, E. et al. Broadly neutralizing antibodies overcome SARS-CoV-2 Omicron antigenic shift. Nature 602, 664–670 (2022).
Hong, Q. et al. Molecular basis of receptor binding and antibody neutralization of Omicron. Nature 604, 546–552 (2022).
Meng, B. et al. Altered TMPRSS2 usage by SARS-CoV-2 Omicron impacts infectivity and fusogenicity. Nature 603, 706–714 (2022).
Triveri, A. et al. SARS-CoV-2 spike protein mutations and escape from antibodies: a computational model of epitope loss in variants of concern. J. Chem. Inf. Model. 61, 4687–4700 (2021).
Gruell, H. et al. Neutralisation sensitivity of the SARS-CoV-2 omicron BA.2.75 sublineage. Lancet Infect. Dis. 22, 1422–1423 (2022).
Wu, L. et al. SARS-CoV-2 Omicron RBD shows weaker binding affinity than the currently dominant Delta variant to human ACE2. Signal Transduct. Target. Ther. 7, 8 (2022).
Imai, M. et al. Efficacy of antiviral agents against Omicron subvariants BQ.1.1 and XBB. N. Engl. J. Med. 388, 89–91 (2023).
Cao, Y. et al. BA.2.12.1, BA.4 and BA.5 escape antibodies elicited by Omicron infection. Nature 608, 593–602 (2022).
Callaway, E. How months-long COVID infections could seed dangerous new variants. Nature 606, 452–455 (2022).
Sonnleitner, S. T. et al. Cumulative SARS-CoV-2 mutations and corresponding changes in immunity in an immunocompromised patient indicate viral evolution within the host. Nat. Commun. 13, 2560 (2022).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
Lyngse, F. P. et al. Household transmission of SARS-CoV-2 Omicron variant of concern subvariants BA.1 and BA.2 in Denmark. Nat. Commun. 13, 5760 (2022).
Jian, F. et al. Further humoral immunity evasion of emerging SARS-CoV-2 BA.4 and BA.5 subvariants. Lancet Infect. Dis. 22, 1535–1537 (2022).
Yamasoba, D. et al. Virological characteristics of the SARS-CoV-2 Omicron BA.2 spike. Cell 185, 2103–2115.e19 (2022).
Cheng, S. M. S. et al. Neutralizing antibodies against the SARS-CoV-2 Omicron variant BA.1 following homologous and heterologous CoronaVac or BNT162b2 vaccination. Nat. Med. 28, 486–489 (2022).
Iketani, S. et al. Antibody evasion properties of SARS-CoV-2 Omicron sublineages. Nature 604, 553–556 (2022).
Yu, J. et al. Neutralization of the SARS-CoV-2 Omicron BA.1 and BA.2 variants. N. Engl. J. Med. 386, 1579–1580 (2022).
Andrews, N. et al. Covid-19 vaccine effectiveness against the Omicron (B.1.1.529) variant. N. Engl. J. Med. 386, 1532–1546 (2022).
Cao, Y. et al. Imprinted SARS-CoV-2 humoral immunity induces convergent Omicron RBD evolution. Nature 614, 521–529 (2023).
Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34, 4121–4123 (2018).
Sarfati, H., Naftaly, S., Papo, N. & Keasar, C. Predicting mutant outcome by combining deep mutational scanning and machine learning. Proteins 90, 45–57 (2022).
Huang, X., Pearce, R. & Zhang, Y. EvoEF2: accurate and fast energy function for computational protein design. Bioinformatics 36, 1135–1142 (2020).
Xia, K. & Wang, J. Recent advances of transformers in medical image analysis: a comprehensive review. MedComm Future Med. 2, e38 (2023).
Gao, Y., Zhan, J. & Yu, A. C. H. Understanding by design: implementing deep learning from protein structure prediction to protein design. MedComm Future Med. 1, e22 (2022).
Barlow, K. A. et al. Flex ddG: Rosetta Ensemble-based estimation of changes in protein-protein binding affinity upon mutation. J. Phys. Chem. B 122, 5389–5399 (2018).
Wehenkel, A. & Louppe, G. Unconstrained monotonic neural networks. Preprint at https://arxiv.org/abs/1908.05164 (2021).
Shan, S. et al. Deep learning guided optimization of human antibody against SARS-CoV-2 variants with broad neutralization. Proc. Natl Acad. Sci. USA 119, e2122954119 (2022).
Ulrich, L. et al. Enhanced fitness of SARS-CoV-2 variant of concern Alpha but not Beta. Nature 602, 307–313 (2022).
Acknowledgements
This study was funded by the National Natural Science Foundation of China (grant no. 62272055), the New Cornerstone Science Foundation through the XPLORER PRIZE, Young Elite Scientists Sponsorship Program by CAST (grant no. 2021QNRC001), the Major Key Project of PCL (grant no. PCL2021A15), Guangzhou National Laboratory, Macau University of Science and Technology, the Macau Antibody Protection Study (MAPS) and the Macau Science and Technology Development Fund (grant nos. 0007/2020/AFJ, 0070/2020/A2, 0109/2020/A3 and 0003/2021/AKP).
Author information
Contributions
X.L., K.W., Y.G., G.L., D.T.B.-H., X.H.Y., K.X., W.H.T., Z.J., L.C., M.F., J.Y.-N.L., S.Y., L.L., P.Z., G.W. and K.Z. collected and/or analyzed the data. K.Z. and G.W. conceived and supervised the project. K.Z., X.L., J.Y.-N.L., Y.G., X.H.Y. and G.W. wrote and/or revised the paper. All authors discussed the results and reviewed the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Nature Medicine thanks Eric Gamazon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Michael Basson, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Architectural details of the geometry and energy attention module.
Arrows show the information flow. Vector names: \(X_S\), input sequence feature; \(X_E\), input energy feature; \(X_G\), input geometric feature; \(X_N\), neighbor sequence feature; \(T_N\), neighbor translation; \(T\), local translation; \(R\), neighbor rotation; \(X_S'\), output sequence feature. \(L\), length of the amino acid sequence; \(L_N\), number of nearest-neighbor residues in the graph; \(N_h\), number of heads in the multi-head attention. Dimension names: \(d_S\), sequence feature; \(d_E\), energy feature; \(d_G\), geometric feature; \(d_Q\), query vector; \(d_K\), key vector; \(d_V\), value vector. Operators: ⊙, element-wise multiplication; ⊕, element-wise addition. Functions: k-NN, search for the k nearest neighbors; \(\text{Linear}\ d_1 \to d_2\), fully connected layer with input dimension \(d_1\) and output dimension \(d_2\); \(f(x): 3 \to d_G\), geometric feature function, specifically \(f(x) = \mathrm{concat}(x, \vec{n}_x, \|x\|_2)\) in BindFormer, where \(\vec{n}_x\) is the normal vector of \(x\).
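To make the caption's notation concrete, the following NumPy sketch builds a residue k-NN graph from coordinates, applies a geometric feature function of the form \(f(x) = \mathrm{concat}(x, \vec{n}_x, \|x\|_2)\) to neighbor offsets, and runs single-head scaled dot-product attention over the \(L_N\) neighbors. It is an illustration under stated assumptions, not the BindFormer implementation: \(\vec{n}_x\) is assumed to be the unit vector \(x/\|x\|_2\); the energy feature \(X_E\), the frames \(T\), \(T_N\) and \(R\), and multiple heads are omitted; and the function names and weight matrices are hypothetical.

```python
import numpy as np


def knn_indices(coords: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest other residues for every residue, shape (L, k)."""
    diff = coords[:, None, :] - coords[None, :, :]  # (L, L, 3) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)            # (L, L) Euclidean distances
    np.fill_diagonal(dist, np.inf)                  # a residue is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def geometric_feature(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """f(x) = concat(x, n_x, ||x||_2), mapping 3 -> 7 values.

    Assumption: n_x is taken to be the unit vector x / ||x||_2.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    unit = x / np.maximum(norm, eps)
    return np.concatenate([x, unit, norm], axis=-1)


def neighbor_attention(h, coords, k, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over each residue's k-NN neighbors.

    h: (L, d_S) per-residue features; coords: (L, 3) residue positions.
    Keys/values come from [neighbor feature, f(relative offset)] (hypothetical choice).
    """
    idx = knn_indices(coords, k)                          # (L, L_N)
    rel = coords[idx] - coords[:, None, :]                # (L, L_N, 3) neighbor offsets
    kv_in = np.concatenate([h[idx], geometric_feature(rel)], axis=-1)  # (L, L_N, d_S + 7)
    q = h @ w_q                                           # (L, d_K)
    keys = kv_in @ w_k                                    # (L, L_N, d_K)
    values = kv_in @ w_v                                  # (L, L_N, d_V)
    logits = np.einsum("ld,lnd->ln", q, keys) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)         # softmax over the L_N neighbors
    return np.einsum("ln,lnd->ld", weights, values), weights
```

With this choice of \(\vec{n}_x\), f maps each 3-vector to 3 + 3 + 1 = 7 values, so \(d_G = 7\) in the sketch; a real module would learn `w_q`, `w_k` and `w_v` rather than take them as arguments.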
Extended Data Fig. 2 Validation of the AI’s performance on protein complex affinity prediction.
a,b, Mean absolute error between calculated and experimental changes in binding affinity in SKEMPI 2.0, grouped by the number of mutations (a; 1 to 7+ mutations, n = 4069, 796, 287, 218, 116, 59 and 184, respectively; error bars denote standard deviation) and by structure resolution (b). c,d, Regression analysis between predicted and experimental scores in MaveDB: c, cohesin–dockerin binding score in Clostridium cellulolyticum; d, PSD95 protein binding score with a small peptide ligand. e, Performance comparison in MaveDB. f, Regression performance between predicted and experimental scores on the IGBPG (immunoglobulin G-binding β1 domain of streptococcal protein G) dataset. Error bars, standard deviation.
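Panel a reports errors grouped by mutation count, with standard-deviation error bars. A minimal sketch of that grouping, assuming per-entry sequences of mutation counts, predicted and experimental binding-affinity changes; the function name is hypothetical, and the caption does not say whether population or sample SD was used (population SD is assumed here).

```python
import numpy as np


def mae_by_mutation_count(n_mut, pred, exp, cap=7):
    """MAE and SD of |pred - exp| per mutation-count group; counts >= cap pool into 'cap+'."""
    n_mut = np.asarray(n_mut)
    abs_err = np.abs(np.asarray(pred, dtype=float) - np.asarray(exp, dtype=float))
    groups = np.minimum(n_mut, cap)  # 7, 8, 9, ... mutations all fall into the "7+" bin
    summary = {}
    for g in range(1, cap + 1):
        e = abs_err[groups == g]
        if e.size == 0:
            continue  # skip empty groups rather than report NaN
        label = f"{g}+" if g == cap else str(g)
        # (MAE, population SD with ddof=0, group size n)
        summary[label] = (float(e.mean()), float(e.std()), int(e.size))
    return summary
```

Pooling everything at or above `cap` mirrors the "7+" bin in the caption; grouping by structure resolution (panel b) would follow the same pattern with resolution bins instead of mutation counts.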
Extended Data Fig. 3 UniBind performance in predicting the effects of SARS-CoV-2 RBD mutations on ACE2 binding.
a–d, Regression performance of affinity prediction for RBD mutation effects on different SARS-CoV-2 variants: a, Alpha (N501Y); b, Beta (K417N+E484K+N501Y); c, Delta (L452R+T478K); d, Eta (E484K). MAE, mean absolute error; R², coefficient of determination; PCC, Pearson's correlation coefficient.
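The three metrics used throughout these panels can be computed as below. This is a generic sketch, not the authors' evaluation code; in particular it assumes R² is the coefficient of determination of predictions against experimental values (1 - SS_res/SS_tot), which, unlike the squared PCC, penalizes a systematic offset.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, coefficient of determination (R^2) and Pearson's correlation coefficient (PCC)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(y - p)))
    ss_res = float(np.sum((y - p) ** 2))         # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total sum of squares around the mean
    r2 = 1.0 - ss_res / ss_tot
    pcc = float(np.corrcoef(y, p)[0, 1])
    return {"MAE": mae, "R2": r2, "PCC": pcc}
```

For example, `regression_metrics([1, 2, 3, 4], [2, 3, 4, 5])` gives MAE = 1.0 and PCC = 1.0 but R² ≈ 0.2, because every prediction is shifted by one unit.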
Extended Data Fig. 4 Prediction of the effects of SARS-CoV-2 RBD mutations on antibody binding.
a–d, Stratified analysis of regression performance on four classes of neutralizing antibodies, grouped according to Cao et al. e,f, Heat maps of the experimental (e) and predicted (f) escape score matrices for RBD mutations against different antibodies. Brightness represents the escape score: a brighter dot indicates that mutation at the RBD site on the x axis is more likely to cause immune escape from the antibody on the y axis. MAE, mean absolute error; R², coefficient of determination; PCC, Pearson's correlation coefficient.
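Panels e and f show escape scores as an antibody-by-site matrix, whereas predictions are made per mutation. One way to collapse per-mutation scores into such a matrix is sketched below; the record format, the parsing of mutation names such as E484K, and summing over substitutions at a site are all assumptions rather than the published procedure.

```python
import numpy as np


def escape_matrix(records, agg=sum):
    """Build an antibody x site escape matrix from per-mutation scores.

    records: iterable of (antibody, mutation, escape_score), with mutations written
    as wild-type residue, RBD position, mutant residue (e.g. "E484K").
    Scores of different substitutions at the same site are combined with `agg`
    (summing is an assumption; max or mean are equally plausible choices).
    """
    cells = {}
    for antibody, mutation, score in records:
        site = int(mutation[1:-1])  # "E484K" -> 484
        cells.setdefault((antibody, site), []).append(score)
    antibodies = sorted({ab for ab, _ in cells})
    sites = sorted({s for _, s in cells})
    row = {ab: i for i, ab in enumerate(antibodies)}
    col = {s: j for j, s in enumerate(sites)}
    mat = np.zeros((len(antibodies), len(sites)))  # unmeasured pairs stay at 0
    for (antibody, site), scores in cells.items():
        mat[row[antibody], col[site]] = agg(scores)
    return antibodies, sites, mat
```

The same function can be applied separately to experimental and predicted scores so that both heat maps share row and column ordering.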
Extended Data Fig. 5 Prediction of antibody binding to variants of concern (VOCs).
a–h, Predicted antibody escape scores for each VOC, shown as box plots. For each analysis, antibodies were separated into two groups, those that are and are not escaped by the SARS-CoV-2 variant, according to published reports. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. P values less than 0.05, 0.01, 0.001 and 0.0001 are indicated by one to four asterisks, respectively. Numbers of non-escape and escape antibodies for each variant: Alpha, 18 and 1 (a); Delta, 12 and 1 (b); Epsilon, 10 and 1 (c); Iota, 8 and 2 (d); Beta, 11 and 8 (e); Gamma, 12 and 7 (f); Omicron_BA1, 2 and 11 (g); Omicron_R346K, 2 and 9 (h). i, ROC curves of neutralization escape prediction.
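The significance labels and the area under the ROC curves in panel i can be reproduced in outline with two small helpers. This is a sketch: the "ns" label for P ≥ 0.05 is an assumption, and the AUC is computed with the rank (Mann–Whitney) formulation rather than by tracing the curve, which yields the same area.

```python
def p_to_stars(p):
    """Map a P value to the asterisk code in the caption ("ns" for P >= 0.05 is assumed)."""
    for threshold, stars in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (0.05, "*")):
        if p < threshold:
            return stars
    return "ns"


def roc_auc(escape_scores, non_escape_scores):
    """ROC AUC as the probability that a randomly chosen escaped antibody receives a
    higher predicted escape score than a randomly chosen non-escaped one (ties count 1/2)."""
    wins = 0.0
    for a in escape_scores:
        for b in non_escape_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(escape_scores) * len(non_escape_scores))
```

The pairwise loop is quadratic, which is negligible for group sizes like those listed above (at most a few dozen antibodies per variant).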
Extended Data Fig. 6 UniBind performance in predicting the binding affinity between SARS-CoV-2 and mutant ACE2 (human ACE2 and cross-species).
a,b, Magnified views of AI-predicted 3D structures of wild-type or mutant ACE2 in complex with wild-type SARS-CoV-2. a, The mutant ACE2 residue N330Y interfaces with residues P499 and T500 on the SARS-CoV-2 RBD. b, ACE2 residue 42 (Q42 in wild type, L42 in the mutant) interfaces with residues Q498 and Y449 on the SARS-CoV-2 RBD.
Extended Data Fig. 7 Validation and prediction of variant function.
a, Correlation between reported fitness and the affinity-based evolutionary score (evo-score). b, Validation of immune escape prediction on Omicron sublineages. Yellow dots (left y axis) indicate log-transformed 50% inhibitory dilutions (ID50s) from pseudovirus neutralization assays (n = 30), curated from Gruell et al. Blue dots (right y axis) indicate log-transformed escape scores of four variants against monoclonal antibodies. Columns show the mean; error bars, standard deviation. Differences between variants were tested by two-tailed Student's t-test. c, UniBind quantification of variant function during the evolutionary process in one COVID-19 patient. Lower panel: variants detected in the patient from day 73 to day 207 after infection, classified as 'Found in VOC' (variant of concern) or 'Not found in VOC' and denoted with 'o' and '+', respectively. Upper panel: curves of variant function predicted by UniBind, including ACE2 affinity (blue), antibody escape (orange) and evo-score (green). d,e, Changes in affinity and antibody escape scores of predicted SARS-CoV-2 mutations, ranked by magnitude of change. Mutations found in the immunocompromised patient, for example E484K and E484Q, ranked high in our antibody escape prediction (top 0.1% and 0.8%, respectively). In addition, N501Y, which was also found in the immunocompromised patient, was ranked second in our predictions (top 0.1%).
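The "ranked second" and "top 0.1%" statements in panels d and e are rank-based. A sketch of that calculation, assuming a dictionary that maps mutation names to predicted scores; the helper name is hypothetical, and ties keep the dictionary's insertion order because Python's sort is stable.

```python
def rank_and_top_percent(scores, mutation):
    """Rank of `mutation` when all mutations are sorted by descending score,
    and the corresponding 'top x%' value (rank / number of mutations * 100)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = ranked.index(mutation) + 1  # 1-based rank
    return rank, 100.0 * rank / len(ranked)
```

Under this definition, a mutation can only fall in the top 0.1% if at least 1,000 mutations are scored, so the percentage depends on the size of the scanned mutation set.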
Extended Data Fig. 8 Validation and prediction of variant fitness.
a–c, Heat maps generated by UniBind in silico deep mutational scanning of S–ACE2 binding affinity values (a), antibody escape scores (b) and evo-score values (c). d, Diagram depicting the variants and their mutation load in the BQ.1.1 sublineage (adapted from Nextstrain, ref. 70). e,f, The top 50 predicted mutations ranked by immune escape score. The AI model correctly predicted the key mutation S494P (orange), which is present in many VOC sublineages, such as BQ.1.1.11, BQ.1.1.12, BQ.1.1.13, BQ.1.1.34 and DT.1 (created based on nextstrain.org).
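The heat maps in panels a–c and the top-50 lists in panels e and f come from scoring every single-residue substitution. A minimal sketch of such an in silico scan follows; `score_fn` is a hypothetical stand-in for a trained predictor such as UniBind, and the alphabetical amino acid ordering and the residue numbering offset `start` are illustrative choices.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical one-letter order


def in_silico_dms(seq, score_fn, start=1):
    """Score every single-residue substitution of `seq` with `score_fn`.

    Returns a (len(seq), 20) matrix for heat-map plotting (NaN marks the
    wild-type residue at each position) and a dict mapping names such as
    "S494P" to scores.
    """
    mat = np.full((len(seq), len(AMINO_ACIDS)), np.nan)
    scores = {}
    for i, wt in enumerate(seq):
        for j, aa in enumerate(AMINO_ACIDS):
            if aa == wt:
                continue  # the wild-type residue is not a mutation
            name = f"{wt}{i + start}{aa}"
            mat[i, j] = scores[name] = float(score_fn(name))
    return mat, scores


def top_k(scores, k=50):
    """Mutation names with the k highest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For an RBD fragment, `start` would be set so that positions follow spike numbering, for example so that a serine at position 494 yields the name S494P.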
Supplementary information
Supplementary Information
Supplementary Tables 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, G., Liu, X., Wang, K. et al. Deep-learning-enabled protein–protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution. Nat Med 29, 2007–2018 (2023). https://doi.org/10.1038/s41591-023-02483-5
DOI: https://doi.org/10.1038/s41591-023-02483-5