SARS-CoV-2 receptor-binding domain deep mutational AlphaFold2 structures

Kilim, Oz; Mentes, Anikó; Pál, Balázs; Csabai, István; Gellért, Ákos

doi:10.1038/s41597-023-02035-z

Data Descriptor
Open access
Published: 14 March 2023

Scientific Data volume 10, Article number: 134 (2023)

Abstract

Leveraging recent advances in computational modeling of proteins with AlphaFold2 (AF2), we provide a complete curated data set of all single mutations of the spike protein receptor binding domain (RBD) from each of the 7 main SARS-CoV-2 lineages, resulting in 3819 × 7 = 26733 PDB structures. We visualize the generated structures and show that AF2 pLDDT values are correlated with state-of-the-art disorder approximations, implying that some internal protein dynamics are also captured by the model. Jointly increasing mutational coverage of both structural and phenotype data, coupled with advances in machine learning, can be leveraged to accelerate virology research, specifically future variant prediction. We hope this data release can assist in further understanding the local and global mutational landscape of SARS-CoV-2, as well as provide insight into the biological understanding that 3D structure acts as a bridge between protein genotype and phenotype.

Background & Summary

The receptor binding domain (RBD) of the SARS-CoV-2 spike protein, in its active conformation, is the domain that binds directly to the ACE2 receptor, which is itself a protein on the surface of many cell types that acts as a “cellular doorway” for the SARS-CoV-2 virus. Understanding the competitive binding between the RBD, ACE2, and monoclonal and polyclonal antibodies is core to assessing the potential evolutionary fitness of a given variant.

Deep mutational scanning (DMS) experiments¹ make it possible to gather biophysical (phenotype) values, such as protein expression and RBD-ACE2 binding affinity, for close mutants in parallel, further enabling mapping of the local evolutionary landscape of any antigen. However, the variant combinatorial space expands hugely with the number of mutations of a protein: it contains $19^{x}\binom{n}{x}$ sequences, where x is the mutational distance (x = 15 for Omicron BA1²) and n is the sequence length (201 for the RBD). So there are approximately 4.23 × 10³⁰ “Omicron-distant” variants from the original Wuhan sequence. DMS experiments produce data orders of magnitude too small to fully cover this number of possible mutants. Omicron, with its 15 mutations from the original Wuhan variant, is a prime example of a more distant variant that has been shown to escape many antibodies evoked by early vaccines. It would have been hugely valuable to know this ahead of time, which motivates the development of in-silico machine learning techniques to predict biophysical values with regard to the stability of proteins, expression extent, and, most importantly, protein-protein interaction Gibbs free energies, ΔG.
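As a rough illustration of how quickly this space grows, the count 19^x · C(n, x) can be evaluated directly. The sketch below is not part of the original analysis, and the function name `n_variants` is introduced here purely for illustration. For x = 1 it gives the 201 × 19 = 3819 single mutants per variant that make up one cluster of this data set.

```python
from math import comb


def n_variants(n: int, x: int, alphabet: int = 20) -> int:
    """Number of sequences at exactly x substitutions from a length-n parent.

    Choose which x positions change (n choose x), then pick one of the
    (alphabet - 1) alternative amino acids at each chosen position.
    """
    return (alphabet - 1) ** x * comb(n, x)


if __name__ == "__main__":
    # Growth of the mutational neighbourhood of the 201-residue RBD.
    for x in (1, 2, 3):
        print(x, n_variants(201, x))
```

Python integers have arbitrary precision, so the same function can be called with larger mutational distances without overflow.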

Classical predictive models with hand-crafted features exist for the genotype-to-phenotype prediction problem³,⁴. However, recent advances in supervised machine learning (ML) have led to models that often outperform classical models in the presence of enough training data. This is due to their ability to learn task-specific features. Supervised machine learning fits the problem description: we have some distribution we want to learn, and we require generalization to new unseen data, which in our case would be unseen, combinatorially distant variants. In Fig. 1 we outline how ML-based genotype-to-phenotype predictive models may be designed based on different representations of proteins. Proteins natively manifest as 3D folded forms, so the 3D structure of a protein strongly dictates its biophysical properties. To leverage this prior knowledge in an ML framework, we must gather 3D structures of proteins that match measured biophysical data to create pairs (X, Y) for model training, where X is the 3D protein or protein-protein complex structure and Y is the corresponding biophysical measurement. Such a predictive model could be summarised as f, where Y = f(X) and the loss ℓ(Y_measured, Y_predicted) is minimized with some minimization procedure. Such a model would allow vaccine developers to investigate how potent new antibodies or antibody mixtures could be against future variants, and would allow the scientific and political community to get ahead of virus evolution and be in a proactive position rather than the reactive one we are in at the time of writing. Bloom et al. have provided a preliminary tool⁵ based on a simple linear model to predict antibody escape (an important assessment of viral threat); however, its performance may deteriorate for more distant mutants. Predictive profiling of SARS-CoV-2 variants by deep mutational learning⁶ aims to produce a similar result but is limited by its 3D model-free nature.

3D model representations are only worth leveraging for predictive models if, on top of the amino acid sequence loaded with chemical information, they contain additional signal that is physically relevant. This motivated our study.
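Written out, the training objective described above is an empirical risk minimization over the (X, Y) pairs. The notation below is only a sketch: the hypothesis class $\mathcal{F}$ and the number of training pairs $N$ are symbols introduced here and do not appear in the original text.

```latex
\hat{f} \;=\; \underset{f \in \mathcal{F}}{\arg\min}\;
\frac{1}{N} \sum_{i=1}^{N}
\ell\!\left(Y_i^{\mathrm{measured}},\, Y_i^{\mathrm{predicted}}\right),
\qquad Y_i^{\mathrm{predicted}} = f(X_i)
```

Here each $X_i$ would be one structure (or a representation derived from it) and $Y_i^{\mathrm{measured}}$ the matching biophysical measurement.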

The ability of AlphaFold2⁷ (AF2) to produce accurate predictions for single mutants is under debate. Variable methodologies and datasets⁸,⁹,¹⁰,¹¹,¹² are used to make claims that predicted structures correctly resemble their measured counterparts. This is potentially difficult to assess, as the effect of a mutation may be small compared to the inherent conformational dynamics and disorder of a given protein. In ref. ⁹ the authors chose specific illustrative examples of proteins for which experimental and structural data are available for both the wild type (WT) and structure-disrupting mutations. The authors compare the root-mean-square deviation (RMSD) between structure-disrupting mutants and the WT, both for measured structures and for AF2-predicted mutants. AF2 was unable to predict when a point mutation causes defective protein folding, as both RMSD and pLDDT values were not concordant. This study, however, is impossible to perform for large sets of variants where there are no PDB ground-truth reference structures, so its conclusions are limited in their ability to generalize to other proteins. In ref. ¹⁰ the authors find no correlation between ΔpLDDT values and biophysical measurement values of GFP fluorescence in 976 mutations of 90 proteins from the ThermoMutDB database¹³. The authors argue that some correlation between ΔpLDDT and phenotypic values that relate to ΔΔG should be observed to indicate that generated AF2 single mutants possess a physically meaningful structure.

On the other side of the debate, ref. ¹¹ leverages large-scale DMS data of 33 proteins with 117,135 mutations by investigating the correlations between the DMS-predicted protein function values with both AF2 and experimentally derived structures. These protein phenotype quantities were calculated with structure-based protein function predictors: FoldX¹⁴, Rosetta¹⁵ and DynaMut2¹⁶. Strong correlations were seen for both experimentally derived structures and AF2-generated structures. It is, however, not possible to know whether structure-based protein function predictors actually leverage 3D signal or simply use distilled sequence information to make their predictions. Additionally, in ref. ¹² the authors show that local structural change (as quantified with the local distance difference test (LDDT)⁸ metric) is correlated in experimental (PDB) and AF2-predicted pairs. Linking structure to phenotype as a data validation approach, they find significant correlations between local structural changes in AF2-predicted structures and three categories of phenotype (fluorescence, folding, and catalysis) across multiple experimental data sets.

This debate is still open, and general validation of all structures generated by AF2 is out of the scope of this paper. Generalization of the model to new regions of the structural space may be very difficult to validate globally. At the time of writing, the utility of predicted 3D models needs to be evaluated on a case-by-case basis. In this spirit, we aim to validate SARS-CoV-2 RBD deep mutational AF2 structures, which exist on a small manifold of this space. Despite this debate, we believe our validation methodology is well supported by the literature. The data set we release should help speed up research in the field of structural virology, because access to our curated dataset alleviates the resource and time burden of data generation and organization. This data can be used for downstream tasks, for example predicting biophysical qualities or antibody escape¹⁷. Because we validate a diverse set of generated mutants, this validation should generalize to other RBD variants that are not in our data release and that researchers may want to generate themselves with AF2. In this work, we present a curated and validated dataset, “SARS-CoV-2 RBD deep mutational AlphaFold2 structures”: 26733 aligned PDB structures to accelerate SARS-CoV-2-related research.

Methods

Generation of PDB library

We used the Ampere01 machine at the Wigner Research Centre for Physics (256 CPUs - AMD EPYC 7742 64-Core Processor, 8 NVIDIA A100 80GB GPU cards). A 5 TB SSD was used to accelerate access to the AlphaFold2 databases. The input FASTA sequences were generated by first downloading the original spike protein sequence (NCBI RefSeq: YP_009724390.1, UniProt ID: P0DTC2¹⁸) and selecting the RBD section, amino acids 331–531. From this sequence, we created the 6 other main variants using the positions and amino acid changes given in the Stanford University Coronavirus Antiviral Resistance Database², which is openly available. These were cross-validated against the main variant sequences¹⁹ and the GitHub repository from the Bloom lab, where the complementary phenotypic data is hosted and the experimental details of data generation are described¹. The WT + 6 variant sequences were then iteratively mutated to create every possible single mutant. The RBD consists of 201 amino acid residues, representing the positional range 331–531 of the full spike protein, so the number of single mutants per variant is 201 × 19 = 3819. We will refer to this set as a mutant cluster. The 7 mutant clusters produced were wuhan, alpha, beta, delta, eta, omicronBA1, and omicronBA2, amounting to 7 × 3819 = 26733 RBD protein FASTA sequence²⁰ and structure files. These outputs contain all metadata files that were generated during the AlphaFold2 runs. The running time for one RBD model was between 15 and 20 minutes. 5 structures were produced from each variant FASTA file and the model with the highest overall confidence was chosen for our library, resulting in 7.2 GB for the entire library. We optimized throughput by running 40 AlphaFold2 jobs in parallel.

The selected structures were then aligned to the RBD part of the 6M0J RBD-ACE2 protein structure (PDB ID: 6M0J_2|Chain B[auth E] (15:208)²¹,²², UniProt ID: P0DTC2 (334:526)¹⁸) from the Protein Data Bank with the Schrödinger Maestro built-in “structalign”²³ post-processing method. Alignment was performed both to 6M0J and, for each cluster, to its aligned parent variant, as described in the Data Records section. All atoms are used for this alignment; the software uses an optimization method to minimize the RMSD between the two protein structures.
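The single-mutant enumeration described in this section can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' script: the function name `single_mutants`, the `first_position` parameter and the 4-residue toy parent are introduced here, and the parent is assumed to contain only the 20 standard residues.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(parent: str, first_position: int = 331):
    """Yield (label, sequence) for every single substitution of `parent`.

    Labels follow the {reference}{spike position}{alternative} pattern,
    e.g. A344C. `first_position` is the spike numbering of parent[0].
    """
    for i, ref in enumerate(parent):
        for alt in AMINO_ACIDS:
            if alt == ref:
                continue  # skip the identity "mutation"
            label = f"{ref}{first_position + i}{alt}"
            yield label, parent[:i] + alt + parent[i + 1:]


if __name__ == "__main__":
    toy_parent = "NITN"  # toy stand-in, not the real 201-residue RBD
    mutants = dict(single_mutants(toy_parent))
    print(len(mutants))  # 4 positions x 19 alternatives each
```

Applied to a 201-residue parent, the generator yields 201 × 19 sequences, one mutant cluster in the terminology used here.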

Data Records

The datasets²⁴,²⁵,²⁶,²⁷,²⁸,²⁹,³⁰ can be downloaded at: https://figshare.com/projects/SARS-CoV-2_RBD_single_mutant_AlphaFold2_structures/150089 or automatically downloaded and reprocessed with our script. Each single mutant cluster will then be found within its variant folder under a folder named “structures”. For each variant, 19 randomly selected single mutants are given as examples and do not need to be unzipped. These files can be inspected immediately with any PDB reading software, for example PyMOL³¹. They are automatically removed when using the ./data_prepare.sh script to avoid duplicates in the final unzipped folders that the user can access. The filename of each PDB follows the pattern:

structures/{VARIANT}/rot-{VARIANT}_RBD_331_531_{reference allele}{residue position in Spike protein}{alternative allele}.pdb. For example: “structures/alpha/rot-Alpha_RBD_331_531_A344C.pdb”.
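Because the mutation is encoded in the file name, it can be recovered with a small parser. The sketch below is an assumption-laden illustration rather than part of the released code: `MutantFile` and `parse_mutant_path` are names introduced here, and the regular expression assumes that variant names contain no underscore.

```python
import re
from typing import NamedTuple


class MutantFile(NamedTuple):
    variant: str
    reference: str
    position: int
    alternative: str


# rot-{VARIANT}_RBD_331_531_{ref}{position}{alt}.pdb
_PATTERN = re.compile(
    r"rot-(?P<variant>[^_/]+)_RBD_331_531_"
    r"(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])\.pdb$"
)


def parse_mutant_path(path: str) -> MutantFile:
    """Extract the variant name and mutation from a released PDB path."""
    match = _PATTERN.search(path)
    if match is None:
        raise ValueError(f"unrecognised file name: {path}")
    return MutantFile(
        match["variant"], match["ref"], int(match["pos"]), match["alt"]
    )


if __name__ == "__main__":
    print(parse_mutant_path("structures/alpha/rot-Alpha_RBD_331_531_A344C.pdb"))
```

The resulting label (reference, position, alternative) is the natural key for joining structures to other per-mutant tables.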

We provide all generated sequences²⁰. These are the inputs to the AF2 model. FASTA file names follow the pattern: FASTA/{VARIANT}/{VARIANT}_RBD_331:531_{reference allele}{residue position in Spike protein}{alternative allele}.fasta. For example: “FASTA/alpha/Alpha_RBD_331:531_A344C.fasta”.

We provide a re-aligned version³² of the same data set, in which each main variant is first aligned to the 6M0J²¹,²² structure and all single mutant structures within each cluster are then aligned to their respective parent variant structure. This may offer more flexibility for users of the data in exploring the potential complexes with the RBDs. Re-use or re-distribution of the data is compliant with the Attribution 4.0 International (CC BY 4.0) license.

Technical Validation

Visualizing generated data

In Figs. 2, 3 we present aligned AF2-generated RBD variants. These images provide a first insight into the alignment and quality of the AF2 predictions in the homologous region of the structure we are investigating. In Fig. 4 we visualize how the representations of the entire library are related and embedded in their respective spaces, 𝔽 and 𝔸 (see Fig. 1). In sequence space 𝔽 we see discrete clusters. Importantly, in structural space 𝔸 we still see these clusters, meaning that the signature information of all the structures belonging to one variant lineage is somewhat conserved after the AF2 projection to structure space 𝕊 and then to adjacency matrix space 𝔸. In summary, variants’ chemical and geometric information are clustered independently.

Validation of protein disorder

As an investigation into the physical reliability of the AF2 predictions, we explored the pLDDT scores generated for the variants²⁴,²⁵,²⁶,²⁷,²⁸,²⁹,³⁰. Recent literature suggests that pLDDT scores predicted by the model are correlated with disorder metrics¹²,³³,³⁴. Using the state-of-the-art disorder calculator IUPred2³⁵ we find that 100 − pLDDT ∝ 1/IUPred2 for each cluster of single mutants (see Table 1 and Fig. 5). That AF2 structures encode relevant disorder information is critical specifically for SARS-CoV-2 variants, as there is evidence that mutations preferentially emerge at intrinsically disordered protein sites³⁶.
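A comparison of this kind can be sketched as follows. This is a minimal illustration, not the analysis notebook from the repository: the function names are introduced here, per-residue pLDDT is assumed to be stored in the B-factor field of the Cα atoms of each AF2 PDB file, and the IUPred2 scores are assumed to be supplied by the user as a mapping from residue number to a non-zero score.

```python
from typing import Dict, Sequence


def ca_plddt(pdb_text: str) -> Dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms.

    Uses the fixed-column PDB ATOM layout: atom name in columns 13-16,
    residue number in columns 23-26, B-factor in columns 61-66 (1-based).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def disorder_agreement(plddt: Dict[int, float], iupred: Dict[int, float]) -> float:
    """R^2 between 100 - pLDDT and 1 / IUPred2 over residues present in both."""
    shared = sorted(set(plddt) & set(iupred))
    xs = [100.0 - plddt[r] for r in shared]
    ys = [1.0 / iupred[r] for r in shared]  # assumes non-zero IUPred2 scores
    return r_squared(xs, ys)
```

The transformation of the two quantities follows the relation stated in the text; any other choices (Cα atoms only, the handling of residues missing from one of the inputs) are assumptions of this sketch.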

Table 1 |R²| values between 1/IUPred2 and 100 − pLDDT.

Validation of 3D alignment

All structures we release are aligned to the RBD part of the 6M0J RBD-ACE2 protein structure²¹,²² from the Protein Data Bank with the Schrödinger Maestro built-in “structalign” post-processing method. All RMSD values for these alignments can be downloaded from our figshare page³⁷. All atoms were used in the structure alignments. The goal is to standardize the data set, which is beneficial for downstream modeling, for example docking with protein partners and antibodies. We validate the quality of our alignment with the rotation-agnostic dense adjacency matrix (contact map) representation A of each variant (see Fig. 1 for context).

$${\rm{A}}=\left[\begin{array}{ccccc}d_{11} & \ldots & d_{1j} & \ldots & d_{1n}\\ \vdots & \ddots & \vdots & \ddots & \vdots \\ d_{i1} & \ldots & d_{ij} & \ldots & d_{in}\\ \vdots & \ddots & \vdots & \ddots & \vdots \\ d_{n1} & \ldots & d_{nj} & \ldots & d_{nn}\end{array}\right] \tag{1}$$

where $d_{ij}=\sqrt{(x_i-x_j)^{2}+(y_i-y_j)^{2}+(z_i-z_j)^{2}}$ is the distance between the centers of residues i and j, each residue center being the mean position of its atoms, $(x,y,z)=\frac{1}{n_{atoms}}\sum_{k=1}^{n_{atoms}}(X_k,Y_k,Z_k)$, and n = 201 for the spike protein RBD. Structures that are more distorted have a larger adjacency matrix distance from their parent adjacency matrix. This distance corresponds only to the amount of distortion of the mutant, as adjacency matrices are independent of alignment: each adjacency matrix value is an inter-residue distance, which does not change on structure rotation. Mutants that are more structurally distorted (have a larger adjacency matrix distance to their parent) can be expected to be more “difficult” to align. We found that these same proteins have a larger structalign RMSD. This validates the structalign alignment: if the alignment were not good, this observed correlation would be washed out, and highly distorted structures, for example, would not necessarily have the largest structalign RMSD, and vice versa. There is a very strong correlation (R = 0.962) between variant RMSD from WT and Structure-based RMSD post-alignment. This should give users of the released data confidence in the PDB alignments for further 3D modeling tasks.
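The rotation invariance that this validation relies on is easy to check numerically. The sketch below uses toy coordinates and makes two assumptions that are not stated in the text: residue centers are taken as the mean atom position, and the distance between two matrices is taken as the Frobenius norm of their difference. All function names are introduced here for illustration.

```python
from math import cos, pi, sin, sqrt


def centroid(atoms):
    """Mean (x, y, z) of one residue's atom coordinates."""
    n = len(atoms)
    return tuple(sum(a[k] for a in atoms) / n for k in range(3))


def distance_matrix(centers):
    """Dense matrix of pairwise Euclidean distances d_ij between centers."""
    return [
        [sqrt(sum((p[k] - q[k]) ** 2 for k in range(3))) for q in centers]
        for p in centers
    ]


def matrix_distance(a, b):
    """Frobenius norm of A - B (an assumed choice of matrix distance)."""
    return sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def rotate_z(point, angle):
    """Rotate a point about the z axis by `angle` radians."""
    x, y, z = point
    return (x * cos(angle) - y * sin(angle), x * sin(angle) + y * cos(angle), z)


if __name__ == "__main__":
    centers = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 1.0)]  # toy data
    rotated = [rotate_z(p, pi / 3) for p in centers]
    # Rotation leaves every d_ij unchanged up to floating-point error.
    print(matrix_distance(distance_matrix(centers), distance_matrix(rotated)))
```

Because the matrix depends only on inter-residue distances, any difference between a mutant's matrix and its parent's reflects structural distortion rather than the quality of the superposition.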

Usage Notes

Complementary phenotype measurements for structures

The structures can be compared with phenotypic values measured by the Bloom lab. Full experimental details are also published³⁸. All mutants can be matched by filename to the corresponding CSV file entry. These measurements include ACE2 binding and protein expression relative to their given parent WT. Some mutants have NaN values where no reliable experimental result could be published; these may need to be removed or ignored for downstream tasks.
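Filtering out incomplete measurements before matching can be sketched with the standard library alone. The column names used below (`mutation`, `bind`, `expr`) are placeholders chosen for this example and should be replaced by the header of the phenotype CSV actually being used; `load_phenotypes` is likewise a name introduced here.

```python
import csv
import io
import math


def load_phenotypes(csv_text: str, key="mutation", columns=("bind", "expr")):
    """Return {mutation label: {column: value}} keeping only complete rows.

    `key` and `columns` are placeholder column names; adjust them to the
    header of the phenotype CSV being used.
    """
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values = {c: float(row[c]) for c in columns}
        except (TypeError, ValueError):
            continue  # empty or non-numeric cell
        if any(math.isnan(v) for v in values.values()):
            continue  # no reliable measurement for this mutant
        table[row[key]] = values
    return table


if __name__ == "__main__":
    example = "mutation,bind,expr\nA344C,-0.1,0.2\nA344D,nan,0.1\n"  # made-up values
    print(load_phenotypes(example))
```

The remaining keys can then be matched against the mutation labels embedded in the structure file names.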

PQR and APBS generation

We have prepared a notebook, PQR-APBS.ipynb, which can be used to transform all the structures to PQR³⁹ and then to APBS⁴⁰ electrostatics form. The combined charge and structure features may offer complementary signal for downstream learning tasks.

Code availability

All code and instructions needed to reproduce all representations and results in the paper can be found at: https://github.com/csabaiBio/RBD-AlphaFold2-structures-and-phenotypic-information

The GitHub repository contains scripts for:

• https://github.com/csabaiBio/RBD-AlphaFold2-structures-and-phenotypic-information/blob/main/data_usage_scripts/aa_changes_vs_RMSD.ipynb Exploring simple amino acid variation statistics.

• https://github.com/csabaiBio/RBD-AlphaFold2-structures-and-phenotypic-information/blob/main/interface_exploration/interface_importance.ipynb Exploring the importance of the interface with respect to ACE2 binding values.

• https://github.com/csabaiBio/RBD-AlphaFold2-structures-and-phenotypic-information/blob/main/projections/UMAP_all_vars_structs.ipynb Visualizing embeddings.

• https://github.com/csabaiBio/RBD-AlphaFold2-structures-and-phenotypic-information/blob/main/disorder_analysis/iupred_notebook-Analyses_mod.ipynb Investigating how the pLDDT (B-factor) output of the AF2 files correlates with state-of-the-art disorder estimations.

References

Starr, T. N. et al. Deep mutational scanning of sars-cov-2 receptor binding domain reveals constraints on folding and ace2 binding. Cell 182, 1295–1310.e20, https://doi.org/10.1016/j.cell.2020.08.012 (2020).
Tzou, P. L., Tao, K., Pond, S. L. K. & Shafer, R. W. Coronavirus resistance database (cov-rdb): Sars-cov-2 susceptibility to monoclonal antibodies, convalescent plasma, and plasma from vaccinated persons. Plos one 17, e0261045 (2022).
Vangone, A. & Bonvin, A. M. Contacts-based prediction of binding affinity in protein–protein complexes. elife 4 (2015).
Kastritis, P. L., Rodrigues, J. P., Folkers, G. E., Boelens, R. & Bonvin, A. M. Proteins feel more than they see: fine-tuning of binding affinity by properties of the non-interacting surface. Journal of molecular biology 426, 2632–2652 (2014).
Greaney, A. J., Starr, T. N. & Bloom, J. D. An antibody-escape estimator for mutations to the sars-cov-2 receptor-binding domain. Virus evolution 8, veac021 (2022).
Taft, J. M. et al. Deep mutational learning predicts ace2 binding and antibody escape to combinatorial mutations in the sars-cov-2 receptor-binding domain. Cell 185, 4008–4022 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589, https://doi.org/10.1038/s41586-021-03819-2 (2021).
Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lddt: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).
Buel, G. R. & Walters, K. J. Can alphafold2 predict the impact of missense mutations on structure? Nature Structural & Molecular Biology 29, 1–2 (2022).
Pak, M. A. et al. Using alphafold to predict the impact of single mutations on protein stability and function. BioRxiv (2021).
McBride, J. M., Polev, K., Reinharz, V., Grzybowski, B. A. & Tlusty, T. Alphafold2 can predict structural and phenotypic effects of single mutations. arXiv preprint arXiv:2204.06860 (2022).
Akdel, M. et al. A structural biology community assessment of alphafold 2 applications. BioRxiv (2021).
Xavier, J. S. et al. Thermomutdb: a thermodynamic database for missense mutations. Nucleic acids research 49, D475–D479 (2021).
Schymkowitz, J. et al. The foldx web server: an online force field. Nucleic acids research 33, W382–W388 (2005).
Leaver-Fay, A. et al. Rosetta3: an object-oriented software suite for the simulation and design of macromolecules. In Methods in enzymology, vol. 487, 545–574 (Elsevier, 2011).
Rodrigues, C. H., Pires, D. E. & Ascher, D. B. Dynamut2: Assessing changes in stability and flexibility upon single and multiple point missense mutations. Protein Science 30, 60–69 (2021).
Greaney, A. J., Starr, T. N. & Bloom, J. D. An antibody-escape calculator for mutations to the sars-cov-2 receptor-binding domain. BioRxiv (2021).
Yan, R. et al. Structural basis for the recognition of sars-cov-2 by full-length human ace2. Science 367, 1444–1448 (2020).
Hodcroft, E. B. Covariants: Sars-cov-2 mutations and variants of interest. (2021).
Mentes, A. Single mutant sequences of wuhan’s (sars-cov-2) rbd). figshare https://doi.org/10.6084/m9.figshare.21311130.v2 (2022).
Lan, J. et al. Structure of the sars-cov-2 spike receptor-binding domain bound to the ace2 receptor. Nature 581, 215–220 (2020).
Wang, X., Lan, J., Ge, J., Yu, J. & Shan, S. Crystal structure of SARS-CoV-2 spike receptor-binding domain bound with ACE2 (2020).
Schrodinger, N. Y. N. LLC. Maestro.
Mentes, A. Af structures of alpha variant’s (sars-cov-2) rbd) data sets. figshare https://doi.org/10.6084/m9.figshare.21304554.v2 (2022).
Mentes, A. Af structures of beta variant’s (sars-cov-2) rbd) data sets. figshare https://doi.org/10.6084/m9.figshare.21304620.v2 (2022).
Mentes, A. Af structures of eta variant’s (sars-cov-2) rbd) data sets. figshare https://doi.org/10.6084/m9.figshare.21304641.v2 (2022).
Mentes, A. Af structures of delta variant’s (sars-cov-2) rbd) data sets. figshare https://doi.org/10.6084/m9.figshare.21304665.v2 (2022).
Mentes, A. Af structures of wuhan variant’s (sars-cov-2) rbd) data sets. figshare https://doi.org/10.6084/m9.figshare.21304680.v2 (2022).
Mentes, A. Af structures of omicronba1 variant’s (sars-cov-2) rbd) data sets. figshare https://doi.org/10.6084/m9.figshare.21304671.v2 (2022).
Mentes, A. Af structures of omicronba2 variant’s (sars-cov-2) rbd) data sets. figshare https://doi.org/10.6084/m9.figshare.21304674.v2 (2022).
Schrödinger, L. & DeLano, W. Pymol.
Mentes, A. Xray aligned basic variant files. figshare https://doi.org/10.6084/m9.figshare.21581667.v2 (2022).
Wilson, C., Choy, W. & Karttunen, M. Alphafold2: A role for disordered protein prediction? bioRxiv (2021).
Piovesan, D., Monzon, A. M. & Tosatto, S. C. Intrinsic protein disorder, conditional folding and alphafold2. bioRxiv (2022).
Mészáros, B., Erdös, G. & Dosztányi, Z. Iupred2a: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic acids research 46, W329–W337 (2018).
Quaglia, F. et al. Sars-cov-2 variants preferentially emerge at intrinsically disordered protein sites helping immune evasion. The FEBS Journal (2022).
Mentes, A. Rmsd values of sars-cov-2’s rbds calculated by the “structalign” command. figshare https://doi.org/10.6084/m9.figshare.21939704.v1 (2022).
Starr, T. N. et al. Shifting mutational constraints in the sars-cov-2 receptor-binding domain during viral evolution. BioRxiv (2022).
Dolinsky, T. J., Nielsen, J. E., McCammon, J. A. & Baker, N. A. Pdb2pqr: an automated pipeline for the setup of poisson–boltzmann electrostatics calculations. Nucleic acids research 32, W665–W667 (2004).
Jurrus, E. et al. Improvements to the apbs biomolecular solvation software suite. Protein Science 27, 112–128 (2018).
Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv (2022).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods 17, 184–192 (2020).
Renaud, N. et al. Deeprank: a deep learning framework for data mining 3d protein-protein interfaces. Nature communications 12, 1–8 (2021).
Zhang, N. et al. Mutabind2: predicting the impacts of single and multiple mutations on protein-protein interactions. Iscience 23, 100939 (2020).
McInnes, L., Healy, J. & Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).

Acknowledgements

This work was supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No. 874735 (VEO) and by the National Research, Development, and Innovation Office of Hungary within the framework of the MILAB Artificial Intelligence National Laboratory. Further support was provided by the Stipendium Hungaricum Program under the Tempus Public Foundation. AlphaFold2 structure predictions were run at Wigner Scientific Computational Laboratory GPU Lab.

Funding

Open access funding provided by Eötvös Loránd University.

Author information

Authors and Affiliations

Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary
Oz Kilim, Anikó Mentes, Balázs Pál, István Csabai & Ákos Gellért
Wigner Research Centre for Physics, 1121, Budapest, Hungary
Balázs Pál
Veterinary Medical Research Institute, Eötvös Loránd Research Network, 1581, Budapest, P.O. box 18, Hungary
Ákos Gellért

Authors

Oz Kilim
Anikó Mentes
Balázs Pál
István Csabai
Ákos Gellért

Contributions

A.G., A.M., I.C. and O.K. conceived the experiment(s); A.M. and O.K. conducted the experiment(s); A.M., O.K., A.G. and I.C. analyzed the results. B.P., A.K. and A.G. generated the data. A.M. released the data. All authors contributed equally to this work. All authors reviewed the manuscript.

Corresponding author

Correspondence to Ákos Gellért.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kilim, O., Mentes, A., Pál, B. et al. SARS-CoV-2 receptor-binding domain deep mutational AlphaFold2 structures. Sci Data 10, 134 (2023). https://doi.org/10.1038/s41597-023-02035-z

Received: 01 December 2022
Accepted: 20 February 2023
Published: 14 March 2023
DOI: https://doi.org/10.1038/s41597-023-02035-z
