The case for post-predictional modifications in the AlphaFold Protein Structure Database

Bagdonas, Haroldas; Fogarty, Carl A.; Fadda, Elisa; Agirre, Jon

doi:10.1038/s41594-021-00680-9

Correspondence
Published: 29 October 2021

Nature Structural & Molecular Biology volume 28, pages 869–870 (2021)

To the editor — AlphaFold2 has arrived to change workflows in structural biology, for good. However, the algorithm does not account for essential modifications that affect protein structure and function, which gives us only part of the picture. Here we discuss how this omission can be addressed in a relatively straightforward manner, which leads to a complete structural prediction of complex biomolecular systems.

The recent release of the AlphaFold Protein Structure Database¹ by DeepMind and EMBL-EBI marks a major breakthrough in structural biology, as it makes available to the scientific community worldwide highly accurate structural predictions for 20,000 proteins from humans and proteins from 20 other biologically relevant organisms that include Escherichia coli. Like many scientists who work on macromolecular structure, we are genuinely excited about this development, yet we feel that there is a non-negligible potential for misinterpretation of its content in its current form. In particular, the protein-only predictions in the AlphaFold database mean that cofactors and, most importantly, co- and post-translational modifications are understandably excluded, owing to the scope of the technique. Among the most relevant co- and post-translational modifications is protein glycosylation, which is both relevant and very visible, as recent studies of the dynamics of a fully glycosylated SARS-CoV-2 spike protein illustrate²,³. Indeed, between 50% and 70% of those 20,000 predicted human proteins are believed to be glycosylated⁴, but none of this is yet visibly highlighted in the database. Detailed information on the likelihood of modifications is readily available through AlphaFold's links to UniProt (https://www.uniprot.org), and thus we strongly encourage the users of this fantastic new resource to check the information available in UniProt before downloading a model.

Within this framework, we believe that the absence of cofactors and of co- or post-translational modifications in the models in the AlphaFold Protein Structure Database might be remediated through the use of sequence- and structure-based comparative studies. Indeed, in the specific case of glycosylation, the algorithms that are implemented by DeepMind have digested inter-residue distances from the Protein Data Bank (PDB)⁵, where glycosylated proteins often exhibit either full or partial glycan structures; therefore, the space where unmodeled modifications, such as protein glycosylation, should have appeared is somehow preserved in AlphaFold models, which allows for these structural features to be directly grafted onto a model. To demonstrate the potential of this approach, we have developed proof-of-concept functionality that grafts protein glycosylation from a library of structurally equilibrated glycan blocks, obtained from molecular dynamics⁶, onto an AlphaFold model. This task has been automated and integrated into the new Python interface of the carbohydrate-specific Privateer software⁷ and is available to all on its GitHub repository (https://github.com/glycojones/privateer.git). Figure 1 shows AlphaFold model P29016 (depicted in magenta) of the human T cell surface glycoprotein CD1b, superposed onto the protein’s crystal structure PDB 5WL1. The latter was expressed in an insect cell line and it shows a characteristic double core-fucosylation of the N-glycans, which were omitted in Fig. 1 for clarity. The N-glycan our tool grafted onto the AlphaFold model is not just compatible with the available space, but it shows a high complementarity to the protein surface, where the Man6 core is involved with Trp 23 in a CH-π interaction⁸, as seen in the crystal structure.

**Fig. 1: Grafting an N-glycan onto an AlphaFold model.**
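
One simple way to read "compatible with the available space" is as a steric check: after a glycan block has been placed, no glycan atom should sit implausibly close to a protein atom. The sketch below illustrates that idea only. It is not Privateer's grafting procedure or its API; the function name `count_clashes`, the 2.2 Å cutoff and the toy coordinates are all assumptions made for illustration.

```python
import math

def count_clashes(protein_atoms, glycan_atoms, min_distance=2.2):
    """Count glycan atoms closer than min_distance (in Å) to any protein atom.

    The 2.2 Å cutoff is an illustrative assumption. In a real check the atoms
    that form the glycosidic linkage would have to be excluded, because they
    are covalently bonded and therefore legitimately close.
    """
    clashes = 0
    for g in glycan_atoms:
        if any(math.dist(g, p) < min_distance for p in protein_atoms):
            clashes += 1
    return clashes

# Toy coordinates in Å (hypothetical, not taken from any real model).
protein = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
glycan_ok = [(0.0, 4.0, 0.0), (1.5, 4.0, 0.0)]
glycan_bad = [(0.0, 1.0, 0.0), (1.5, 4.0, 0.0)]

print(count_clashes(protein, glycan_ok))
print(count_clashes(protein, glycan_bad))
```

A clash count of zero says only that the placement is sterically possible. Complementarity of the kind described above, such as a CH-π contact with a tryptophan side chain, would need a more specific geometric or energetic analysis.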

We would like to emphasize that this approach may also be useful to complete the AlphaFold models in the database with other types of modifications. For example, the AlphaFold model P68871, a hemoglobin subunit beta, contains a heme binding site with just enough space for a heme cofactor. Certain structure completions will only be feasible via automated comparative analyses against available structural information (for example, co-translational modifications such as myristoylation⁹ or O-GlcNAcylation¹⁰), while others such as N-glycosylation or tryptophan mannosylation, which rely on consensus sequences, will be more amenable to prediction. As comparative studies would have to rely on experimental structural information, positional uncertainty (for example, a pLDDT-like score¹¹) may be estimated by comparing the placed coordinates to a superposition of the available structural information. However, in the particular case of protein glycosylation, we see more of a compositional problem; indeed, the biggest challenge would be to get a good estimation of what glycoform is linked to each sequon. Experimental structures offer only partial information owing to limiting factors such as mobility and micro-heterogeneity¹², so other sources of knowledge (for example, glycomics and molecular dynamics simulations) ought to be used, especially when attempting to model full-length glycans, which is something we are sure the glycobiology community will appreciate. We are expanding the Privateer software to address these cases by harnessing the rich information available in glycomics databases¹³.
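
To illustrate why modifications that rely on consensus sequences are more amenable to prediction, the minimal sketch below scans a sequence for candidate sites. It assumes the commonly cited motifs, N-x-S/T with x not proline for N-glycosylation and W-x-x-W for tryptophan mannosylation; the function name `find_sites` and the toy sequence are hypothetical. A motif match marks a candidate only: it does not show that the site is occupied, and it says nothing about which glycoform is attached.

```python
import re

# N-glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps the match one residue wide, so overlapping sequons
# (for example in "NNSS") are all reported.
N_GLYC_SEQUON = re.compile(r"N(?=[^P][ST])")

# Tryptophan mannosylation motif, assumed here to be W-x-x-W, with the
# first Trp being the candidate site.
C_MANNOSYLATION = re.compile(r"W(?=..W)")

def find_sites(sequence, pattern):
    """Return 1-based positions of candidate residues for the given motif."""
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]

# Toy sequence (hypothetical, not a real protein).
toy = "MKNGTAWSPWNPSANVS"

print(find_sites(toy, N_GLYC_SEQUON))    # candidate Asn positions
print(find_sites(toy, C_MANNOSYLATION))  # candidate Trp positions
```

In this toy sequence the Asn followed by Pro is skipped, as the sequon definition requires. Candidate positions found this way could then be cross-checked against the annotations available in UniProt before any glycan is grafted.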

To conclude, we think that these early results are highly encouraging and can serve as a rallying point for the developer community to complete and enrich the predicted protein models with likely modifications, to bring them to their fullest potential and to correctly inform the next generation of structural biology studies.

References

1. Tunyasuvunakool, K. et al. Nature 596, 590–596 (2021).
2. Casalino, L. et al. ACS Cent. Sci. 6, 1722–1734 (2020).
3. Turoňová, B. et al. Science 370, 203–208 (2020).
4. An, H. J., Froehlich, J. W. & Lebrilla, C. B. Curr. Opin. Chem. Biol. 13, 421–426 (2009).
5. Berman, H., Henrick, K. & Nakamura, H. Nat. Struct. Biol. 10, 980 (2003).
6. Fogarty, C. A. & Fadda, E. J. Phys. Chem. B 125, 2607–2616 (2021).
7. Agirre, J. et al. Nat. Struct. Mol. Biol. 22, 833–834 (2015).
8. Hudson, K. L. et al. J. Am. Chem. Soc. 137, 15152–15160 (2015).
9. Udenwobele, D. I. et al. Front. Immunol. 8, 751 (2017).
10. Zhu, Y. et al. Nat. Chem. Biol. 11, 319–325 (2015).
11. Jumper, J. et al. Nature 596, 583–589 (2021).
12. Atanasova, M., Bagdonas, H. & Agirre, J. Curr. Opin. Struct. Biol. 62, 70–78 (2020).
13. Bagdonas, H., Ungar, D. & Agirre, J. Beilstein J. Org. Chem. 16, 2523–2533 (2020).

Acknowledgements

H.B. is funded by The Royal Society grant RGF/R1/181006. J.A. is the Royal Society Olga Kennard Research Fellow (award ref. UF160039). C.A.F. is funded by the Irish Research Council (IRC) Government of Ireland Postgraduate Scholarship Programme. Data and methods are available at https://doi.org/10.5281/zenodo.5290624.

Author information

Authors and Affiliations

York Structural Biology Laboratory, Department of Chemistry, University of York, York, UK
Haroldas Bagdonas & Jon Agirre
Department of Chemistry and Hamilton Institute, Maynooth University, Maynooth, Ireland
Carl A. Fogarty & Elisa Fadda

Corresponding authors

Correspondence to Elisa Fadda or Jon Agirre.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Bagdonas, H., Fogarty, C.A., Fadda, E. et al. The case for post-predictional modifications in the AlphaFold Protein Structure Database. Nat Struct Mol Biol 28, 869–870 (2021). https://doi.org/10.1038/s41594-021-00680-9

Issue Date: November 2021
DOI: https://doi.org/10.1038/s41594-021-00680-9
