Co-evolution-based prediction of metal-binding sites in proteomes by machine learning

Cheng, Yao; Wang, Haobo; Xu, Hua; Liu, Yuan; Ma, Bin; Chen, Xuemin; Zeng, Xin; Wang, Xianghe; Wang, Bo; Shiau, Carina; Ovchinnikov, Sergey; Su, Xiao-Dong; Wang, Chu

doi:10.1038/s41589-022-01223-z

Article
Published: 02 January 2023

Yao Cheng^1,2,na1^,
Haobo Wang^1,2,na1^,
Hua Xu^3,na1^ (ORCID: orcid.org/0000-0001-9283-080X),
Yuan Liu^1,2,na1^ (ORCID: orcid.org/0000-0002-1156-7673),
Bin Ma^1,2^,
Xuemin Chen^1,2^,
Xin Zeng^4^,
Xianghe Wang^1,2^,
Bo Wang^3^,
Carina Shiau^5^,
Sergey Ovchinnikov^6^ (ORCID: orcid.org/0000-0003-2774-2744),
Xiao-Dong Su^3^ (ORCID: orcid.org/0000-0001-6948-2317) &
Chu Wang^1,2,4^ (ORCID: orcid.org/0000-0002-6925-1268)

Nature Chemical Biology volume 19, pages 548–555 (2023)

Abstract

Metal ions have various important biological roles in proteins, including structural maintenance, molecular recognition and catalysis. Previous methods of predicting metal-binding sites in proteomes were based on either sequence or structural motifs. Here we developed a co-evolution-based pipeline named ‘MetalNet’ to systematically predict metal-binding sites in proteomes. We applied MetalNet to proteomes of four representative prokaryotic species and predicted 4,849 potential metalloproteins, which substantially expands the currently annotated metalloproteomes. We biochemically and structurally validated previously unannotated metal-binding sites in several proteins, including apo-citrate lyase phosphoribosyl-dephospho-CoA transferase citX, an Escherichia coli enzyme lacking structural or sequence homology to any known metalloprotein (Protein Data Bank (PDB) codes: 7DCM and 7DCN). MetalNet also successfully recapitulated all known zinc-binding sites from the human spliceosome complex. The pipeline of MetalNet provides a unique and enabling tool for interrogating the hidden metalloproteome and studying metal biology.

**Fig. 1: Predicting metal-binding residue pairs from co-evolution by ML.**

**Fig. 2: Assembly of metal-binding residue pairs into a high-order connected co-evolution network by graph theory.**

**Fig. 3: Predictions and analyses of metalloproteins from representative prokaryotic species.**

**Fig. 4: Biochemical and structural characterization of citX as a zinc-binding protein.**

Data availability

The original protein structure dataset containing 9,846 protein sequences and their MSAs can be downloaded from https://doi.org/10.1073/pnas.1702664114. Co-evolved pairs used in model training can be found in Supplementary Dataset. The proteins and related MSAs of prokaryotic species can be downloaded from https://gremlin2.bakerlab.org/db/{species}/fasta/. The Metagenome-pfam MSA and structural model can be downloaded from https://gremlin2.bakerlab.org/db/UNI/. The human spliceosome dataset can be found in Supplementary Dataset. PDB codes 6ID0, 6ID1, 6ICZ and 6QW6 were used to construct the human spliceosome dataset. We downloaded the information table of protein entities from the PDB server (https://ftp.wwpdb.org/pub/pdb/derived_data/index/entries.idx) to construct an unbiased dataset when comparing methods. UniProt (https://ebi10.uniprot.org) profiles (date: 16 August 2021), the Gene Ontology database (https://www.ebi.ac.uk/QuickGO/) and the Pfam database (http://pfam.xfam.org/) were used in the analysis. The pdbaa database (16 January 2018 release, ftp://ftp.ncbi.nlm.nih.gov/blast/db/pdbaa.tar.gz) was used in BLASTP. The structures of citX reported in this paper have been deposited in PDB with the accession numbers 7DCM (determined by single-wavelength anomalous diffraction) and 7DCN (determined by molecular replacement). Source data are provided with this paper.
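As an illustration of how a date-restricted set of PDB entries might be drawn from the entries.idx table, the following Python sketch keeps only entries dated after a cutoff. It is not the authors' script. It assumes that the table is tab-separated with the entry ID in the first field and a date in MM/DD/YY form in the third field, and that this date can stand in for the deposition date; the sample rows, the function name entries_after and the cutoff of 31 August 2016 are hypothetical and chosen only for the example.

```python
from datetime import date, datetime

# Hypothetical stand-in for a few lines of entries.idx (assumed layout:
# tab-separated, ID in field 0, date as MM/DD/YY in field 2).
SAMPLE_IDX = "\n".join([
    "IDCODE, HEADER, ACCESSION DATE, COMPOUND",
    "------- ------- --------------- --------",
    "0AAA\tHYDROLASE\t03/14/95\tEXAMPLE PROTEIN A",
    "0AAB\tTRANSFERASE\t08/31/16\tEXAMPLE PROTEIN B",
    "0AAC\tLYASE\t09/01/16\tEXAMPLE PROTEIN C",
])


def entries_after(idx_text, cutoff):
    """Return entry IDs whose date field is strictly later than `cutoff`."""
    kept = []
    for line in idx_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # header and separator lines carry no tab-separated fields
        try:
            entry_date = datetime.strptime(fields[2].strip(), "%m/%d/%y").date()
        except ValueError:
            continue  # skip rows whose date field cannot be parsed
        if entry_date > cutoff:
            kept.append(fields[0].strip())
    return kept


recent = entries_after(SAMPLE_IDX, date(2016, 8, 31))
print(recent)
```

Two-digit years are resolved by Python's own convention for %y, so the layout assumption should be checked against the real file before relying on the result.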

Code availability

Our code is available as open source at https://github.com/wangchulab/MetalNet.


Acknowledgements

We thank H. Tang in Chu Wang’s lab and the Institute of Geographic Sciences and Natural Resources Research, CAS, for help with ICP measurements. We thank the National Center for Protein Sciences at Peking University, Beijing, for help with circular dichroism measurements, and the staff of the Shanghai Synchrotron Radiation Facility and KEK Photon Factory for assistance with X-ray data collection. Funding: C.W. was supported by the National Natural Science Foundation of China (grants 21925701, 91953109 and 92153301).

Author information

These authors contributed equally: Yao Cheng, Haobo Wang, Hua Xu, Yuan Liu.

Authors and Affiliations

Synthetic and Functional Biomolecules Center, Beijing National Laboratory for Molecular Sciences, Key Laboratory of Bioorganic Chemistry and Molecular Engineering of Ministry of Education, Peking University, Beijing, China
Yao Cheng, Haobo Wang, Yuan Liu, Bin Ma, Xuemin Chen, Xianghe Wang & Chu Wang
Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
Yao Cheng, Haobo Wang, Yuan Liu, Bin Ma, Xuemin Chen, Xianghe Wang & Chu Wang
State Key Laboratory of Protein and Plant Gene Research, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
Hua Xu, Bo Wang & Xiao-Dong Su
Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
Xin Zeng & Chu Wang
Cornell University, Ithaca, NY, USA
Carina Shiau
John Harvard Distinguished Science Fellow, Harvard University, Cambridge, MA, USA
Sergey Ovchinnikov

Contributions

H.W. and C.W. conceived the project; Y.C., H.W. and Y.L. performed computational analysis with the help of C.S. and S.O.; H.X. purified citX and solved the structure under the guidance of X.S.; B.M., X.C., X.W. and X.Z. contributed to biochemical verification of zinc binding in citX and other proteins; and Y.C., H.W., Y.L. and C.W. analyzed the data and wrote the manuscript with inputs from all authors.

Corresponding authors

Correspondence to Yuan Liu, Xiao-Dong Su or Chu Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Chemical Biology thanks Kevin Yang, Rosalin Bonetta Valentino and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Distribution of the metal types of the metalloproteins in the training set.

ZN: zinc; CA: calcium; MG: magnesium; MN: manganese; FE: iron; SF4: [FE4-S4] iron-sulfur clusters; NI: nickel; CU: copper; CO: cobalt; FES: [FE2-S2] iron-sulfur clusters.

Source data

Extended Data Fig. 2 Examples of the coevolved CHED network clusters detected in the PDB data.

Metal-binding residues in these sites have the tendency to coevolve with each other and cluster together in a high-order coevolution network. The PDB ID and type of the metal ion bound are listed below the corresponding coevolved network.

Extended Data Fig. 3 Distribution of the number of coevolved connections (‘node degree’) for each residue among all the coevolved CHED pairs in the training set.

The CHED residues involved in metal binding (blue) have, on average, a higher node degree in the coevolved networks than the non-metal-binding residues (red).
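The node degree referred to here is simply the number of coevolved pairs in which a residue takes part. A minimal Python sketch of that count follows; the residue labels and the pair list are hypothetical and are not taken from the training set.

```python
from collections import Counter


def node_degrees(pairs):
    """Count, for each residue, the number of distinct coevolved pairs it is in."""
    # Treat (a, b) and (b, a) as the same undirected pair.
    unique_pairs = {tuple(sorted(pair)) for pair in pairs}
    degree = Counter()
    for a, b in unique_pairs:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical coevolved CHED pairs: four residues that all coevolve with
# one another, plus one isolated pair.
example_pairs = [
    ("C10", "C13"), ("C10", "H30"), ("C10", "C33"),
    ("C13", "H30"), ("C13", "C33"), ("H30", "C33"),
    ("D50", "E72"),
]

degrees = node_degrees(example_pairs)
print(dict(degrees))
```

In this toy input the four mutually coevolving residues each have degree 3, whereas the two residues of the isolated pair have degree 1.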

Source data

Extended Data Fig. 4 Composition of metal-chelating CHED sidechains within the coordination sphere around each specific metal ion.

The statistics were calculated from the metalloprotein structures in the benchmark. Panels a and b show the absolute count and normalized percentage, respectively. ZN: zinc; CA: calcium; MG: magnesium; MN: manganese; NI: nickel; FE: iron; CU: copper; SF4: [FE4-S4] iron-sulfur clusters; CO: cobalt; FES: [FE2-S2] iron-sulfur clusters.

Source data

Extended Data Fig. 5 Overall workflow of MetalNet.

Starting with the information from coevolution analysis, MetalNet uses a machine learning (ML)-based classifier that has been trained by a benchmark of known metalloproteins to predict whether an individual CHED coevolved residue pair is metal-binding or not. It then employs a graph-based approach to identify high-order coevolution network clusters formed by these coevolved CHED pairs to generate reliable predictions of metal-binding sites in the query protein. In certain cases when the coevolution network topology can be matched to that of known metal-binding sites, the method can also infer the information on the type of metal bound in the predicted site. The method does not use any sequence (1D) or structural (3D) homology information to make predictions.
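The two stages of this workflow, pair-level classification followed by graph-based clustering, can be sketched in a few lines of Python. This is an illustrative sketch and not the MetalNet implementation: the classifier scores are supplied as hypothetical inputs, and the score threshold, the minimum cluster size and the use of plain connected components are assumptions made only for the example.

```python
from collections import defaultdict


def metal_site_clusters(scored_pairs, threshold=0.5, min_size=3):
    """Group residues joined by high-scoring pairs into connected clusters.

    scored_pairs: iterable of (residue_a, residue_b, score) tuples, where the
    score stands in for a pair-level classifier output (hypothetical here).
    """
    adjacency = defaultdict(set)
    for a, b, score in scored_pairs:
        if score >= threshold:  # keep pairs the classifier calls metal-binding
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_size:  # discard small, isolated groups
            clusters.append(sorted(component))
    return clusters


# Hypothetical residue positions and scores.
example = [
    (12, 15, 0.90), (12, 40, 0.80), (15, 43, 0.70), (40, 43, 0.95),
    (77, 90, 0.85), (15, 77, 0.20),
]
clusters = metal_site_clusters(example)
print(clusters)
```

With these inputs the low-scoring pair (15, 77) is dropped, residues 12, 15, 40 and 43 form one cluster, and the isolated pair (77, 90) falls below the assumed minimum size.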

Extended Data Fig. 6 Comparison of the performance between MetalNet and MIB.

a, Evaluation of the performance of MetalNet and MIB on a ‘prospective’ dataset of metalloproteins that were deposited in PDB after August 2016. b, Evaluation of the performance of MetalNet and MIB on this prospective metalloprotein dataset after proteins with one single sidechain as the liganding group were removed. In a and b, the calculated precision, recall and F1-scores for MetalNet and MIB are shown on the left in table format for all metal-binding sites (‘All sites’), Zn-specific sites (‘Zn’), Mg/Ca-specific sites (‘Mg/Ca’) and the remaining sites (‘others’). The numbers of correct and incorrect predictions made by MetalNet and MIB are shown on the right as Venn diagrams with the ‘prospective’ metalloprotein dataset (‘PDB’) as the reference (the dataset can be found in Supplementary Dataset 3).
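For readers unfamiliar with the three metrics, the Python sketch below computes precision, recall and F1-score from a set of predicted sites and a reference set, using the standard definitions. The site identifiers are hypothetical, and how a predicted site is matched to a reference site in the paper is not specified here; exact identity of labels is assumed for simplicity.

```python
def precision_recall_f1(predicted, reference):
    """Standard precision, recall and F1 from two collections of site labels."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical site labels: three of four predictions are in the reference,
# and three of five reference sites are recovered.
predicted_sites = {"site_a", "site_b", "site_c", "site_d"}
reference_sites = {"site_b", "site_c", "site_d", "site_e", "site_f"}

p, r, f1 = precision_recall_f1(predicted_sites, reference_sites)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```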

Source data

Extended Data Fig. 7 Distribution of the minimal distances between the metal-binding residue pairs predicted by MetalNet.

The blue block shows the distribution of residue-residue pairwise distances for metal-binding residues from known metalloproteins in the PDB database. Black solid lines show the distribution of mean distances in GREMLIN models for the metal-binding coevolved sites predicted by MetalNet. The distributions generally agree well with each other until 4.5 Å, suggesting that a large portion of MetalNet predictions indeed have metal-binding residues in proximity with each other. Since GREMLIN does not take metal-binding into consideration during structural modeling, the disagreement after 4.5 Å may come from either false positive prediction by MetalNet, or some errors in structural modeling by GREMLIN.
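One plausible reading of the minimal distance between two residues is the shortest distance between any atom of one residue and any atom of the other. The Python sketch below computes that quantity; this definition is an assumption, and the coordinates are hypothetical rather than taken from any structure.

```python
import math
from itertools import product


def minimal_distance(atoms_a, atoms_b):
    """Shortest Euclidean distance between any atom of A and any atom of B.

    Each argument is a list of (x, y, z) coordinates in angstroms.
    """
    return min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))


# Hypothetical side-chain atom coordinates for two residues.
residue_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
residue_b = [(4.0, 0.0, 0.0), (1.0, 3.0, 4.0)]

d_min = minimal_distance(residue_a, residue_b)
print(d_min)
```

Here the closest atom pair is (1, 0, 0) and (4, 0, 0), so the minimal distance is 3.0 Å.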

Source data

Extended Data Fig. 8 Validation of metal binding in purified Desor_0198 (a), SVEN_5263 (b), CbrC (c) and CitX (d) by ICP-MS.

For a and b, the coevolved metal-binding cluster predicted by MetalNet is shown on the left, SDS-PAGE of the purified wild-type protein in the middle and the ICP-MS measurement of metal binding of the purified protein on the right. Both proteins were validated as zinc-binding. c, The coevolved metal-binding cluster predicted by MetalNet is shown on the left, SDS-PAGE of the purified wild-type protein, two single mutants (C56S or C182S) and the double mutant (C56S&C182S) in the middle and ICP-MS analysis of zinc binding in the purified wild-type cbrC and mutants on the right. Partial zinc binding was retained in each of the single mutants, whereas metal-binding activity was completely abolished in the double mutant. In a, b and c, error bars represent mean and s.d. (n = 3 biologically independent samples). ICP-MS analysis of Fe and Cu binding was performed only once. d, SDS-PAGE of the purified wild-type citX as well as four single mutants of the metal-binding residues predicted by MetalNet (C145S, C148S, C155S and H161S). ICP-MS analysis of citX is shown in Fig. 4d. The experiment was repeated twice independently with similar results.

Source data

Extended Data Fig. 9 Molecular dynamics (MD) simulations of citX and its mutants.

MD simulations were performed to calculate root mean square fluctuation (RMSF) for citX (PDB ID: 7DCN) and its mutants (C145S, C148S, C155S and H161, metal-binding residues predicted by MetalNet) using GROMACS. The results suggested that all mutants show greater conformational fluctuations at the binding site.
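RMSF measures how far each atom strays from its own time-averaged position over a trajectory. The Python sketch below implements that textbook definition directly and does not reproduce the GROMACS workflow used in the paper. It assumes the frames have already been superimposed on a common reference, and the two-frame trajectory is hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, with the same atom order in every frame.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean_pos = tuple(
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        )
        # Mean squared deviation from that average position.
        msd = sum(math.dist(frame[i], mean_pos) ** 2 for frame in trajectory) / n_frames
        fluctuations.append(math.sqrt(msd))
    return fluctuations


# Hypothetical trajectory: atom 0 moves between two positions, atom 1 is fixed.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
values = rmsf(frames)
print(values)
```

In this toy case the moving atom has an RMSF of 1.0 and the fixed atom an RMSF of 0.0, which mirrors how larger values at a binding site indicate greater conformational fluctuation.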

Source data

Extended Data Fig. 10 Prediction of metal-binding sites in the human spliceosome.

a, Scheme of predicting metal-binding sites in the human spliceosome using coevolution obtained by deepMSA and MSA transformer. b, Highlight of the zinc-binding sites in the human spliceosome predicted by MetalNet that match well with the experimental structures in PDB. The experimental structures of the corresponding metalloprotein subunits are shown as cartoons on the left and the predicted coevolution networks corresponding to the metal-binding sites are shown on the right. Metal-chelating residues are shown as sticks and zinc ions as spheres.

Supplementary information

Supplementary Information

Supplementary Tables 1–5 and Supplementary Fig. 1.

Reporting Summary

Supplementary Data 1

The list of co-evolved pairs used for training and evaluating the ML model in this study.

Supplementary Data 2

The adjacency list of ‘motifs’ in the co-evolution motif bank.

Supplementary Data 3

The ‘prospective’ metalloprotein dataset that was deposited in PDB after August 2016. It was used for comparing the performance of MetalNet and MIB in an unbiased manner.

Supplementary Data 4

The predicted metal-binding sites by MetalNet in the prokaryotic species dataset.

Supplementary Data 5

The predicted metal-binding sites in the metagenome-Pfam dataset.

Supplementary Data 6

The predicted metal-binding sites in the human spliceosome sequences dataset.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Unprocessed gels and statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Unprocessed gels and ICP-MS data.

Source Data Extended Data Fig. 9

MD data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cheng, Y., Wang, H., Xu, H. et al. Co-evolution-based prediction of metal-binding sites in proteomes by machine learning. Nat Chem Biol 19, 548–555 (2023). https://doi.org/10.1038/s41589-022-01223-z

Received: 11 January 2022
Accepted: 08 November 2022
Published: 02 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1038/s41589-022-01223-z

This article is cited by

Using protein language models for protein interaction hot spot prediction with limited data
- Karen Sargsyan
- Carmay Lim
BMC Bioinformatics (2024)
