Abstract
Although challenging, the accurate and rapid prediction of nanoscale interactions has broad applications for numerous biological processes and material properties. While several models have been developed to predict the interaction of specific biological components, they use system-specific information that hinders their application to more general materials. Here we present NeCLAS, a general and efficient machine learning pipeline that predicts the location of nanoscale interactions, providing human-intelligible predictions. NeCLAS outperforms current nanoscale prediction models for generic nanoparticles up to 10–20 nm, reproducing interactions for biological and non-biological systems. Two aspects contribute to these results: a low-dimensional representation of nanoparticles and molecules (to reduce the effect of data uncertainty), and environmental features (to encode the physicochemical neighborhood at multiple scales). This framework has several applications, from basic research to rapid prototyping and design in nanobiotechnology.
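The two ingredients named above, a reduced (coarse-grained) representation and environmental features computed at several length scales, can be illustrated with a minimal sketch. This is not the NeCLAS implementation: the function names `coarse_grain` and `multiscale_environment`, the centroid-based bead construction, the Gaussian weighting and the default length scales are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def coarse_grain(atom_xyz, bead_of_atom):
    """Hypothetical coarse-graining: one bead per label, placed at the
    centroid of the atoms assigned to that label."""
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    bead_of_atom = np.asarray(bead_of_atom)
    beads = np.unique(bead_of_atom)  # sorted unique bead labels
    return np.stack([atom_xyz[bead_of_atom == b].mean(axis=0) for b in beads])


def multiscale_environment(coords, prop, scales=(2.0, 4.0, 8.0)):
    """Hypothetical environmental feature: Gaussian-weighted sum of a
    per-bead property over all other beads, one column per length scale.

    coords: (n, 3) bead positions; prop: (n,) scalar property per bead.
    Returns an array of shape (n, len(scales)).
    """
    coords = np.asarray(coords, dtype=float)
    prop = np.asarray(prop, dtype=float)
    # Pairwise squared distances between all beads.
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff**2, axis=-1)
    features = []
    for s in scales:
        w = np.exp(-d2 / (2.0 * s**2))
        np.fill_diagonal(w, 0.0)  # a bead does not contribute to its own environment
        features.append(w @ prop)
    return np.stack(features, axis=1)


if __name__ == "__main__":
    # Four atoms grouped into two beads, each bead carrying a toy property value.
    atoms = [[0, 0, 0], [2, 0, 0], [4, 0, 0], [6, 0, 0]]
    beads = coarse_grain(atoms, [0, 0, 1, 1])
    print(multiscale_environment(beads, [1.0, -1.0]))
```

Small length scales make each feature sensitive only to immediate neighbors, while larger ones summarize the wider surroundings of a bead; stacking several scales gives a downstream classifier both views at once.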
Data availability
Additional data are available at Deep Blue Data, an open and permanent data repository maintained by the University of Michigan (ref. 56). This repository contains all raw files too large to include with the paper, including atomic coordinate files, simulation inputs and outputs, and individual pairwise predictions. The source data for Figs. 1–5 are available with this paper.
Code availability
The code used in this work and the related documentation are available on Code Ocean (ref. 57). Public releases of the code can be found at https://gitlab.eecs.umich.edu/violigroup/ml/neclas/-/releases/.
References
Ghosh, G. & Panicker, L. Protein–nanoparticle interactions and a new insight. Soft Matter 17, 3855–3875 (2021).
Russ, K. A. et al. C60 fullerene localization and membrane interactions in RAW 264.7 immortalized mouse macrophages. Nanoscale 8, 4134–4144 (2016).
Liu, C. et al. Predicting the time of entry of nanoparticles in lipid membranes. ACS Nano 13, 10221–10232 (2019).
Pawson, T. & Scott, J. D. Signaling through scaffold, anchoring, and adaptor proteins. Science 278, 2075–2080 (1997).
Holzinger, M., Le Goff, A. & Cosnier, S. Nanomaterials for biosensing applications: a review. Front. Chem. 2, 63–73 (2014).
Cha, S.-H. et al. Shape-dependent biomimetic inhibition of enzyme by nanoparticles and their antibacterial activity. ACS Nano 9, 9097–9105 (2015).
Adcock, S. A. & McCammon, J. A. Molecular dynamics: survey of methods for simulating the activity of proteins. Chem. Rev. 106, 1589–1615 (2006).
Yan, Y., Tao, H., He, J. & Huang, S.-Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 15, 1829–1852 (2020).
Lim, S. et al. A review on compound–protein interaction prediction methods: data, format, representation and model. Comput. Struct. Biotechnol. J. 19, 1541–1556 (2021).
Krivák, R. & Hoksza, D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. J. Cheminform. 10, 39 (2018).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Sanchez-Garcia, R., Sorzano, C., Carazo, J. M. & Segura, J. BIPSPI: a method for the prediction of partner-specific protein-protein interfaces. Bioinformatics 35, 470–477 (2019).
Dai, B. & Bailey-Kellogg, C. Protein interaction interface region prediction by geometric deep learning. Bioinformatics 37, 2580–2588 (2021).
Minhas, F. u. A. A., Geiss, B. J. & Ben-Hur, A. PAIRpred: partner-specific prediction of interacting residues from sequence and structure. Proteins 82, 1142–1155 (2014).
Fout, A., Byrd, J., Shariat, B. & Ben-Hur, A. Protein interface prediction using graph convolutional networks. In Advances in Neural Information Processing Systems, Vol. 30 (Eds Guyon, I. et al.) (Curran Associates, Inc. 2017).
Vreven, T. et al. Updates to the integrated protein-protein interaction benchmarks: Docking Benchmark version 5 and Affinity Benchmark version 2. J. Mol. Biol. 427, 3031–3041 (2015).
Monopoli, M. P., Åberg, C., Salvati, A. & Dawson, K. A. Biomolecular coronas provide the biological identity of nanosized materials. Nat. Nanotechnol. 7, 779–786 (2012).
Findlay, M. R., Freitas, D. N., Mobed-Miremadi, M. & Wheeler, K. E. Machine learning provides predictive analysis into silver nanoparticle protein corona formation from physicochemical properties. Environ. Sci. Nano 5, 64–71 (2018).
Ouassil, N., Pinals, R. L., Bonis-O’Donnell, J. T. D., Wang, J. W. & Landry, M. P. Supervised learning model predicts protein adsorption to carbon nanotubes. Sci. Adv. 8, eabm0898 (2022).
Alex, J. M. et al. Calixarene-mediated assembly of a small antifungal protein. IUCrJ 6, 238–247 (2019).
Clark, J. J., Orban, Z. J. & Carlson, H. A. Predicting binding sites from unbound versus bound protein structures. Sci. Rep. 10, 15856 (2020).
Costanzo, L. D. & Geremia, S. Atomic details of carbon-based nanomolecules interacting with proteins. Molecules 25, 3555 (2020).
Cha, M. et al. Unifying structural descriptors for biological and bioinspired nanoscale complexes. Nat. Comput. Sci. 2, 243–252 (2022).
Porollo, A. & Meller, J. Prediction-based fingerprints of protein-protein interactions. Proteins 66, 630–645 (2006).
Yang, J., Roy, A. & Zhang, Y. Protein–ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics 29, 2588–2595 (2013).
Jiménez, J., Doerr, S., Martínez-Rosell, G., Rose, A. S. & De Fabritiis, G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 33, 3036–3042 (2017).
Mylonas, S. K., Axenopoulos, A. & Daras, P. DeepSurf: a surface-based deep learning approach for the prediction of ligand binding sites on proteins. Bioinformatics 37, 1681–1690 (2021).
Le Guilloux, V., Schmidtke, P. & Tuffery, P. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics 10, 168 (2009).
Andreeva, A., Kulesha, E., Gough, J. & Murzin, A. G. The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Res. 48, D376–D382 (2019).
Bier, D. et al. Molecular tweezers modulate 14-3-3 protein–protein interactions. Nat. Chem. 5, 234–239 (2013).
Pintar, A., Carugo, O. & Pongor, S. CX, an algorithm that identifies protruding atoms in proteins. Bioinformatics 18, 980–984 (2002).
Stanton, D. T. & Jurs, P. C. Development and use of charged partial surface area structural descriptors in computer-assisted quantitative structure–property relationship studies. Anal. Chem. 62, 2323–2329 (1990).
Stanton, D. T., Egolf, L. M., Jurs, P. C. & Hicks, M. G. Computer-assisted prediction of normal boiling points of pyrans and pyrroles. J. Chem. Inf. Comput. Sci. 32, 306–316 (1992).
Wang, Y. et al. Anti-biofilm activity of graphene quantum dots via self-assembly with bacterial amyloid proteins. ACS Nano 13, 4278–4289 (2019).
Elvati, P., Baumeister, E. & Violi, A. Graphene quantum dots: effect of size, composition and curvature on their assembly. RSC Adv. 7, 17704–17710 (2017).
Suzuki, N. et al. Chiral graphene quantum dots. ACS Nano 10, 1744–1755 (2016).
Noid, W. G. et al. The multiscale coarse-graining method. I. A rigorous bridge between atomistic and coarse-grained models. J. Chem. Phys. 128, 244114 (2008).
Izvekov, S. & Voth, G. A. A multiscale coarse-graining method for biomolecular systems. J. Phys. Chem. B 109, 2469–2473 (2005).
Baranwal, M. et al. Struct2Graph: a graph attention network for structure based predictions of protein–protein interactions. BMC Bioinformatics 23, 370 (2022).
Deguchi, S., Alargova, R. G. & Tsujii, K. Stable dispersions of fullerenes, C60 and C70, in water. Preparation and characterization. Langmuir 17, 6013–6017 (2001).
Kim, K.-H. et al. Protein-directed self-assembly of a fullerene crystal. Nat. Commun. 7, 11429 (2016).
Zaheer, M. et al. Deep sets. In Advances in Neural Information Processing Systems, Vol. 30 (Eds Guyon, I. et al.) (Curran Associates, Inc. 2017).
Martinetz, T., Berkovich, S. & Schulten, K. ‘Neural-gas’ network for vector quantization and its application to time-series prediction. IEEE Trans. Neural Netw. 4, 558–569 (1993).
Sanner, M. F., Olson, A. J. & Spehner, J.-C. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 38, 305–320 (1996).
Kawabata, T. Detection of multiscale pockets on protein surfaces using mathematical morphology. Proteins 78, 1195–1211 (2010).
Todeschini, R. & Gramatica, P. The WHIM theory: new 3D molecular descriptors for QSAR in environmental modelling. SAR QSAR Environ. Res. 7, 89–115 (1997).
Dolinsky, T. J. et al. PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res. 35, W522–W525 (2007).
Hornak, V. et al. Comparison of multiple AMBER force fields and development of improved protein backbone parameters. Proteins 65, 712–725 (2006).
Gasteiger, J. & Marsili, M. A new model for calculating atomic charges in molecules. Tetrahedron Lett. 19, 3181–3184 (1978).
Behler, J. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys. 134, 074106 (2011).
Gastegger, M., Schwiedrzik, L., Bittermann, M., Berzsenyi, F. & Marquetand, P. WACSF—weighted atom-centered symmetry functions as descriptors in machine learning potentials. J. Chem. Phys. 148, 241709 (2018).
Berman, H., Henrick, K. & Nakamura, H. Announcing the worldwide Protein Data Bank. Nat. Struct. Mol. Biol. 10, 980–980 (2003).
Halgren, T. A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 17, 490–519 (1996).
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous systems. TensorFlow https://www.tensorflow.org/ (2015).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Saldinger, J., Raymond, M., Elvati, P. & Violi, A. Supporting data: domain-agnostic predictions of nanoscale interactions in proteins and nanoparticles. University of Michigan–Deep Blue Data https://doi.org/10.7302/58q6-0q88 (2023).
Saldinger, J., Raymond, M., Elvati, P. & Violi, A. Domain-agnostic predictions of nanoscale interactions in proteins and nanoparticles. https://codeocean.com/capsule/8157811/tree. Code Ocean https://doi.org/10.24433/CO.8157811.v1 (2023).
Acknowledgements
The work was supported by the BlueSky Initiative, funded by The University of Michigan College of Engineering (principal investigator A.V.), the Army Research Office MURI (grant no. W911NF-18-1-0240) (A.V.), and the National Science Foundation Graduate Research Fellowship under grant no. 1256260 (J.C.S.). We thank C. Scott for insightful feedback and discussions on ML and C. Luyet for the help with the all-atom simulation of 6C-g3OH. We acknowledge Advanced Research Computing, a division of Information and Technology Services at the University of Michigan, for computational resources and services provided for the research.
Author information
Contributions
A.V. and P.E. conceived and supervised the project. J.C.S. and P.E. conceived chemical features and representations. M.R. and J.C.S. designed, trained and tested the machine learning models. J.C.S. designed experiments and created the database. P.E. designed and ran the MD simulations. All authors read, revised and approved the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Ananya Rastogi and Fernando Chirigati, in collaboration with the Nature Computational Science team.
Supplementary information
Supplementary Information
Supplementary Figs. 1–11, Notes, Discussion, Tables 1–5 and equations 1–7.
Source data
Source Data Fig. 1
Fig1_c_source.csv: first two principal components and bead type, Fig1_def_source.csv: RMSD values for protein–protein and protein–nanoparticle datasets.
Source Data Fig. 2
Fig2_a_source.csv: per-complex AUC values for protein–nanoparticle datasets, Fig2_b_source: per-complex AUC for protein–protein dataset with leave-one-out cross-validation, Fig2_c_source: per-complex AUC for protein–protein dataset on Docking Benchmark Dataset split.
Source Data Fig. 3
Fig3_a_source.csv: mean residue predictions, Fig3_b_source.csv: residue predictions and statistics, Fig3_c_source.csv: feature values.
Source Data Fig. 4
Fig4_b_source.csv: residue interaction predictions and ground truth, Fig4_c_source.csv: interaction predictions.
Source Data Fig. 5
Fig5a_d_source.csv: interaction potentials for internal and external beads of g3OH and g3CHO.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Saldinger, J.C., Raymond, M., Elvati, P. et al. Domain-agnostic predictions of nanoscale interactions in proteins and nanoparticles. Nat Comput Sci 3, 393–402 (2023). https://doi.org/10.1038/s43588-023-00438-x
DOI: https://doi.org/10.1038/s43588-023-00438-x