Improved fragment sampling for ab initio protein structure prediction using deep neural networks

Wang, Tong; Qiao, Yanhua; Ding, Wenze; Mao, Wenzhi; Zhou, Yaoqi; Gong, Haipeng

doi:10.1038/s42256-019-0075-7

Article
Published: 09 August 2019

Tong Wang^1,2,
Yanhua Qiao^1,2,
Wenze Ding^1,2,
Wenzhi Mao^1,2,
Yaoqi Zhou^3 (ORCID: orcid.org/0000-0002-9958-5699) &
Haipeng Gong^1,2 (ORCID: orcid.org/0000-0002-5532-1640)

Nature Machine Intelligence volume 1, pages 347–355 (2019)

Abstract

A typical approach to predicting unknown native structures of proteins is to assemble short segments of consecutive amino acid residues (fragments) extracted from known structures. The quality of these extracted fragments, which are used to build protein-specific fragment libraries, can determine the success or failure of sampling near-native conformations. Here we show how a high-quality fragment library can be built using deep contextual learning techniques. Our algorithm, called DeepFragLib, employs bidirectional long short-term memory recurrent neural networks with knowledge distillation for initial fragment classification, followed by an aggregated residual transformation network with cyclically dilated convolution for detecting near-native fragments. DeepFragLib improves the position-averaged proportion of near-native fragments by 12.2% over existing methods and, consequently, produces better near-native structures for 72.0% of the free-modelling domain targets tested when integrated with Rosetta. DeepFragLib is fully parallelized and available for use in conjunction with structure prediction programs.
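To make the quality measure named in the abstract concrete, the following is a minimal sketch of how a position-averaged proportion of near-native fragments could be computed. It is an illustration under stated assumptions, not the evaluation code of DeepFragLib: the function name `position_averaged_proportion`, the use of RMSD to the native structure as the near-native criterion, and the 1.5 Å threshold are all hypothetical choices that this page does not specify.

```python
def position_averaged_proportion(rmsds_by_position, threshold=1.5):
    """Average, over residue positions, of the fraction of candidate
    fragments whose RMSD to the native structure is below `threshold`.

    rmsds_by_position: list of lists; entry i holds the RMSD values (in Å)
    of the candidate fragments starting at position i.
    threshold: assumed near-native cutoff in Å (hypothetical value).
    """
    proportions = []
    for rmsds in rmsds_by_position:
        if not rmsds:
            # Assumption: positions without candidates are skipped.
            continue
        near_native = sum(1 for r in rmsds if r < threshold)
        proportions.append(near_native / len(rmsds))
    if not proportions:
        raise ValueError("no position has candidate fragments")
    return sum(proportions) / len(proportions)


# Toy library with three positions (made-up RMSD values).
library = [
    [0.8, 1.2, 2.5, 3.0],  # 2 of 4 fragments are near-native
    [0.5, 0.9],            # 2 of 2
    [2.2, 4.1, 1.0, 1.4],  # 2 of 4
]
score = position_averaged_proportion(library)
print(round(score, 3))  # 0.667
```

Averaging per position first, rather than pooling all fragments, keeps positions with many candidates from dominating the score.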

**Fig. 1: The overall flowchart of DeepFragLib.**

**Fig. 2: Quality assessment of fragment libraries.**

**Fig. 3: Quality assessment in secondary structure classes.**

**Fig. 4: Evaluation of the top1 models sampled by Rosetta simulations.**

**Fig. 5: Case study of the distribution of TM-Scores.**

Data availability

The full template fragment database HR956 used for fragment library construction and all four CASP datasets used for the quality evaluation of fragment libraries are available on Code Ocean (https://doi.org/10.24433/CO.3579011.v1)⁴⁹.

Code availability

All source code and models of DeepFragLib are publicly available through a Code Ocean compute capsule (https://doi.org/10.24433/CO.3579011.v1)⁴⁹ and on GitHub (https://github.com/ElwynWang/DeepFragLib). We have also provided an online server for DeepFragLib at http://structpred.life.tsinghua.edu.cn/DeepFragLib.html.

References

Bradley, P., Misura, K. M. S. & Baker, D. Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868–1871 (2005).
Dill, K. A. & MacCallum, J. L. The protein-folding problem 50 years on. Science 338, 1042–1046 (2012).
Rigden, D. J. From Protein Structure To Function With Bioinformatics Ch. 1. (Springer, 2017).
Soding, J. Big-data approaches to protein structure prediction. Science 355, 248–249 (2017).
Kim, D. E., Blum, B., Bradley, P. & Baker, D. Sampling bottlenecks in de novo protein structure prediction. J. Mol. Biol. 393, 249–260 (2009).
Jothi, A. Principles, challenges and advances in ab initio protein structure prediction. Protein Peptide Lett. 19, 1194–1204 (2012).
Wang, T., Yang, Y., Zhou, Y. & Gong, H. LRFragLib: an effective algorithm to identify fragments for de novo protein structure prediction. Bioinformatics 33, 677–684 (2017).
Baeten, L. et al. Reconstruction of protein backbones from the BriX collection of canonical protein fragments. PLoS Comput. Biol. 4, e1000083 (2008).
Xu, J. Distance-based protein folding powered by deep learning. Preprint at https://arxiv.org/abs/1811.03481 (2018).
Evans, R. et al. De novo structure prediction with deep-learning based scoring. In Thirteenth Critical Assessment of Techniques for Protein Structure Prediction Abstracts (Iberostar Paraiso, 2018).
Hanson, J., Paliwal, K., Litfin, T., Yang, Y. & Zhou, Y. Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks. Bioinformatics 34, 4039–4045 (2018).
Simons, K. T., Kooperberg, C., Huang, E. & Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225 (1997).
Xu, D. & Zhang, Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 80, 1715–1735 (2012).
Yang, J. et al. The I-TASSER Suite: protein structure and function prediction. Nat. Methods 12, 7–8 (2015).
Rohl, C. A., Strauss, C. E., Misura, K. M. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93 (2004).
Kim, D. E., Chivian, D. & Baker, D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res. 32, W526–W531 (2004).
Gront, D., Kulp, D. W., Vernon, R. M., Strauss, C. E. & Baker, D. Generalized fragment picking in Rosetta: design, protocols and applications. PLoS ONE 6, e23294 (2011).
Kalev, I. & Habeck, M. HHfrag: HMM-based fragment detection using HHpred. Bioinformatics 27, 3110–3116 (2011).
Trevizani, R., Custodio, F. L., Dos Santos, K. B. & Dardenne, L. E. Critical features of fragment libraries for protein structure prediction. PLoS ONE 12, e0170131 (2017).
Bhattacharya, D., Adhikari, B., Li, J. & Cheng, J. FRAGSION: ultra-fast protein fragment library generation by IOHMM sampling. Bioinformatics 32, 2059–2061 (2016).
de Oliveira, S. H. P. & Deane, C. M. Combining co-evolution and secondary structure prediction to improve fragment library generation. Bioinformatics 34, 2219–2227 (2018).
Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).
Wang, S., Li, Z., Yu, Y. & Xu, J. Folding membrane proteins by deep transfer learning. Cell Syst. 5, 202–211 e203 (2017).
Paliwal, K., Hanson, J., Litfin, T., Zhou, Y. & Yang, Y. Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks. Bioinformatics 35, 2403–2410 (2018).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. In Proc. Conf. Computer Vision and Pattern Recognition 5987–5995 (IEEE, 2017).
Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45, 2673–2681 (1997).
Heffernan, R., Yang, Y., Paliwal, K. & Zhou, Y. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 33, 2842–2849 (2017).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Preprint at https://arxiv.org/abs/1512.03385 (2015).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at https://arxiv.org/abs/1503.02531 (2015).
Yu, F. & Koltun, V. Multi-scale context aggregation by dilated convolutions. Preprint at https://arxiv.org/abs/1511.07122 (2015).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Zhang, Y. & Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins 57, 702–710 (2004).
Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure—pattern-recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).
Fox, N. K., Brenner, S. E. & Chandonia, J. M. SCOPe: Structural classification of proteins—extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 42, D304–D309 (2014).
Jones, D. T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202 (1999).
Hubner, I. A., Deeds, E. J. & Shakhnovich, E. I. Understanding ensemble protein folding at atomic detail. Proc. Natl Acad. Sci. USA 103, 17747–17752 (2006).
Carugo, O. & Pongor, S. A normalized root-mean-square distance for comparing protein three-dimensional structures. Protein Sci. 10, 1470–1473 (2001).
Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA 89, 10915–10919 (1992).
Kidera, A., Konishi, Y., Oka, M., Ooi, T. & Scheraga, H. A. Statistical analysis of the physical properties of the 20 naturally occurring amino acids. J. Protein Chem. 4, 23–55 (1985).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Keskar, N. S. & Socher, R. Improving generalization performance by switching from Adam to SGD. Preprint at https://arxiv.org/abs/1712.07628 (2017).
Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd Int. Conf. Machine Learning. Vol. 37 (JMLR, 2015).
Abadi, M. et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems (TensorFlow, 2015); http://download.tensorflow.org/paper/whitepaper2015.pdf
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Tong, W. et al. Improved fragment sampling for ab initio protein structure prediction using deep neural networks (Code Ocean, 2019); https://doi.org/10.24433/CO.3579011.v1

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant numbers 31670723, 81861138009, 91746119 and 31621092) and by the Beijing Advanced Innovation Center for Structural Biology to H.G., and by the Australian Research Council (grant number DP180102060) and the National Health and Medical Research Council of Australia (grant number 1121629) to Y.Z.

Author information

Authors and Affiliations

MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
Tong Wang, Yanhua Qiao, Wenze Ding, Wenzhi Mao & Haipeng Gong
Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing, China
Tong Wang, Yanhua Qiao, Wenze Ding, Wenzhi Mao & Haipeng Gong
Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
Yaoqi Zhou

Contributions

T.W. contributed to methodology, experimental design, software, formal analysis, the server and writing of the original draft. Y.Q. contributed to formal analysis and the server. W.D. contributed to the server. W.M. was involved in methodology. Y.Z. was involved in experimental design and writing. H.G. contributed to experimental design and was responsible for supervision, writing and funding acquisition. All authors reviewed the final manuscript.

Corresponding authors

Correspondence to Yaoqi Zhou or Haipeng Gong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Methods (details of our in-house residue-residue contact refinement models), Supplementary Figs. S1–S22, Supplementary Tables S1–S11, and Supplementary references

About this article

Cite this article

Wang, T., Qiao, Y., Ding, W. et al. Improved fragment sampling for ab initio protein structure prediction using deep neural networks. Nat Mach Intell 1, 347–355 (2019). https://doi.org/10.1038/s42256-019-0075-7

Received: 08 March 2019
Accepted: 02 July 2019
Published: 09 August 2019
Issue Date: August 2019
DOI: https://doi.org/10.1038/s42256-019-0075-7