Template-based protein structure modeling using the RaptorX web server

Abstract

A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX 35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed 6,000 sequences submitted by 1,600 users from around the world.
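Because the server works on a submitted target sequence, it can be useful to sanity-check the input locally before submitting a job. The following is a minimal sketch only: the helper names (`parse_fasta`, `nonstandard_residues`) and the example record are hypothetical, and the rule of accepting only the 20 standard amino acid letters is an assumption, not RaptorX's documented input policy.

```python
# Hypothetical pre-submission check; not part of RaptorX itself.
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Drop internal whitespace and normalize to upper case.
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def nonstandard_residues(sequence):
    """Return the sorted characters that are not standard residue codes."""
    return sorted(set(sequence) - STANDARD_AA)


# Made-up example record for illustration only.
example = """>target_1 hypothetical example
MKTAYIAKQRQISFVKSHFS
RQLEERLGLX
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence), nonstandard_residues(sequence))
```

A sequence that reports any nonstandard characters (here the ambiguous code `X`) would need to be reviewed by hand before submission; how the server itself treats such characters is not stated in this abstract.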

Figure 1: Performance assessment of core prediction modules in the RaptorX server.
Figure 2: Workflow used by the RaptorX server.
Figure 3: Job-listing interface.
Figure 4: Secondary structure result interface.
Figure 5: Tertiary structure result interface.
Figure 6: Disorder prediction result display.
Figure 7: Custom alignment result interface.
Figure 8: Domain parsing result display.

Acknowledgements

This work is supported by US National Institutes of Health grant R01GM0897532, US National Science Foundation grant DBI-0960390, a Microsoft PhD Research Fellowship, an FMC Educational Fund Fellowship and the Toyota Technical Institute at Chicago summer intern program. We are grateful to the University of Chicago Beagle team, TeraGrid and Canada's Shared Hierarchical Academic Research Computing Network (SHARCNet) for providing computational resources.

Author information

Contributions

J.X. conceived and supervised the project. M.K. and H.W. designed and developed the web server. H.L. oversaw server development. J.P. developed the threading algorithm. S.W. designed the template database. Z.W. developed the protein secondary structure prediction algorithm. M.K. and J.X. wrote the paper.

Corresponding author

Correspondence to Jinbo Xu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Källberg, M., Wang, H., Wang, S. et al. Template-based protein structure modeling using the RaptorX web server. Nat Protoc 7, 1511–1522 (2012). https://doi.org/10.1038/nprot.2012.085

  • DOI: https://doi.org/10.1038/nprot.2012.085
