The Phyre2 web portal for protein modeling, prediction and analysis

Journal name: Nature Protocols
Volume: 10
Pages: 845–858
DOI: 10.1038/nprot.2015.053

Abstract

Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server, for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. We also describe a range of additional tools for finding a protein structure in a genome, submitting large numbers of sequences at once and automatically running weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

Figures

  1. Normal mode Phyre2 pipeline showing algorithmic stages.

    Stage numbers are shown in circles, and elements within a stage are surrounded by a dashed box. Stage 1 (gathering homologous sequences): a query sequence is scanned against the specially curated nr20 (no sequences with >20% mutual sequence identity) protein sequence database with HHblits. The resulting multiple-sequence alignment is used to predict secondary structure with PSIPRED, and both the alignment and the secondary structure prediction are combined into a query hidden Markov model (HMM). Stage 2 (fold library scanning): the query HMM is scanned against a database of HMMs of proteins of known structure. The top-scoring alignments from this search are used to construct crude backbone-only models. Stage 3 (loop modeling): indels in these models are corrected by loop modeling. Stage 4 (side-chain placement): amino acid side chains are added to generate the final Phyre2 model.
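The ">20% mutual sequence identity" criterion in the Figure 1 caption can be illustrated with a small redundancy filter. This is only a sketch under stated assumptions: the sequences are taken to be pre-aligned to equal length, identity is counted over ungapped columns, and the greedy selection order is arbitrary. The function names `percent_identity` and `filter_redundant` are hypothetical, and this is not the procedure used to build the nr20 database.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def filter_redundant(aligned: dict[str, str], max_identity: float = 20.0) -> list[str]:
    """Greedily keep sequences whose identity to every kept sequence is <= max_identity."""
    kept: list[str] = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


# Toy, invented sequences (assumption: already aligned, no real biology implied).
toy = {
    "a": "ACDEFGHIKL",
    "b": "ACDEFGHIKV",  # differs from "a" at one position
    "c": "MNPQRSTVWY",  # shares no residues with "a"
}
print(filter_redundant(toy))
```

With these toy inputs, "b" is discarded because it is too similar to "a", whereas "c" is retained.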

  2. Intensive mode Phyre2 pipeline.

    Once a set of models has been generated, as shown in stages 1–3 of Figure 1, models are chosen by heuristics to maximize both confidence and coverage of the query sequence. Pairwise Cα-Cα distances are extracted from these models and treated as linear inelastic springs in Poing. Regions not covered by templates are handled by the ab initio components of the Poing algorithm: preferential bombardment of hydrophobic residues by notional solvent molecules to encourage burial, predicted secondary structure springs to maintain α-helix or β-strand conformations, and prevention of steric clashes. The new protein is 'synthesized' from a virtual ribosome in the context of these forces, and the final Cα structure is used to construct a full backbone using Pulchra, followed by side-chain addition using R3.
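The idea of turning template Cα-Cα distances into restraints, as described in the Figure 2 caption, can be sketched as follows. The harmonic (Hookean) penalty, the spring constant `k` and the `min_separation` cutoff are assumptions made for illustration; the function names are hypothetical, and the actual force law and parameters used by Poing are not specified here.

```python
import math

Coord = tuple[float, float, float]


def ca_distance_restraints(ca_coords: list[Coord], min_separation: int = 2) -> dict:
    """Record the distance between every pair of C-alpha atoms that are at
    least `min_separation` residues apart in sequence."""
    restraints = {}
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            restraints[(i, j)] = math.dist(ca_coords[i], ca_coords[j])
    return restraints


def spring_energy(ca_coords: list[Coord], restraints: dict, k: float = 1.0) -> float:
    """Sum of harmonic penalties 0.5 * k * (d - d0)^2 over all restraints."""
    energy = 0.0
    for (i, j), rest_length in restraints.items():
        d = math.dist(ca_coords[i], ca_coords[j])
        energy += 0.5 * k * (d - rest_length) ** 2
    return energy


# Invented template: four C-alpha atoms on a straight line, 3.8 angstroms apart.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
restraints = ca_distance_restraints(template)

# A trial structure with the last atom displaced is penalized by the springs.
trial = template[:3] + [(11.4, 1.0, 0.0)]
print(spring_energy(template, restraints), spring_energy(trial, restraints))
```

A structure identical to the template has zero restraint energy, and any deviation from the template distances raises it.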

  3. Phyre Investigator user interface.

    (a) Information box. (b) Structure and analyses view. (c) Sequence view. The structure and analyses view shows an interactive 3D JSmol viewer, buttons to toggle different analyses and two bar graphs (in this case for residue A34) showing the sequence profile preferences and predicted likelihood of a phenotypic effect from each of the 20 possible mutations at this position.

  4. Example Phyre2 summary results page.

    On the left is an image of a large all-β structure. Clicking on the image will download a PDB-formatted file containing this structure. On the right are various data regarding the model, including the following: PDB code of the template used, information about the protein template extracted from the PDB file, confidence in the model and coverage of the query sequence (100% and 28%, respectively). In this case, there is additional text informing the user that although only 28% of the query could be modeled by a single template, other high-confidence templates were also detected that could increase this coverage to 55% by using Phyre2's intensive mode. Finally, there is a link to launch the JSmol 3D viewer in the browser and a link to a FAQ describing popular external molecular viewing software.
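Coverage, as used in the Figure 4 caption, can be read as the percentage of query residues that fall within at least one template alignment. The sketch below makes that assumption explicit. The function name `coverage_percent`, the query length of 500 and the segment boundaries are invented so that the numbers match the 28% and 55% quoted in the caption; they are not taken from the example protein.

```python
def coverage_percent(query_length: int, aligned_segments: list[tuple[int, int]]) -> float:
    """Percentage of query residues covered by the union of aligned segments.

    Segments are 1-based, inclusive (start, end) positions on the query;
    overlapping segments are counted only once.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    covered = 0
    current_end = 0  # last query position already counted
    for start, end in sorted(aligned_segments):
        start = max(start, current_end + 1, 1)
        end = min(end, query_length)
        if end >= start:
            covered += end - start + 1
            current_end = end
    return 100.0 * covered / query_length


# Hypothetical 500-residue query: one template alone, then two templates combined.
single = coverage_percent(500, [(1, 140)])
combined = coverage_percent(500, [(1, 140), (100, 275)])
print(single, combined)
```

Counting overlapping segments once is what allows several partially overlapping templates to raise coverage without double counting residues.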

  5. Examples of the three main sections of a typical Phyre2 results page.

    (a) Example secondary structure and disorder prediction. The query sequence is colored as described in Step 17. Question marks indicate predicted disordered regions. Each type of prediction is associated with a rainbow color-coded confidence (red, highest confidence; blue, lowest confidence). (b) Example of the domain analysis results section described in Steps 20–22. The width of the box indicates the length of the query sequence. In this example, confident (red) matches have been found at the N terminus (rank 6) and the C terminus (ranks 1–5), but no confident matches have been found to the intervening segment. (c) Example of the detailed table of results described in Steps 23, 24 and 29–32. In this example, the rank 1 and rank 2 matches have confidence of 100% and sequence identities of 23% and 24%, respectively.

  6. Example alignment between user query sequence and known structure, as described in Steps 25–28.

    Sequence coloring is as described in Step 17. Identical residues between query and template have a gray background. Secondary structures (predicted and known) are displayed: in this case α-helices. Color-coded per-residue confidence in both the alignment (from HHsearch) and in secondary structure prediction is displayed. The level of residue conservation for both the query and template sequences is also shown, where thicker horizontal bars indicate greater degrees of conservation.

Author information

  1. Present address: University College London (UCL) Cancer Institute, London, UK.

    • Christopher M Yates
  2. Present address: Centre for Molecular Processing, School of Biosciences, University of Kent, Kent, UK.

    • Mark N Wass

Affiliations

  1. Structural Bioinformatics Group, Imperial College London, London, UK.

    • Lawrence A Kelley,
    • Stefans Mezulis,
    • Christopher M Yates,
    • Mark N Wass &
    • Michael J E Sternberg

Contributions

L.A.K. designed the Phyre2 system and wrote the paper; M.J.E.S. supervised the project; S.M. developed the multiple template modeling protocol; C.M.Y. developed the SuSPect method and M.N.W. developed the 3DLigandSite web resource.

Competing financial interests

M.J.E.S. is a director and shareholder in Equinox Pharma Ltd., which uses bioinformatics and chemoinformatics in drug discovery research and services.
