Understanding the physical properties that control protein crystallization by analysis of large-scale experimental data


Crystallization is the most serious bottleneck in high-throughput protein-structure determination by diffraction methods. We have used data mining of the large-scale experimental results of the Northeast Structural Genomics Consortium and experimental folding studies to characterize the biophysical properties that control protein crystallization. This analysis leads to the conclusion that crystallization propensity depends primarily on the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. We identify specific sequence features that correlate with crystallization propensity and that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the amino acid–sequence properties of human versus eubacterial proteins, which likely reflect differences in biophysical properties, including crystallization propensity. Our thermodynamic measurements do not generally support previous claims regarding correlations between sequence properties and protein stability.
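As a rough illustration of what a sequence feature of this kind looks like in practice, the sketch below computes two simple per-construct descriptors from an amino acid sequence. The choice of features (mean Kyte-Doolittle hydropathy and the fraction of Lys and Glu residues) is an assumption made for illustration only; it is not the set of predictors identified in the paper, and it is not the PXS metric. The function names and the example sequence are hypothetical.

```python
# Illustrative sequence descriptors; NOT the paper's predictors or the PXS metric.

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy over all residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        return sum(KD_HYDROPATHY[aa] for aa in seq) / len(seq)
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None


def residue_fraction(seq: str, residues: str = "KE") -> float:
    """Return the fraction of positions occupied by any of `residues`.

    The default "KE" (Lys, Glu) is an assumed example of residues with
    long, flexible side chains; change it to suit the feature of interest.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in residues for aa in seq) / len(seq)


if __name__ == "__main__":
    construct = "MKEELLKKAEELG"  # hypothetical sequence, for demonstration only
    print("mean hydropathy:", round(mean_hydropathy(construct), 3))
    print("Lys+Glu fraction:", round(residue_fraction(construct), 3))
```

Descriptors like these would only become a probability estimate after being fitted against experimental outcomes, which is the step the large-scale data set described above makes possible.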


Figure 1: Protein stability does not strongly influence success in crystal-structure solution.
Figure 2: Hydrodynamic properties strongly influence success in crystal-structure solution.
Figure 3: Correlations between sequence characteristics and success in crystal-structure solution.
Figure 4: Four major predictors of success in crystal-structure solution.
Figure 5: Performance of the PXS metric predicting probability of successful crystal-structure determination.




This work was supported by Protein Structure Initiative grants from the National Institutes of Health (NIH) to the Northeast Structural Genomics Consortium and the Center for High-Throughput Structural Biology. The full staffs of these consortia contributed to the experimental data analyzed in this paper. W.N.P. II was supported in part by an NIH training grant to the Department of Biological Sciences at Columbia, and S.K.H. was supported in part by a National Science Foundation grant to J.F.H. The authors thank Wayne Hendrickson and Liang Tong for support and advice and John Schwanoff and the New York Structural Biology Center for maintenance of the X4 beamlines at Brookhaven National Laboratory.

Author information



Corresponding author

Correspondence to John F Hunt.

Supplementary information

Supplementary Text and Figures

Figures 1–20, Tables 1–4, Methods, Notes (PDF 2442 kb)


About this article

Cite this article

Price II, W., Chen, Y., Handelman, S. et al. Understanding the physical properties that control protein crystallization by analysis of large-scale experimental data. Nat Biotechnol 27, 51–57 (2009). https://doi.org/10.1038/nbt.1514
