Main

Protein structure determination by NMR spectroscopy is typically a lengthy process. The slowest step is the collection and analysis of nuclear Overhauser enhancement (NOE) spectra, which provide the through-space correlation information used to generate a three-dimensional structure.

Many groups have thus been developing methods to minimize the time required for NMR-based structure determination and to simplify the process. The groups of Ad Bax of the US National Institute of Diabetes and Digestive and Kidney Diseases at the National Institutes of Health, David Baker of the University of Washington and their colleagues from the Northeast Structural Genomics Consortium (NESG) now present a method that uses chemical shift information alone to generate high-quality protein structures (Shen et al., 2008).

Each nucleus in a molecule experiences a unique chemical environment and thus has a distinct chemical shift in an NMR spectrum. Yang Shen in the Bax group previously developed a program called Sparta, which predicts chemical shifts for proteins of known structure, and used this program to generate a database of hypothetical chemical shifts for more than 5,000 proteins. For a protein of unknown structure, experimental chemical shifts are matched against the database to identify peptide fragments with similar hypothetical chemical shifts. Working with Baker's postdoc Oliver Lange, Shen then used these fragments as inputs for de novo protein structure generation with Baker's powerful modeling software, Rosetta, “basically out of the box with very little adaptation,” explains Bax. They call the method CS-Rosetta, for chemical shift-Rosetta. Michele Vendruscolo's group at the University of Cambridge also recently published a very similar approach called Cheshire (Cavalli et al., 2007).
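The database-matching step can be pictured with a small sketch. The Python below is a toy illustration only, not the CS-Rosetta or Sparta code: the fragment names, the shift values and the functions `shift_distance` and `rank_fragments` are all hypothetical, and a plain root-mean-square difference stands in for whatever scoring the real method uses.

```python
import math

# Hypothetical toy database: fragment id -> predicted C-alpha shifts (ppm),
# one value per residue. All numbers are invented for illustration.
FRAGMENT_DB = {
    "frag_helix": [58.1, 58.4, 57.9],
    "frag_strand": [54.2, 53.8, 54.5],
    "frag_coil": [56.0, 55.1, 56.7],
}


def shift_distance(observed, predicted):
    """Root-mean-square difference between two equal-length shift lists."""
    if len(observed) != len(predicted):
        raise ValueError("shift lists must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rank_fragments(observed, database, top_n=2):
    """Return the ids of the top_n fragments whose predicted shifts are
    closest to the observed shifts (assumed scoring, not the published one)."""
    ranked = sorted(
        database,
        key=lambda name: shift_distance(observed, database[name]),
    )
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up "experimental" shifts for a three-residue window.
    experimental = [58.0, 58.2, 58.1]
    print(rank_fragments(experimental, FRAGMENT_DB))
```

In this toy version, the best-matching fragments for each stretch of the sequence would then be handed to the structure-building stage as candidate local conformations.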

The researchers tested CS-Rosetta on 16 small proteins for which NMR or crystal structures were already available. For all of these test cases, the Rosetta models came within 0.7–1.8 Å of the experimentally derived structures. However, “when you're testing protein structure prediction programs, if you already know the answer you never quite know whether you can 100% trust it,” notes Bax. “You sort of start saying, 'well this is obvious, this is logical, and clearly I shouldn't take this answer because it's ridiculous'. And in the absence of a blind test, you can never be really sure.” Therefore they also tested CS-Rosetta on 9 novel protein targets of the NESG, which were in the process of being solved by traditional NOE-based NMR methods. Once the results were in (Fig. 1), “this finally made us believers that this was for real because initially I was worried the results were really too good to be true,” says Bax.

Figure 1: CS-Rosetta models obtained in a blinded fashion (red) compared to experimental NMR structures (blue).

Copyright 2008 National Academy of Sciences, USA

Bax anticipates that methods like CS-Rosetta and Cheshire will rapidly become accepted in structural genomics because they greatly simplify and shorten the process of structure determination. CS-Rosetta is currently limited, however, to small proteins of about 15 kDa or less. “It appears that the most severe limit is computational; the Rosetta approach really blows up exponentially, and it already takes a humongous amount of computational time,” explains Bax. He also notes that as proteins get larger, their folds become more complex. However, Rosetta allows input of any kind of data, so entering additional structural information such as disulfide bond links or just a few long-range NOEs could simplify analyses for larger proteins.

Bax also expects that future developments in NMR spectroscopy will further the application of this technology. He predicts that “over the next ten years, we could develop a much more quantitative relationship between chemical shift and structure, at which point one would be able to get atomic-resolution structures better than what we can get from current technology.”