Main

Most researchers in the mass spectrometry (MS)-based proteomics field take it for granted that, at some point, they will need to run a database search, matching the mass spectra of their peptides against those in a database to identify the peptide sequences and, by extension, their parent proteins. But this process becomes difficult and painfully slow for peptides containing multiple post-translational modifications (PTMs). And identifying proteins from organisms with unsequenced genomes, for which neither sequence nor spectral databases exist, is all but impossible.

So what is a curious researcher to do? If you are Pavel Pevzner, a computer scientist at the University of California, San Diego, you think of something a bit out of the ordinary. Pevzner and his graduate student Nuno Bandeira recently reported a strategy to perform database searching without ever comparing a spectrum to a database (Bandeira et al., 2007a).

Rather than search a database to interpret a peptide mass spectrum, Bandeira, Pevzner and their coworkers developed the concept of spectral networks, using spectral alignment to discover related spectra. For example, two versions of the same peptide, one carrying post-translational modifications and one without, will have related spectra, as will peptides derived from the same protein that have overlapping sequences. Pevzner explains the concept with an analogy: “Suppose you started from hundreds of spectra that are not related; they're kind of like cities. You are connecting them by roads. And all of a sudden, the spectra make sense because when they are connected by roads, you can use neighbors to interpret what is in every city.”
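To make the cities-and-roads picture concrete, here is a toy sketch in Python. It is not the spectral alignment algorithm of Bandeira and colleagues. It assumes each spectrum is just a parent mass plus a list of fragment masses, and it draws a road between two spectra when enough peaks match either directly or after shifting by the difference in parent mass. The function names, the tolerance, the threshold and all mass values are invented for illustration.

```python
from itertools import combinations


def shared_peaks(peaks_a, peaks_b, shift=0.0, tol=0.5):
    """Count peaks of A that match a peak of B directly or after adding `shift`."""
    count = 0
    for mass in peaks_a:
        if any(abs(mass - other) <= tol or abs(mass + shift - other) <= tol
               for other in peaks_b):
            count += 1
    return count


def build_network(spectra, min_shared=3, tol=0.5):
    """Return edges (name_a, name_b, parent_mass_difference) between related spectra.

    `spectra` maps a name to (parent_mass, [fragment masses]).
    """
    edges = []
    for (name_a, (pm_a, peaks_a)), (name_b, (pm_b, peaks_b)) in combinations(
            spectra.items(), 2):
        shift = pm_b - pm_a  # e.g. the mass of a modification
        if shared_peaks(peaks_a, peaks_b, shift, tol) >= min_shared:
            edges.append((name_a, name_b, round(shift, 2)))
    return edges


# Hypothetical spectra: all masses below are made up for this example.
spectra = {
    "unmodified": (500.0, [100.0, 200.0, 300.0, 400.0]),
    "modified": (580.0, [100.0, 200.0, 380.0, 480.0]),   # later peaks shifted by +80
    "unrelated": (520.0, [150.0, 275.0, 333.0, 410.0]),
}

edges = build_network(spectra)
for edge in edges:
    print(edge)
```

In this toy data set, only the unmodified and modified versions of the peptide end up connected, and the edge records the mass difference between them. Real spectra are far noisier, so a practical method needs proper alignment and scoring rather than a simple peak count.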

To illustrate just how powerful the concept is, Bandeira and Pevzner combined efforts with Karl Clauser of the Broad Institute to demonstrate how spectral networks can be used to reconstruct protein sequences from unpurified mixtures of unknown proteins (Bandeira et al., 2007b). They apply a variety of proteases with different specificities to generate peptides with overlapping sequences. They use the spectral network of the overlapping peptide fragments to construct a 'virtual' MS/MS spectrum of very high quality, which can then be used to determine the sequence of the whole protein.
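The value of overlapping peptides can be illustrated with a deliberately simplified sketch. The approach described above works on spectra, not on known sequences, so the following Python example only shows the underlying idea: fragments that overlap can be stitched into one longer sequence. The greedy merging strategy, the function names, the minimum overlap and the peptide strings are all assumptions made for illustration, not the published method.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(peptides, min_len=3):
    """Greedily merge the pair with the largest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(peptides))  # drop duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]


# Hypothetical peptides, as if produced by proteases with different specificities.
peptides = ["MKTAYIAK", "AYIAKQRQ", "KQRQISFVK"]
contigs = assemble(peptides)
print(contigs)
```

With these invented fragments the three peptides merge into a single sequence. Peptides from a single protease would not overlap in this way, which is why digestion with several proteases of different specificities matters.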

Bandeira and Pevzner investigated the venom of the western diamondback rattlesnake as an example of a potentially medically important proteome from an organism whose genome sequence has yet to be determined. They demonstrated for the first time that de novo protein sequencing from a crude biological mixture is possible. Importantly, they also detected sequence variants. “Because venom changes depending on the season of the year that it's collected, and geographical reasons [and so forth], we found single nucleotide polymorphism variants in the sample as well,” says Bandeira.

Edman degradation, though slow and laborious, is the present gold standard for protein sequencing. “Implicitly, we have nothing against Edman degradation, but we feel that with this technique, Edman degradation becomes unnecessary,” says Pevzner. “The number of amino acids we find in a single experiment is in the thousands;...with Edman degradation no one is able to reach anything close.”

Bandeira and Pevzner are confident that their concept of spectral networks will become an important new paradigm in MS-based proteomics, and they have already attracted considerable interest from new collaborators. “While we have demonstrated these methods for mixtures of proteins, these are still somewhat small mixtures of proteins,” says Bandeira. “It will be exciting to see how these tools scale to whole proteomes.”