Introduction
Systems-wide analysis of any biological entity is hard. For proteins it is particularly daunting because proteomics lacks an equivalent to hybridization assays based on Watson-Crick base pairing or amplification reactions based on PCR. Fortunately, we do have the wonderful tool of mass spectrometry (MS) at our disposal. MS is one of the most versatile technologies used in biology, and continuing radical improvements in the technology amaze even the most seasoned observers. However, not all is well in the discipline of proteomics, and much fuzzy thinking and bad data have unfortunately found their way into the literature. Purely technological improvements in proteomics will go a long way to overcome these difficulties1, but they need to be accompanied by rigorous analysis. Three 'analysis' papers published in the last five years in Nature Methods have broken new ground in investigating crucial data quality issues in the proteomics field. One of them deals with protein identification, another with the enrichment of phosphorylated peptides, and the third with the evaluation of proteomics researchers themselves!
A main developmental direction of our discipline is pushing the identification of ever more proteins in complex proteomes, something to which MS is uniquely suited. However, many of the early landmark papers of the last 5–10 years that established the feasibility of large-scale protein identification relied on data acquired on low-resolution instruments and interpreted without proper statistical analysis. We now know that a large proportion of the identifications obtained from such projects were in fact false positives. For example, peptide lists contained a large proportion of nontryptic peptides, whereas it is now generally acknowledged that trypsin, at least in proteomics experiments, is extraordinarily sequence-specific2. The recognition of these data quality issues prompted a gradual, though still not complete, switch to high-resolution techniques. It also lent impetus to efforts to standardize the reporting of proteomics protocols and data, and to the development of bioinformatics tools to directly determine the false positive rate independently of the peptide database search score.
Aebersold and co-workers addressed this issue early on by developing an algorithm that decomposed score distributions into underlying false positive and true positive distributions and assigned likelihoods that peptides with a given database identification score were in fact correctly identified. This development, implemented in their PeptideProphet and ProteinProphet software3, 4, was an important step in bringing some rigor to the identification process in low-resolution data. Even simpler and more powerful, and applicable to high- as well as low-resolution data, was the concept of applying reverse sequence databases to determine false positive identification rates. This approach is very straightforward and only involves searching the data against the normal or 'forward' database and against the sequence-reversed database (also called a 'target-decoy' database, if it is concatenated with the forward database5). Hits are ranked by score, and once reverse hits amount to 1% of the accumulated identifications, the acceptable threshold score for 99% accuracy is reached, regardless of the statistical distribution of errors, the workflow and mass spectrometer used, the database search score and so on. Although mathematically more sophisticated approaches may in principle be more efficient6, the simplicity and robustness of reverse sequence searching make it easily accessible to members of all proteomics laboratories.
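The reverse-database thresholding described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any particular software package: the function name `decoy_threshold`, the `(score, is_decoy)` input format and the estimator (reverse hits divided by forward hits above the cutoff) are assumptions made for this example, and tied scores are not treated specially.

```python
from typing import List, Optional, Tuple


def decoy_threshold(hits: List[Tuple[float, bool]],
                    max_rate: float = 0.01) -> Optional[float]:
    """Return the lowest score cutoff whose estimated false positive rate
    does not exceed max_rate, or None if no cutoff qualifies.

    hits: (score, is_decoy) pairs from a search against the forward
    database concatenated with its sequence-reversed copy.
    Assumption: the rate is estimated as reverse hits / forward hits
    among all hits scoring at or above the cutoff.
    """
    # Walk down the list from the best-scoring hit to the worst.
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    forward = reverse = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            reverse += 1
        else:
            forward += 1
        # Remember the deepest position at which the estimate is still acceptable.
        if forward > 0 and reverse / forward <= max_rate:
            threshold = score
    return threshold


if __name__ == "__main__":
    # Hypothetical data: 200 forward hits and 3 reverse hits.
    demo = [(float(100 + i), False) for i in range(200)]
    demo += [(150.5, True), (120.5, True), (110.5, True)]
    print(decoy_threshold(demo))
```

Because the cutoff depends only on the ranked list of forward and reverse hits, the same function could in principle be applied to the output of any search engine or instrument, which is the property the text emphasizes.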
The Gygi group has played an important role in developing the reverse database approach5, and their analysis paper in Nature Methods in 2005 nicely illustrates the power of this approach for comparing different proteomics workflows7. In this work, Gygi and colleagues asked three different questions: is a linear ion trap or a quadrupole time of flight instrument better at identifying peptides? Is the Mascot8 or the Sequest9 database search algorithm superior? And finally, how useful is it to rerun samples to increase peptide identification? Such questions are notoriously difficult to answer in a general and defensible way because results depend on many parameters. However, using the reversed-sequence database strategy to keep the false positive rate fixed at 1%, Gygi and co-workers determined that the two instruments yielded data of comparable quality, but the linear ion trap was much faster. Mascot and Sequest were also roughly similar in performance, but Mascot was better-suited to higher-resolution data, and Sequest was better-suited to low-resolution and noisy data. Peptides identified in repeat measurements contained exceedingly few false positives; false positives were likely to be those peptides only identified in a few runs. This last finding means that identifications in large-scale projects with many repeats and redundancy cannot just be added together on a run-by-run basis; instead, all the data need to be analyzed together to guarantee a uniform false positive rate.
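A toy calculation may help show why lists filtered run by run cannot simply be added together. The sketch below assumes, purely for illustration, that every run re-identifies the same set of correct peptides while its false positives are random and never recur; the function name and the default counts (990 correct and 10 false identifications per run, that is, 1% per run) are hypothetical and not taken from the study.

```python
def union_false_fraction(n_runs: int,
                         true_per_run: int = 990,
                         false_per_run: int = 10) -> float:
    """Fraction of false entries in the merged, non-redundant peptide list.

    Assumptions (illustrative only): the correct peptides are identical
    in every run, so they are counted once in the merged list, whereas
    each run adds false_per_run new, distinct false positives.
    """
    false_total = n_runs * false_per_run
    return false_total / (true_per_run + false_total)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(union_false_fraction(n), 4))
```

Under these deliberately extreme assumptions a single run has 1% false entries, but the merged list from ten runs has 100 false entries among 1,090, roughly 9%. Real data will be less extreme, yet the direction of the effect is the reason a single threshold should be set on the pooled data.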
The publication of this paper coincided with the end of the reign of low-resolution ion traps and the quadrupole time of flight instruments in proteomics and just preceded the reign of the hybrid ion trap-Fourier transform instruments, which still continues today. These instruments add high resolution—indispensable for accurate quantification—and very high mass accuracy to the capabilities of the linear ion trap. With these powerful instruments, the majority of fragmented peptides can now be identified using workflows and software that make full use of the new instrumental capabilities10.
Another major application to which mass spectrometry is uniquely suited is the large-scale and accurate analysis of post-translational modifications (PTMs). Low site stoichiometry of PTMs and their dilution in the general pool of peptides necessitate specific enrichment of modified peptides. For phosphorylation, the most commonly studied modification, there are many different protocols for enrichment, all of which exploit the distinct nature of the phospho-group as a 'handle'. For reasons that are not entirely clear, these protocols achieve very different depths of coverage in different laboratories, and, as a result, each group tends to be most successful with its favorite protocol. In a 2007 paper in Nature Methods, Aebersold and co-workers systematically evaluated three of the most popular phosphoprotein enrichment protocols11. They subjected a cytosolic extract of a Drosophila melanogaster cell line to enrichment by phosphoramidate chemistry (PAC), immobilized metal affinity capture (IMAC) or titanium dioxide supplemented with 2,5-dihydroxybenzoic acid (DHB-TiO2) or phthalic acid, and measured each fraction six times with liquid chromatography–tandem MS (LC-MS/MS). The researchers identified fewer than 1,000 phosphorylation sites, a relatively small fraction of the phosphoproteome, and found that each method contributed distinct phosphopeptides to the total. Thus, although technical replicates were quite reproducible, the different enrichment protocols did not yield highly overlapping results. Each enrichment method appeared to isolate different populations of phosphopeptides, although a common theme for these differences was not apparent. Aebersold and colleagues thus concluded that no single phospho-enrichment strategy can cover the entire phosphoproteome.
Though this work is a good example of the type of analysis that needs to be performed to establish optimal workflows, the results illustrate some of the difficulties in comparing complex proteomics workflows. For example, the authors determined only 156 phosphorylation sites from six runs using the DHB-TiO2 method, whereas they found three times as many sites with the IMAC method. In contrast, other groups have found DHB-TiO2 to be superior to IMAC. Our group, for example, routinely uses DHB-TiO2 to identify thousands of phosphorylation sites12. In a subsequent publication, Aebersold and colleagues reported identification of tenfold more phosphopeptides from Drosophila using the same combination of the three enrichment protocols13. However, a study using the DHB-TiO2 method alone resulted in the same number of Drosophila phosphorylation sites14. So the jury is still out on whether separate isolation protocols are the strategy of the future or if deeper sampling with one optimized method will go just as far. It would be very interesting to repeat this study using quantitative proteomics approaches: one could use SILAC to label Drosophila cells15, enrich the phosphopeptides by the three different protocols and then directly judge the quantitative differences in phosphopeptide abundance. I would expect that the large majority of phosphopeptides would indeed be present in all three methods, just at different abundances.
The above two examples deal with challenging technical problems in proteomics—one of which is well on its way to being solved and one of which is still in flux. The 2009 analysis by Bergeron and colleagues in Nature Methods deals with a more sociological issue: ignoring for a moment the cutting-edge publications in high-impact journals, what is the actual state of the art among practicing proteomics laboratories? To answer this question they sent a test sample to 27 different groups and asked them to identify the constituent proteins16. Given that entire proteomes can now be comprehensively identified and quantified17, the protein test sample seemed to be absurdly simple: it contained only 20 proteins, all at the same molar concentration. However, this made the results all the more shocking: only members of a few of the laboratories identified all 20 proteins. Several groups missed a number of proteins or reported proteins that were not actually present in the test sample. Bergeron and co-workers charitably attributed these problems mainly to database issues. Indeed, with some 'hand-holding' members of the participating laboratories all identified the 20 proteins in a second round, and a centralized re-analysis of the data revealed that most of the groups had the data in hand but failed in their analysis of it. But this unexpected result shows just how large a gradient exists between the few expert laboratories that publish in high-impact journals and the many laboratories that have to make do with limited resources and expertise. One might expect these results to be caused by outdated equipment but this was not so: experienced researchers with older mass spectrometers performed better than less-experienced researchers with the latest equipment (Fig. 1). 
Neither are these results likely to be flukes, because a similar study by the Association of Biomolecular Resource Facilities (ABRF) came to the same general conclusions (ABRF standards group, http://www.abrf.org/ResearchGroups/ProteomicsStandardsResearchGroup/EPosters/sPRGStudy2006OralPresentation.pdf; 2006). Clearly, the community has its work cut out to bring the majority of proteomics laboratories to the same standard now achieved by the most advanced ones.
Thoughtful and well-performed analysis studies serve a very important function in a technology-driven discipline that is applied by a diverse set of researchers. Such studies are difficult to perform in a way that makes the findings sound and generally applicable, but they provide much-needed guidance in a fast-changing field. They synergize with technological advances and the development of community standards to raise quality standards in proteomics (Fig. 2).
In the future, several areas of proteomics apart from the ones discussed here would be well served by such studies. These include quantitative proteomics, for which there is a continuing need to benchmark different methods and workflows against each other, and targeted proteomics, for which there is a need to evaluate new techniques such as multiple reaction monitoring18, particularly their false positive rates. They also include the huge and increasingly important theme of PTM analysis, which encompasses relative quantification and the determination of the stoichiometry of modification. Furthermore, as proteomics becomes capable of analyzing in vivo samples, the issue of handling and analyzing very small tissue amounts will come to the fore. Last but not least, the ever more sophisticated and demanding downstream bioinformatic analysis of proteomic data will need quality standards19. All these developments will take place while the proteomic user base continues to grow exponentially and increasingly includes the commercial and medical communities in addition to the academic research community. If the history of the last five years is any guide, Nature Methods and other journals will have a crucial role during the next five years in helping to develop and safeguard quality standards for proteomics.

