Nature Methods
2, 667 - 675 (2005)
Published online: 23 August 2005; | doi:10.1038/nmeth785
Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations
Joshua E Elias1, Wilhelm Haas1, Brendan K Faherty2 & Steven P Gygi1,2
1 Department of Cell Biology, 240 Longwood Ave., Harvard Medical School, Boston, Massachusetts 02115, USA.
2 Taplin Biological Mass Spectrometry Facility, 240 Longwood Ave., Harvard Medical School, Boston, Massachusetts 02115, USA.
Correspondence should be addressed to Steven P Gygi steven_gygi@hms.harvard.edu
Researchers have several options when designing proteomics experiments. Primary among these are choices of experimental method, instrumentation and spectral interpretation software. To evaluate these choices on a proteome scale, we compared triplicate measurements of the yeast proteome by liquid chromatography tandem mass spectrometry (LC-MS/MS) using linear ion trap (LTQ) and hybrid quadrupole time-of-flight (QqTOF; QSTAR) mass spectrometers. Acquired MS/MS spectra were interpreted with Mascot and SEQUEST algorithms with and without the requirement that all returned peptides be tryptic. Using a composite target decoy database strategy, we selected scoring criteria yielding 1% estimated false positive identifications at maximum sensitivity for all data sets, allowing reasonable comparisons between them. These comparisons indicate that Mascot and SEQUEST yield similar results for LTQ-acquired spectra but less so for QSTAR spectra. Furthermore, low reproducibility between replicate data acquisitions made on one or both instrument platforms can be exploited to increase sensitivity and confidence in large-scale protein identifications.
Increasingly, proteome-scale experiments using mass spectrometry are being used as biological assays1,2, with many studies identifying proteins numbering in the thousands3,4,5. More than generating simple catalogs of cellular components, such exploratory surveys establish the foundations for constructing protein interaction networks, and indicate signaling pathways involved in pathological and developmental processes. Experiments that maximize confident protein identifications using available instrumentation and computation resources are therefore desirable. We present data here that will help guide researchers' choice of experiment design with regard to two widely used mass spectrometers (LTQ and QSTAR) and MS/MS spectra interpretation software (Mascot and SEQUEST).
Two commonly used types of tandem mass spectrometers are those with QqTOF configurations such as the QSTAR, and quadrupole ion trap (QIT) arrangements like the LTQ. Fundamental differences between these instruments affect the quality of mass measurements, including accuracy, resolution and dynamic range6,7,8. The LTQ, a recently commercialized linear (two-dimensional) QIT mass spectrometer (2D-QIT), has higher ion capacity and scan rates than traditional three-dimensional ion traps9,10, properties that may increase sensitivity relative to the QSTAR despite higher mass accuracy and resolution provided by the latter mass spectrometer. Although both instruments are widely used by the mass spectrometry community, a thorough and detailed comparison of their performances on complex peptide mixtures has not been done.
The most widely used search algorithms for interpreting tandem mass spectra are Mascot and SEQUEST11,12,13. Traditionally, ion trap−acquired MS/MS spectra are interpreted with SEQUEST, whereas Mascot is used to sequence TOF spectra (for example, see refs. 3,14). Based on at least one study15, it would appear that the two algorithms are fairly comparable, at least for ion trap−acquired MS/MS spectra. It has remained unclear whether this holds true for TOF-acquired spectra as well.
These two algorithms apply similar general approaches to assign peptides in a sequence database to measured MS/MS spectra13. But Mascot and SEQUEST use fundamentally different principles in their mathematical operations. Though not explicitly described, Mascot uses a probabilistic metric to assess the likelihood of a fragmented peptide to have given rise to an observed spectrum, whereas SEQUEST uses empirical and correlation measurements to score the alignment between observed and predicted spectra12. Furthermore, Mascot suggests a 'homology' scoring threshold that is similar to a described measurement that uses the distribution of scores returned for peptides matching the observed precursor mass16 (J. Cottrell, personal communication). Accordingly, SEQUEST reports the normalized difference between the score of the top-ranked peptide and the scores of the remaining peptide hits. The magnitude of this difference correlates with high-confidence peptide matches12,17,18. These dissimilarities make it seem unlikely that these two spectrum interpretation tools should be equivalent.
Researchers might consider using complementary analytical platforms to increase peptide and protein identification. This, however, may be insufficient if platform reproducibility is low. As shotgun sequencing approaches usually sample only a fraction of a complex peptide mixture, one might expect to identify different peptide subsets across replicate analyses. We must therefore ask, does one gain more information by using alternative analytical platforms or simply by analyzing a sample multiple times?
To address issues regarding choices of instrument, algorithm and experiment design, we performed a large-scale analysis of yeast whole-cell lysate with LTQ and QSTAR mass spectrometers. The resulting tandem mass spectra were interpreted with Mascot and SEQUEST algorithms with ('tryptic search') and without ('nonspecific search') the requirement that all matching peptides have two tryptic termini ('tryptic'), an indication of specific digestion by the protease trypsin. Using a composite 'target-decoy' database search strategy18,19,20, we effectively estimated the error rates of applied score filter criteria, allowing comparisons of multiple data sets with similar error rates. Based on these comparisons, we reached three primary conclusions: (i) Mascot and SEQUEST results greatly overlapped (>85%) for LTQ-acquired spectra that satisfy filter criteria, but Mascot appeared better-suited to interpreting QSTAR MS/MS spectra; (ii) results from replicate analyses overlapped less (~70%), suggesting more peptide and protein identifications can be confidently made by analyzing samples multiple times; and (iii) peptides identified by LTQ or QSTAR mass spectrometers overlapped even less (50−60%), indicating complementarity between the two systems.
Results
Dataset standardization
We analyzed five trypsin-digested gel regions representative of the yeast proteome in triplicate by nanoscale microcapillary LC-MS/MS using LTQ (2D-QIT) and QSTAR (QqTOF) mass spectrometers (Fig. 1a). These instruments were optimized such that each would have similar numbers of opportunities to sequence eluting peptides (Fig. 1a and Supplementary Methods online) despite their different acquisition rates (Table 1 and Fig. 1b). Resulting MS/MS spectra were searched with Mascot and SEQUEST algorithms using commonly used parameters (Table 2), including both tryptic and nonspecific search modes (Fig. 1a). After nonspecific searches, we discarded all nontryptic peptide matches, as the vast majority of these are incorrect17,20.
Searches against a composite target-decoy database containing all yeast protein sequences in both forward and reverse orientations18,20 provided a simple and effective way to estimate the false positive rate of peptide-spectral matches (PSMs; Supplementary Discussion online). Estimated algorithm false positive rates were calculated by doubling the number of decoy hits and dividing this by the total number of hits. This composite database strategy has two primary advantages over other proposed error estimation methods17,19. First, a correct top-ranked peptide can be confidently selected even in the presence of a decoy hit with a slightly lower, yet still large, score. Second, this method removes the requirement for exact a priori knowledge of mixture composition. Most importantly, this estimation method is instrument- and search algorithm−independent, a prerequisite for meaningful comparisons.
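The estimate described above can be written as a short calculation. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes that the total number of hits counts every accepted match, target and decoy alike.

```python
def estimated_false_positive_rate(decoy_hits: int, total_hits: int) -> float:
    """Estimated FPR = 2 * decoy hits / total hits, as described in the text.

    Assumption: total_hits counts every accepted match (target and decoy).
    """
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    if not 0 <= decoy_hits <= total_hits:
        raise ValueError("decoy_hits must lie between 0 and total_hits")
    return 2 * decoy_hits / total_hits


def estimated_precision(decoy_hits: int, total_hits: int) -> float:
    """Precision is taken here as 1 minus the estimated false positive rate."""
    return 1.0 - estimated_false_positive_rate(decoy_hits, total_hits)


# Illustrative numbers only (not from the paper):
# 5 decoy hits among 1,000 accepted PSMs.
print(estimated_false_positive_rate(5, 1000))
print(estimated_precision(5, 1000))
```

The doubling presumably reflects the expectation that an incorrect match is about as likely to fall in the forward half of the composite database as in the reversed half, so each observed decoy hit stands for one unobserved incorrect target hit.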
This searching strategy was applied to each instrument platform and each search algorithm (Fig. 1). We then derived and applied selection criteria insensitive to dataset redundancies to generate lists of unique charged peptides with 1% false positive identifications (99% precision) while maximizing the number of selected PSMs (Supplementary Table 1 online). Application of both a primary score (Mascot ion score and SEQUEST XCorr) and a relative score (the Mascot homology factor and SEQUEST ΔCn) was necessary to achieve an acceptable error rate at near-maximum sensitivity for both Mascot and SEQUEST (Supplementary Fig. 1 online). The 1% false positive threshold represented a suitable balance between sensitivity and precision, as we have shown previously18. Universal application of this single criterion permitted rigorous comparisons between search algorithms, replicate analyses and analyses on multiple instruments.
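One way to picture the selection of a primary and a relative score cutoff is as a small grid search that keeps the pair of thresholds accepting the most PSMs while the estimated false positive rate stays at or below 1%. The sketch below is an illustration under stated assumptions, not the procedure actually used: the `PSM` record, the grids and the tie-breaking rule are hypothetical, and it ignores the redundancy handling mentioned in the text.

```python
from itertools import product
from typing import Iterable, List, NamedTuple, Optional, Tuple


class PSM(NamedTuple):
    primary: float   # e.g. a Mascot ion score or a SEQUEST XCorr
    relative: float  # e.g. a homology-style factor or a delta-Cn value
    is_decoy: bool   # True if the match came from the reversed sequences


def select_cutoffs(
    psms: List[PSM],
    primary_grid: Iterable[float],
    relative_grid: Iterable[float],
    max_fpr: float = 0.01,
) -> Optional[Tuple[int, float, float]]:
    """Return (accepted count, primary cutoff, relative cutoff), or None.

    The first threshold pair reaching the highest accepted count wins.
    """
    best = None
    for p_cut, r_cut in product(list(primary_grid), list(relative_grid)):
        passing = [m for m in psms if m.primary >= p_cut and m.relative >= r_cut]
        if not passing:
            continue
        decoys = sum(1 for m in passing if m.is_decoy)
        fpr = 2 * decoys / len(passing)  # doubling rule from the text
        if fpr <= max_fpr and (best is None or len(passing) > best[0]):
            best = (len(passing), p_cut, r_cut)
    return best


# Synthetic data for illustration only.
example = (
    [PSM(3.0, 0.2, False)] * 300
    + [PSM(1.5, 0.05, False)] * 20
    + [PSM(1.5, 0.05, True)] * 10
)
print(select_cutoffs(example, [1.0, 2.0], [0.0, 0.1]))
```

A single-score filter is a special case of this search in which one grid has one permissive value; the two-dimensional form mirrors the statement that both score types were needed.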
Do Mascot and SEQUEST identify the same peptides?
Search algorithm differences and the disparate appearance of MS/MS spectra collected on the LTQ versus the QSTAR (Supplementary Fig. 2 online) suggest Mascot and SEQUEST might identify different subsets of peptides within our 1% error tolerance. We determined that the algorithms similarly interpreted LTQ-acquired MS/MS spectra, but the scoring function of Mascot was better-suited than SEQUEST for discriminating between correctly and incorrectly interpreted MS/MS spectra collected on the QSTAR (Table 3 and Fig. 2).
Figure 2. Mascot and SEQUEST are complementary analysis tools for interpreting LTQ and QSTAR MS/MS. (a) Comparison of the average number of confidently assigned spectra for each instrument, algorithm and search type combination. Error bars represent the maximum and minimum values for three replicate analyses. (b) Venn diagrams showing the degree of overlap between Mascot and SEQUEST searches. Numbers in parentheses indicate the precision represented by a particular region. Numbers in square brackets indicate the percentage of either search result that lies outside the overlap region. Nonspecific SEQUEST searches and tryptic Mascot searches were combined for further comparisons: for LTQ searches, 666 exclusive Mascot identifications, 644 exclusive SEQUEST identifications and 4,056 joint identifications were combined yielding an average of 5,386 MS/MS spectra. For QSTAR searches, 1,012 exclusive Mascot identifications, 510 exclusive SEQUEST identifications and 1,955 joint identifications were combined, yielding an average of 3,477 MS/MS spectra. Mascot and SEQUEST results are depicted in red and blue, respectively.
Table 3. Average numbers of interpreted MS/MS spectra identified by Mascot and SEQUEST from triplicate analyses by LTQ and QSTAR
When nonspecific SEQUEST searches were compared with tryptic Mascot searches, similar numbers of LTQ MS/MS spectra were confidently identified, differing by just 0.5% (Table 3 and Fig. 2b). The overlap between these identifications was substantial and noticeably greater than the proportion of selected PSMs in common between Mascot and SEQUEST results when just one search method (tryptic or nonspecific) was used by both algorithms (Fig. 2b). As previously observed15, PSMs confidently identified by both algorithms represented a subpopulation estimated to be almost entirely devoid of false matches. Furthermore, we rarely (0.16%) observed spectra for which both algorithms scored PSMs at or above our selection criteria, yet did not agree on the peptide sequence that gave rise to the MS/MS spectrum (excluding simple isobaric residue substitutions).
An average of 5,386 MS/MS spectra were confidently assigned by either algorithm for the triplicate analyses of five gel regions, a 14% improvement over the 4,700 identified by a traditional SEQUEST search of LTQ MS/MS spectra. The majority of all collected spectra, however, were not correctly identified. Of an average of 15,992 spectra acquired per set of five gel regions, only 34% yielded high-confidence PSMs.
While Mascot and SEQUEST algorithms confidently interpreted similar numbers of LTQ-acquired MS/MS spectra, Mascot scoring was better-suited to identifying correctly interpreted QSTAR spectra. Of the average 3,477 confidently assigned QSTAR MS/MS spectra, Mascot exclusively selected nearly one-third, in comparison to the 15% selected just by SEQUEST (Fig. 2b). This is not to say SEQUEST assignments that did not meet our criteria were incorrect: 91% of the PSMs that passed Mascot criteria were also identified by SEQUEST within the top ten matches, but only 72% received XCorr and ΔCn scores sufficient to distinguish them from incorrect matches. In comparison, 87% of the LTQ-acquired spectra identically interpreted by both Mascot and SEQUEST were given sufficient scores to pass the selection criteria for both algorithms (Table 3). As with the LTQ-acquired spectra, PSMs surpassing both Mascot and SEQUEST criteria were highly enriched for correct matches, with only 0.17% sequence disagreement for spectra that pass selection criteria for both algorithms.
SEQUEST increased the total number of selected PSMs by an average of 17% when performed in addition to the traditional Mascot search of QSTAR-acquired MS/MS spectra. As with LTQ searches, most MS/MS spectra were not selected as correct PSMs. Of an average of 15,309 spectra acquired in three replicate runs, only 23% yielded high-confidence PSMs.
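The QSTAR percentages quoted above follow from the three Venn regions given in the Figure 2 legend (1,012 Mascot-only, 510 SEQUEST-only and 1,955 joint identifications). The sketch below simply reproduces that arithmetic; the function and key names are hypothetical.

```python
def venn_summary(only_a: int, only_b: int, both: int) -> dict:
    """Summarize a two-set Venn diagram of identifications."""
    total = only_a + only_b + both
    a_total = only_a + both
    return {
        "total": total,                         # identified by either algorithm
        "a_total": a_total,                     # everything algorithm A selected
        "b_total": only_b + both,               # everything algorithm B selected
        "only_a_fraction": only_a / total,      # share exclusive to A
        "only_b_fraction": only_b / total,      # share exclusive to B
        "gain_from_adding_b": only_b / a_total, # relative gain over A alone
    }


# QSTAR regions from the Figure 2 legend: A = Mascot, B = SEQUEST.
qstar = venn_summary(only_a=1012, only_b=510, both=1955)
for key, value in qstar.items():
    print(key, round(value, 3) if isinstance(value, float) else value)
```

With these inputs the union is 3,477 spectra, Mascot alone accounts for 2,967 and SEQUEST alone for 2,465; the Mascot-exclusive share is about 29%, the SEQUEST-exclusive share about 15%, and adding SEQUEST to a Mascot-only analysis gains about 17%.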
Differences between Mascot and SEQUEST
For both LTQ and QSTAR runs, the populations of peptides confidently selected by just one algorithm appeared to be indistinguishable from the peptides selected by the other, based on several measurements including charge state, residue composition and peptide length (data not shown). MS/MS spectrum quality presents a possible explanation for Mascot and SEQUEST performance differences. We observed that the apparent complexity and signal-to-noise ratios of MS/MS spectra have profound and distinct effects on the magnitude of scores assigned by Mascot and SEQUEST (Supplementary Data online and Supplementary Fig. 2).
Trypsin
We occasionally identify peptides derived from nontryptic pathways during single-protein analyses in which we can detect these lower kinetic events (data not shown). When complex mixtures are examined, however, the overwhelming majority of confidently identified peptides have tryptic ends. One might expect that by requiring a priori for peptide hits to have expected features of correct matches, incorrect matches would necessarily be removed. Whereas this certainly holds true in many cases, it cannot be considered a rule. Programs like SEQUEST and Mascot find peptide matches to MS/MS spectra even when appropriate matches are absent from the sequence database. As nontryptic sequences greatly outnumber tryptic peptides considered by search algorithms under nonspecific search conditions, it is far more likely that nontryptic sequences will be incorrectly assigned to MS/MS spectra when correct sequences are not available for consideration, or when spectrum quality is insufficient to generate confident matches. Consequently, the overwhelming majority of non−fully tryptic PSMs are incorrect.
Regardless of instrument or search algorithm, high-confidence PSMs were enriched in a score-independent fashion by searching nonspecifically and then restricting passing PSMs to be tryptic. For SEQUEST searches, more high-confidence PSMs were selected in this way than by initially requiring all considered peptides to be tryptic. With tryptic searches, however, one must primarily rely on score filter criteria to distinguish correct from incorrect matches. As the score distribution of this larger incorrect PSM population overlaps more extensively with the high-confidence PSM population (Supplementary Fig. 3 online), one must apply elevated filter criteria to maintain acceptable error rates (Supplementary Table 1). Higher criteria necessarily exclude many 'correct' PSMs, thereby lowering sensitivity. We observed 14−16% fewer confidently assigned MS/MS by tryptic than nonspecific SEQUEST searches. This represents a much larger decrease in sensitivity than any observed as a result of incorrect, nontryptic peptides receiving higher scores than correct peptides (1−3% of all passing tryptic PSMs, except 7% of PSMs from SEQUEST searches of QSTAR MS/MS spectra).
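The post-search restriction to tryptic peptides can be sketched as a check on both peptide termini. This is a simplified illustration, not the filter actually applied: it assumes the flanking residues of each match are known, treats cleavage after K or R as tryptic without any proline exception, accepts protein termini (written here as '-') as valid ends, and uses hypothetical function names.

```python
TRYPTIC_RESIDUES = {"K", "R"}


def is_fully_tryptic(prev_residue: str, peptide: str, next_residue: str) -> bool:
    """True if both termini are consistent with trypsin cleavage.

    prev_residue / next_residue are the residues flanking the peptide in the
    protein; '-' marks a protein terminus (an assumption of this sketch).
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    n_term_ok = prev_residue == "-" or prev_residue in TRYPTIC_RESIDUES
    c_term_ok = next_residue == "-" or peptide[-1] in TRYPTIC_RESIDUES
    return n_term_ok and c_term_ok


def count_internal_cleavage_sites(peptide: str) -> int:
    """Number of K/R residues before the C-terminal residue (missed cleavages)."""
    return sum(1 for residue in peptide[:-1] if residue in TRYPTIC_RESIDUES)


def keep_tryptic(matches):
    """Filter (prev_residue, peptide, next_residue) tuples to fully tryptic ones."""
    return [m for m in matches if is_fully_tryptic(*m)]


# Hypothetical matches for illustration.
matches = [("K", "LVNELTEFAK", "T"), ("A", "LVNELTEFAK", "T"), ("-", "MSEQK", "L")]
print(keep_tryptic(matches))
```

The internal-site counter is included because the searches described in Methods allow up to two internal cleavage sites, and such 'missed cleavage' peptides are singled out later when the Mascot and SEQUEST results are combined.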
The benefit of nonspecific searching is less pronounced for Mascot searches. Unlike SEQUEST, Mascot supplies a probability-based threshold score to determine if a PSM can be confidently selected given the score distribution of peptides that could conceivably have given rise to an observed spectrum16. We found this scoring feature to be sufficient for confident PSM selection without further enhancement by nonspecific searching. We acknowledge that probabilistic measurements and evaluations have also been described for SEQUEST results17,21, and that these can add greater confidence to matches. Comparative evaluations of their efficacies, however, are beyond the scope of this work. Probabilistic modeling and spectrum quality alone might not account for the larger disparity between Mascot and SEQUEST search results of QSTAR-acquired spectra, however. Unlike the SEQUEST revision used here, Mascot accounts for the higher mass accuracy (<50 p.p.m.) achievable on the QqTOF, potentially further contributing to this discrepancy.
The greatest average numbers of spectra identified by Mascot were selected from tryptic searches (4,722, LTQ; 2,967, QSTAR). The greatest average numbers of spectra identified by SEQUEST were selected from nonspecific searches (4,700, LTQ; 2,465, QSTAR). By combining results from the two searches, more PSMs could be selected, although we observed a disproportionate increase in estimated false positive identifications. Given that the majority of decoy database hits were selected from just one algorithm's results (Fig. 2) and contained internal tryptic cleavage sites (data not shown), we removed these low confidence 'missed cleavage' peptides from the sets of nonoverlapping PSMs. Doing so restored the estimated precision rate of unique identifications to 99%. Search results combined in this way were used as the basis for further comparisons.
Does it matter if a sample is analyzed more than once?
It has been previously noted that subsequent identification by repeated analyses of a sample confers high confidence in peptide and protein identifications15 and may provide clues to protein abundance22. Although it is clear that they can increase the number of protein identifications22, it has remained unclear if nonoverlap identifications are primarily incorrect, or if they enhance protein coverage.
Of the 5,284 (LTQ) or 4,357 (QSTAR) nonredundant PSMs confidently identified in the sum of three replicate analyses, only 48% and 35% were selected from all three replicate runs, with any two analyses overlapping by averages of 76% and 67%, respectively (Fig. 3a). Clearly, repeated analyses of a single sample greatly enhanced the number of peptides identified from that sample (Fig. 3b). The 23%−38% increase in confidently identified peptides achieved by analyzing samples twice versus once, and the 37%−60% increase from three replicate analyses versus one, considerably eclipse the gains achieved through analysis by both Mascot and SEQUEST.
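The two kinds of figures quoted here (the share of nonredundant identifications seen in every replicate, and the percent gain from adding replicates) can be computed from sets of identified peptides. The sketch below is illustrative only: the function names are hypothetical, the gain is computed for one fixed ordering of replicates rather than averaged over orderings, and the pairwise overlap figures are not reproduced because their exact definition is not given in this section.

```python
from typing import Iterable, List, Set


def fraction_in_all(replicates: Iterable[Iterable[str]]) -> float:
    """Share of all nonredundant identifications that appear in every replicate."""
    sets: List[Set[str]] = [set(r) for r in replicates]
    if not sets:
        raise ValueError("at least one replicate is required")
    union = set().union(*sets)
    if not union:
        raise ValueError("replicates contain no identifications")
    common = set.intersection(*sets)
    return len(common) / len(union)


def cumulative_gain(replicates: Iterable[Iterable[str]]) -> List[float]:
    """Percent increase in unique identifications after each added replicate,
    relative to the first replicate alone."""
    seen: Set[str] = set()
    counts: List[int] = []
    for rep in replicates:
        seen |= set(rep)
        counts.append(len(seen))
    if not counts or counts[0] == 0:
        raise ValueError("the first replicate must contain identifications")
    base = counts[0]
    return [100.0 * (c - base) / base for c in counts]


# Toy peptide labels for illustration only.
reps = [{"A", "B", "C", "D"}, {"A", "B", "C", "E"}, {"A", "B", "F", "G"}]
print(fraction_in_all(reps))
print(cumulative_gain(reps))
```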
As expected, more identified peptides translated to more identified proteins. At the protein level, however, the percent increase from repeated analyses did not parallel the gains observed at the peptide level. For example, we confidently identified an average of 33% and 52% more peptides by reanalyzing a sample once or twice on the QSTAR, but only recorded 19% and 30% more proteins. This reflects the observations that neither the number nor the identities of peptides that identify a given protein are necessarily consistent from run to run. The reduction in the number of proteins identified relative to the number of peptides appears consistent for both instruments, suggesting that under the conditions used, the two shotgun approaches yield similar surveys of complex protein mixtures. We note that an average of 24% of protein identifications not validated by any other replicate runs were estimated to be incorrect, in comparison to the <2% estimated to be incorrect when proteins were found in multiple replicate analyses.
Do the LTQ and QSTAR yield the same identifications?
Despite the fact that the LTQ collected just 4% more MS/MS spectra than the QSTAR (in one-third the acquisition time), 21% more unique PSMs were confidently identified with the LTQ, and the numbers of proteins identified from each instrument were essentially the same (Fig. 3). It seems reasonable to suspect strong agreement on these identifications across platforms as well. We found that this was not the case: less than half of all unique peptide sequences identified in any replicate by either instrument that passed our selection criteria were identified by both instruments (Fig. 4a), regardless of charge. This disparity was diminished somewhat at the protein level, with approximately 60% of the proteins identified by both LTQ and QSTAR mass spectrometers. We attribute this increase in protein overlap to the observation that several peptides identified on just one instrument are derived from proteins identified on both.
We observed roughly similar numbers of acquired MS/MS spectra and confidently identified peptides and proteins for the LTQ and QSTAR. But if the gradient used on the LTQ was extended to match that used on the QSTAR, the number of peptide and protein identifications would increase proportionally (Fig. 5a). This is the effect we observed when a portion of our sample was reacquired in triplicate with a 90-minute gradient on the LTQ (Supplementary Fig. 4 online).
For both peptides and proteins, identification by both LTQ and QSTAR mass spectrometers gave near complete assurance of correct identifications, given peptides that passed the stated selection criteria. Conversely, false positive identifications were concentrated in the nonoverlap regions, particularly when considering proteins. Examination of these nonoverlap proteins revealed that many were identified by just one peptide, a symptom of both low abundance as well as incorrect peptide identification (Supplementary Fig. 5 online).
Comparing the set of proteins identified in any replicate by either instrument to a single analysis on one instrument, we observed that the total number of protein identifications increased by 60%. The number of false positive protein identifications, however, increased nearly fivefold. Because these false identifications were overwhelmingly restricted to the nonoverlapping regions of the Venn diagram, adding the further constraint of being identified in multiple analyses is an effective strategy to enrich for correctly identified proteins. When this strategy was applied to our dataset, 826 proteins were selected as being correctly identified with an estimated false positive rate near 0.0%.
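The constraint of being identified in multiple analyses amounts to counting, for each protein, the number of analyses in which it was found and keeping those seen at least twice. A minimal sketch follows; the function name, the default threshold of two and the protein labels are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Set


def identified_in_multiple(
    analyses: Iterable[Iterable[str]], min_analyses: int = 2
) -> Set[str]:
    """Return proteins found in at least min_analyses separate analyses.

    Each analysis is one replicate run on one instrument; duplicates within
    an analysis are counted once.
    """
    counts: Counter = Counter()
    for proteins in analyses:
        counts.update(set(proteins))
    return {protein for protein, n in counts.items() if n >= min_analyses}


# Hypothetical protein labels, for illustration only.
runs = [
    ["protA", "protB"],            # e.g. LTQ replicate 1
    ["protA"],                     # e.g. LTQ replicate 2
    ["protC", "protB", "protB"],   # e.g. QSTAR replicate 1
]
print(sorted(identified_in_multiple(runs)))
```

A filter of this kind does not depend on the search scores, which is why repeated identification can be used alongside, rather than instead of, the score-based criteria.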
We found 562 peptides identified in at least two replicate QSTAR analyses that were never confidently identified by the LTQ and 1,121 peptides identified in at least two replicate LTQ analyses and never confidently identified by the QSTAR. We estimated that >99% of these identifications were correct. Although we found no obvious sequence-related reason why one instrument may show preference for one set of peptides (data not shown), we observed that exclusively LTQ-identified peptides were on average twice the length of QSTAR-identified peptides (Fig. 4b). As both Mascot and SEQUEST were used to generate these lists of identified peptides, we conclude that fundamental differences between the instruments led to differences in the range of identifiable peptide lengths, rather than issues related to spectrum interpretation. The distinct distributions of ions selected for MS/MS by the LTQ and QSTAR correlate with the lengths of identified peptides (Supplementary Fig. 6 online). This suggests the instruments' inherent ion preferences and acquisition ranges influence their abilities to sequence long peptides (Supplementary Methods).
This observed length discrepancy accounts for just half of the peptides uniquely identified by the LTQ, and does not explain the peptide identifications made exclusively by the QSTAR. Because the proportion of confidently-assigned MS/MS spectra was nearly 50% more for LTQ- than QSTAR-acquired data (0.32 versus 0.21), we conclude that either the QSTAR was able to fragment and measure more peptides not represented in the sequence database, or the search algorithms had more difficulty correctly interpreting and assigning distinguishable scores to QSTAR MS/MS spectra. The latter situation appears most likely because LTQ-acquired spectra tend to receive greater scores than QSTAR spectra (Supplementary Fig. 1). Considering only the peptides with lengths less than 17 residues, we found that average scores assigned to LTQ PSMs exceeded QSTAR PSM scores by 17% (nonspecific SEQUEST search) to 100% (tryptic Mascot search). Moreover, the decoy (incorrect) LTQ PSMs were confined more toward the lower-scoring ranges than the QSTAR PSMs. Finally, a greater proportion of QSTAR-acquired spectra have qualities with which both Mascot and SEQUEST have difficulty (Supplementary Fig. 3).
Discussion
How mass spectrometry can best be used to exhaustively analyze protein samples has remained a logistical and computational challenge. In this report, we demonstrate that both protein and proteome coverage can be dramatically improved (i) when samples are analyzed multiple times; (ii) when samples are measured by complementary instruments, and to a lesser extent, (iii) when resulting MS/MS spectra are interpreted with complementary search algorithms (Fig. 5b). Furthermore, repeated identification by complementary systems presents a scoring-independent means by which correct peptide and protein identifications can be selected.
These factors should be particularly effective for validating dubious protein identifications. For example, researchers often restrict their confident protein identifications to those identified by two or more peptides23, as proteins identified by single peptides exhibit higher false positive rates. In support of this practice, we estimated that all false positive protein identifications were proteins identified by one peptide for both LTQ and QSTAR analyses. Although removal of this peptide class as an additional filtering step would likely bring the estimated dataset precision to near 100%, doing so would remove more than one third of all protein identifications (per replicate analysis), 85−95% of which were estimated to be correct (Supplementary Fig. 5). The validation techniques used here as well as applying more restrictive score criteria should prove useful in rescuing these potentially valuable identifications.
We present a direct comparison of ion trap and TOF instrumentation platforms for the analysis of complex protein mixtures. Several other mass spectrometer configurations have been demonstrated to have utility in defining proteomes24,25,26,27. Often limiting amounts of biological material, available instruments or computation capabilities place constraints on which approaches can be reasonably followed to achieve the most sensitive and accurate proteome measurements. We believe a thorough understanding of the relative strengths of these platforms will prove invaluable to the proteome research community; when applied to these other platforms, the strategies explained here will allow researchers to make more rational choices regarding which analytical option will best suit their particular experimental goals. Similarly, many more spectral interpretation28,29,30,31 and selection17,18,19,32 options have been described than were used here. With so many data interpretation options, it is crucial that they be benchmarked for various mass spectrometer configurations. We have made available all raw data used here for this purpose (http://gygi.med.harvard.edu/pubs).
Methods
Sample processing. Log-phase Saccharomyces cerevisiae cultures were lysed as described33. Lysate protein concentration was determined using a bicinchoninic acid (BCA) protein assay (Pierce). Roughly 4 mg of protein were reduced, alkylated with iodoacetamide, and separated by SDS-PAGE as described34. The resulting gel was divided into ten equal gel slices (Fig. 1). Five alternating 0.5-cm × 10-cm slices were subjected to in-gel tryptic digestion35. Following off-line desalting on C18 solid-phase extraction cartridges, 4% of each digest was subjected to analysis by LC-MS/MS on either LTQ (ThermoElectron) or QSTAR-XL (Applied Biosystems/MDS Sciex) mass spectrometers. Detailed description of the conditions used for LC-MS/MS analysis is available in Supplementary Methods online.
Database searching and data processing. All data were converted from raw instrument output to the .dta format using instrument-supplied software: Analyst QS, Build 7051 (MDS SCIEX) for QSTAR-acquired MS/MS spectra or the program ExtractMS version 2.11 (ThermoElectron) for LTQ-acquired MS/MS spectra. The program MascotMerge (Matrix Science) was used to convert .dta files into the Mascot Generic Format prior to launching Mascot searches. Mascot (version 2.0, Matrix Science) searches were performed using a dual 1.1-GHz processor server at the Harvard Medical School Pathology Functional Proteomics Center (http://pfpc.med.harvard.edu). SEQUEST (version 27, revision 9) searches were performed against the same sequence database using an in-house Linux cluster with fourteen 2.2-GHz dual processor nodes. The protein sequence database used consisted of all translated open reading frame sequences (orf_trans.fasta from Saccharomyces Genome Database (SGD) at Stanford University; downloaded 10 September, 2004 (ftp.yeastgenome.org/yeast/)) in the forward (target) orientation preceding these same sequences in their reversed (decoy) orientations.
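The layout of the composite database (all forward sequences followed by the same sequences reversed) can be sketched as follows. This is an illustration rather than the script used for the study: the `REV_` prefix for decoy entry names, the function names and the 60-character line width are assumptions, and the toy sequences stand in for the contents of orf_trans.fasta.

```python
from typing import Dict, List, Tuple


def build_target_decoy(
    proteins: Dict[str, str], decoy_prefix: str = "REV_"
) -> List[Tuple[str, str]]:
    """Return target entries followed by their reversed (decoy) counterparts."""
    targets = [(name, seq) for name, seq in proteins.items()]
    decoys = [(decoy_prefix + name, seq[::-1]) for name, seq in proteins.items()]
    return targets + decoys


def to_fasta(entries: List[Tuple[str, str]], width: int = 60) -> str:
    """Serialize (name, sequence) pairs as FASTA text with wrapped lines."""
    lines: List[str] = []
    for name, seq in entries:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy sequences for illustration only.
composite = build_target_decoy({"P1": "MKTAY", "P2": "MSR"})
print(to_fasta(composite))
```

Because every target sequence has exactly one reversed partner of the same length and composition, the two halves of the database are equal in size, which is what the decoy-doubling error estimate relies on.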
All cysteine residues were searched as carboxamidomethyl-cysteine (+57.0215 Da), and methionine residues were allowed to be oxidized (+15.9949 Da). Up to two internal cleavage sites were allowed for tryptic searches. Parameters commonly set for all LTQ searches included use of average atomic masses, and a tolerance of 2.0 Da for precursor ions and 0.8 Da for fragment ions (Mascot). Parameters commonly set for all QSTAR searches included the use of monoisotopic atomic masses and a tolerance of 0.2 Da for both precursor and fragment ions.
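For orientation, the stated tolerances and modification masses can be collected in a small structure, together with a conversion from an absolute tolerance in Da to parts per million at a given mass. The sketch below is hypothetical in its names and layout; only the numeric values come from the paragraph above, and the LTQ fragment tolerance is the value stated for Mascot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTolerances:
    mass_type: str           # "average" or "monoisotopic"
    precursor_tol_da: float  # precursor ion tolerance in Da
    fragment_tol_da: float   # fragment ion tolerance in Da


# Values taken from the text; the 0.8 Da fragment tolerance is given for Mascot.
LTQ_MASCOT = SearchTolerances("average", 2.0, 0.8)
QSTAR = SearchTolerances("monoisotopic", 0.2, 0.2)

FIXED_MODS = {"C": 57.0215}     # carboxamidomethyl-cysteine, Da
VARIABLE_MODS = {"M": 15.9949}  # methionine oxidation, Da


def da_to_ppm(tolerance_da: float, mass: float) -> float:
    """Express an absolute tolerance as parts per million of the given mass."""
    if mass <= 0:
        raise ValueError("mass must be positive")
    return tolerance_da / mass * 1e6


def precursor_matches(observed: float, theoretical: float,
                      tolerances: SearchTolerances) -> bool:
    """True if the observed precursor mass lies within the tolerance window."""
    return abs(observed - theoretical) <= tolerances.precursor_tol_da


# At a mass of 1,000 Da, 0.2 Da corresponds to about 200 p.p.m.
print(da_to_ppm(QSTAR.precursor_tol_da, 1000.0))
print(precursor_matches(1000.15, 1000.0, QSTAR))
```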
Specific score cutoffs were empirically determined for each replicate run by varying the scores listed in Supplementary Table 1 to maximize the number of accepted nonredundant PSMs, keeping the precision rate as close to 99% as possible. The cutoffs determined for each run were averaged, and applied to all three data sets. Scripts written in the Perl programming language were used to import, export and compile search results to and from a PostgreSQL database. Further data manipulations were performed with Microsoft Excel and SigmaPlot (Systat Software, Inc.).
Note: Supplementary information is available on the Nature Methods website.
Received 6 April 2005; Accepted 26 July 2005; Published online: 23 August 2005.
REFERENCES
1. Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198−207 (2003).
2. Aebersold, R. & Goodlett, D.R. Mass spectrometry in proteomics. Chem. Rev. 101, 269−295 (2001).
3. Florens, L. et al. A proteomic view of the Plasmodium falciparum life cycle. Nature 419, 520−526 (2002).
4. Peng, J. et al. A proteomics approach to understanding protein ubiquitination. Nat. Biotechnol. 21, 921−926 (2003).
5. Foster, L.J., De Hoog, C.L. & Mann, M. Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors. Proc. Natl. Acad. Sci. USA 100, 5813−5818 (2003).
6. Louris, J. et al. Instrumentation, applications, and energy deposition in quadrupole ion-trap tandem mass spectrometry. Anal. Chem. 59, 1677−1685 (1987).
7. Jonscher, K.R. & Yates, J.R., III. The quadrupole ion trap mass spectrometer - a small solution to a big challenge. Anal. Biochem. 244, 1−15 (1997).
8. Chernushevich, I.V., Loboda, A.V. & Thomson, B.A. An introduction to quadrupole-time-of-flight mass spectrometry. J. Mass Spectrom. 36, 849−865 (2001).
9. Schwartz, J.C., Senko, M.W. & Syka, J.A. Two-dimensional quadrupole ion trap mass spectrometer. J. Am. Soc. Mass Spectrom. 13, 659−669 (2002).
10. Mayya, V., Rezaul, K., Cong, Y.S. & Han, D. Systematic comparison of a two-dimensional ion trap and a three-dimensional ion trap mass spectrometer in proteomics. Mol. Cell. Proteomics 4, 214−223 (2005).
11. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551−3567 (1999).
12. Eng, J.K., McCormack, A.L. & Yates, J.R., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976−989 (1994).
13. Sadygov, R.G., Cociorva, D. & Yates, J.R., III. Large-scale database searching using tandem mass spectra: Looking up the answer in the back of the book. Nat. Methods 1, 195−202 (2004).
14. Lasonder, E. et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 419, 537−542 (2002).
15. Resing, K.A. et al. Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics. Anal. Chem. 76, 3556−3568 (2004).
16. Fenyo, D. & Beavis, R.C. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 75, 768−774 (2003).
17. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383−5392 (2002).
18. Elias, J.E., Gibbons, F.D., King, O.D., Roth, F.P. & Gygi, S.P. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22, 214−219 (2004).
19. Moore, R.E., Young, M.K. & Lee, T.D. Qscore: an algorithm for evaluating SEQUEST database search results. J. Am. Soc. Mass Spectrom. 13, 378−386 (2002).
20. Peng, J., Elias, J.E., Thoreen, C.C., Licklider, L.J. & Gygi, S.P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2, 43−50 (2003).
21. Sadygov, R.G. & Yates, J.R., III. A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases. Anal. Chem. 75, 3792−3798 (2003).
22. Liu, H., Sadygov, R.G. & Yates, J.R., III. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 76, 4193−4201 (2004).
23. Kratchmarova, I., Blagoev, B., Haack-Sorensen, M., Kassem, M. & Mann, M. Mechanism of divergent growth factor effects in mesenchymal stem cell differentiation. Science 308, 1472−1477 (2005).
24. Medzihradszky, K.F. et al. The characteristics of peptide collision-induced dissociation using a high-performance MALDI-TOF/TOF tandem mass spectrometer. Anal. Chem. 72, 552−558 (2000).
25. Hager, J.W. A new linear ion trap mass spectrometer. Rapid Comm. Mass Spec. 16, 512−526 (2002).
26. Lipton, M.S. et al. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc. Natl. Acad. Sci. USA 99, 11049−11054 (2002).
27. Meng, F. et al. Molecular-level description of proteins from Saccharomyces cerevisiae using quadrupole FT hybrid mass spectrometry for top down proteomics. Anal. Chem. 76, 2852−2858 (2004).
28. Zhang, N., Aebersold, R. & Schwikowski, B. ProbID: a probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data. Proteomics 2, 1406−1412 (2002).
29. Tabb, D.L., Saraf, A. & Yates, J.R., III. GutenTag: high-throughput sequence tagging via an empirically derived fragmentation model. Anal. Chem. 75, 6415−6421 (2003).
30. LeDuc, R.D. et al. ProSight PTM: an integrated environment for protein identification and characterization by top-down mass spectrometry. Nucleic Acids Res. 32, W340−W345 (2004).
31. Chamrad, D.C. et al. Evaluation of algorithms for protein identification from sequence databases using mass spectrometry data. Proteomics 4, 619−628 (2004).
32. Nesvizhskii, A.I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646−4658 (2003).
33. Verdel, A. & Moazed, D. Labeling and characterization of small RNAs associated with the RNA interference effector complex RITS. Methods Enzymol. 392, 297−307 (2005).
34. Beausoleil, S.A. et al. Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc. Natl. Acad. Sci. USA 101, 12130−12135 (2004).
35. Peng, J. & Gygi, S.P. Proteomics: the move to mixtures. J. Mass Spectrom. 36, 1083−1091 (2001).
Acknowledgments This work was supported in part by US National Institutes of Health grants GM67945 and HG00041 (S.P.G.). We thank D. Moazed for yeast lysate and the Pathology Functional Proteomics Center at Harvard Medical School for allowing use of their Mascot server.
Competing interests statement:
The authors declare competing financial interests.