Perspective

Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry

Nature Methods volume 4, pages 207–214 (2007)

Abstract

Liquid chromatography and tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternative methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
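Points (ii) and (iv) above concern how the searched database is assembled: one concatenated target-decoy database rather than separate searches, and alternative ways of building the decoy half. The Python sketch below illustrates two decoy constructions and the concatenation step under stated assumptions. The `DECOY_` prefix and all function names are hypothetical, and reading "pseudo-reversed" (a term from the acknowledgements) as "reverse each tryptic segment but keep its C-terminal K/R in place" is an assumption made here, not a description of the authors' or Sage-N's implementation.

```python
"""Sketch: build a concatenated target-decoy protein database.

Assumptions (illustrative, not from the article or any search engine):
- proteins are plain one-letter sequences keyed by accession;
- decoy entries carry a hypothetical "DECOY_" prefix;
- "pseudo-reversed" is read as: reverse each tryptic segment but keep
  its C-terminal K/R in place (simple trypsin rule, no proline exception).
"""
import re
from typing import Callable

DECOY_PREFIX = "DECOY_"  # hypothetical tag used later to recognise decoy hits


def reversed_decoy(seq: str) -> str:
    """Reverse the whole protein sequence."""
    return seq[::-1]


def pseudo_reversed_decoy(seq: str) -> str:
    """Reverse each tryptic segment while keeping its C-terminal K/R fixed.

    Under the simple cleavage rule, the decoy peptides have the same
    compositions (hence masses) as the target peptides and still end in K/R.
    """
    # Segments ending in K/R, plus an optional trailing fragment without K/R.
    segments = re.findall(r"[^KR]*[KR]|[^KR]+$", seq)
    out = []
    for seg in segments:
        if seg[-1] in "KR":
            out.append(seg[-2::-1] + seg[-1])  # reverse body, keep K/R last
        else:
            out.append(seg[::-1])  # C-terminal fragment with no cleavage site
    return "".join(out)


def concatenate(targets: dict[str, str],
                make_decoy: Callable[[str], str]) -> dict[str, str]:
    """Return a single database holding every target plus one decoy per target."""
    db = dict(targets)
    for accession, seq in targets.items():
        db[DECOY_PREFIX + accession] = make_decoy(seq)
    return db


if __name__ == "__main__":
    targets = {"P1": "MPEPTIDEKAAR", "P2": "GLYCINER"}  # toy sequences
    for name, seq in concatenate(targets, pseudo_reversed_decoy).items():
        print(name, seq)
```

Run as a script, it prints the two toy targets followed by `DECOY_P1 EDITPEPMKAAR` and `DECOY_P2 ENICYLGR`. Because every K/R stays at its original position, the pseudo-reversed decoy digests into peptides with the same masses as the target peptides; whole-protein reversal does not in general preserve them.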

Acknowledgements

This work was supported in part by US National Institutes of Health (GM67945 and HG00041 to S.P.G.). We thank S. Beausoleil, P. Everley, S. Gerber and W. Haas for continuing and insightful discussions, and Sage-N for implementing our idea of the pseudo-reversed searches on their SEQUEST platform.

Author information

Affiliations

  1. Department of Cell Biology, 240 Longwood Avenue, Harvard Medical School, Boston, Massachusetts 02115, USA.

    • Joshua E Elias
    • Steven P Gygi
  2. Taplin Biological Mass Spectrometry Facility, 240 Longwood Avenue, Harvard Medical School, Boston, Massachusetts 02115, USA.

    • Steven P Gygi

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Steven P Gygi.

Supplementary information

PDF files

  1. Supplementary Fig. 1

    False-positive identifications can be estimated by doubling the number of decoy hits from a search against a concatenated target/decoy database.

  2. Supplementary Fig. 2

    The distributions of potential peptide matches are consistent between target and decoy databases.

  3. Supplementary Fig. 3

    Example supporting the necessity for target/decoy competition.

  4. Supplementary Fig. 4

    Relative scores shift to smaller values for fewer than half of peptide hits when these are searched against a composite target-decoy database rather than against separate databases.

  5. Supplementary Fig. 5

    Using decoy hits to guide the choice of appropriate selection criteria.

  6. Supplementary Table 1

    Slopes of best-fit lines for precision values shown in Figure 5b.
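Supplementary Figs. 1, 3 and 5 name three steps of the workflow: target/decoy competition, estimating false positives by doubling the passing decoy hits, and using decoy hits to choose selection criteria. The Python sketch below strings these together. It is a minimal illustration under stated assumptions: `Match`, `best_hit`, `fp_estimate` and `choose_threshold` are hypothetical names, scores are assumed to be "higher is better", and the standard deviation uses a simple binomial approximation chosen here, which may differ from the error analysis in the article.

```python
"""Sketch: decoy-based false-positive (FP) estimation for a concatenated search.

Assumptions (illustrative, not the article's code):
- each spectrum yields candidate matches from ONE search of the
  concatenated target-decoy database, and higher scores are better;
- incorrect matches fall on targets and decoys with equal probability,
  so doubling the passing decoy hits estimates all passing false positives.
"""
import math
from typing import NamedTuple, Optional


class Match(NamedTuple):
    peptide: str
    score: float
    is_decoy: bool


def best_hit(candidates: list[Match]) -> Match:
    """Target/decoy competition: keep only the top-scoring match per spectrum."""
    return max(candidates, key=lambda m: m.score)


def fp_estimate(hits: list[Match], threshold: float) -> tuple[int, float, float]:
    """Return (passing hits, estimated FP rate, approximate SD of the FP count).

    Estimated FP count = 2 * passing decoy hits; FP rate = that / passing hits.
    The SD is a binomial approximation assumed here: if N incorrect hits pass
    and each is a decoy with p = 0.5, SD(2 * decoys) = sqrt(N), with N
    replaced by its estimate.
    """
    passing = [h for h in hits if h.score >= threshold]
    n_fp = 2 * sum(h.is_decoy for h in passing)
    rate = n_fp / len(passing) if passing else 0.0
    return len(passing), rate, math.sqrt(n_fp)


def choose_threshold(hits: list[Match], max_fp_rate: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose estimated FP rate is at most max_fp_rate."""
    for t in sorted({h.score for h in hits}):
        n_pass, rate, _ = fp_estimate(hits, t)
        if n_pass and rate <= max_fp_rate:
            return t
    return None


if __name__ == "__main__":
    # Toy data: one list of candidate matches per spectrum (made-up scores).
    spectra = [
        [Match("PEPTIDEK", 5.0, False), Match("EDITPEPK", 1.0, True)],
        [Match("SAMPLER", 4.0, False)],
        [Match("GLYCINER", 3.0, False)],
        [Match("LAAAR", 1.8, False), Match("RELPMASK", 2.5, True)],
        [Match("VALINEK", 2.0, False)],
        [Match("KENILAVR", 1.5, True)],
    ]
    hits = [best_hit(c) for c in spectra]
    cutoff = choose_threshold(hits, max_fp_rate=0.01)
    if cutoff is not None:
        print("cutoff:", cutoff, "->", fp_estimate(hits, cutoff))
```

With the toy data, the script prints `cutoff: 3.0 -> (3, 0.0, 0.0)`, the lowest cutoff at which no decoy hits pass. Keeping only the best match per spectrum is what makes doubling reasonable: a spectrum without a correct match then lands on a target or a decoy peptide with roughly equal odds, provided the two halves of the database are comparable.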

Word documents

  1. Supplementary Methods

About this article

DOI

https://doi.org/10.1038/nmeth1019
