Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra

Abstract

Tandem mass spectrometry (MS/MS) experiments yield multiple, nearly identical spectra of the same peptide in various laboratories, but proteomics researchers typically do not leverage the unidentified spectra produced in other labs to decode spectra they generate. We propose a spectral archives approach that clusters MS/MS datasets, representing similar spectra by a single consensus spectrum. Spectral archives extend spectral libraries by analyzing both identified and unidentified spectra in the same way and maintaining information about peptide spectra that are common across species and conditions. Thus archives offer both traditional library spectrum similarity-based search capabilities and new ways to analyze the data. By developing a clustering tool, MS-Cluster, we generated a spectral archive from 1.18 billion spectra that greatly exceeds the size of existing spectral repositories. We advocate that publicly available data should be organized into spectral archives rather than analyzed as disparate datasets, as is mostly the case today.
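To make the idea of representing similar spectra by a single consensus spectrum concrete, the following is a minimal Python sketch. It is not the MS-Cluster algorithm, which the abstract does not describe in detail. It assumes a simple greedy scheme: each spectrum is binned by m/z, compared to existing cluster consensus spectra by cosine similarity, and either merged into the best match or used to start a new cluster. The bin width, the similarity threshold, the function names, and the choice to ignore precursor mass are all illustrative assumptions.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0  # assumed m/z bin width in Da (illustrative, not from the paper)


def binned(peaks):
    """Turn (m/z, intensity) peaks into a unit-length sparse vector keyed by bin."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / BIN_WIDTH)] += intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm else {}


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(w * v.get(b, 0.0) for b, w in u.items())


def consensus(members):
    """Sum the member vectors bin by bin and renormalize to unit length."""
    total = defaultdict(float)
    for vec in members:
        for b, w in vec.items():
            total[b] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {b: w / norm for b, w in total.items()} if norm else {}


def cluster(spectra, threshold=0.7):
    """Greedy clustering: join the most similar cluster if it clears the threshold."""
    clusters = []  # each cluster: {"members": [vectors], "consensus": vector}
    for peaks in spectra:
        vec = binned(peaks)
        best = max(clusters, key=lambda c: cosine(vec, c["consensus"]), default=None)
        if best is not None and cosine(vec, best["consensus"]) >= threshold:
            best["members"].append(vec)
            best["consensus"] = consensus(best["members"])
        else:
            clusters.append({"members": [vec], "consensus": vec})
    return clusters


# Two near-identical spectra and one unrelated spectrum (made-up peak lists).
spectrum_a = [(100.1, 10.0), (200.2, 5.0), (300.3, 1.0)]
spectrum_b = [(100.2, 9.0), (200.1, 6.0), (300.4, 1.0)]
spectrum_c = [(150.0, 10.0), (250.0, 5.0)]
clusters = cluster([spectrum_a, spectrum_b, spectrum_c])
```

Because the consensus is kept whether or not any member has a peptide identification, identified and unidentified spectra are handled in the same way, which is the property the abstract emphasizes. A practical tool would also need to restrict comparisons by precursor mass and scale to very large inputs; this sketch does neither.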

Figure 1: Clustering of the PNNL dataset.
Figure 2: Identification of peptides across different species.

Acknowledgements

We thank I. Kaufman for his assistance in running the experiments on the computational grid. This work was supported by US National Institutes of Health grant 1-P41-RR024851 from the National Center for Research Resources. This work used measurements based upon capabilities developed by the Department of Energy, Office of Biological and Environmental Research, and National Center for Research Resources (grant RR18522) conducted at the Environmental Molecular Sciences Laboratory, a national scientific user facility located at Pacific Northwest National Laboratory in Richland, Washington, USA.

Author information

Contributions

A.M.F. designed and implemented the algorithms, designed and ran the experiments and wrote the paper. P.A.P. designed the algorithms and the experiments and wrote the paper. R.D.S. developed the measurement capabilities. R.J.M. was responsible for the measurements. M.E.M. and G.A.A. developed protocols and did the proteomics data acquisition and processing. A.R.S. assisted in designing the experiments. J.J.C. and N.B. designed and implemented the web-based archive searching tool. All authors discussed and commented on the paper and contributed to writing it.

Corresponding author

Correspondence to Pavel A Pevzner.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–2 and Supplementary Notes 1–6 (PDF 544 kb)

About this article

Cite this article

Frank, A., Monroe, M., Shah, A. et al. Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra. Nat Methods 8, 587–591 (2011). https://doi.org/10.1038/nmeth.1609

  • DOI: https://doi.org/10.1038/nmeth.1609
