PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data

Abstract

Data-independent acquisition (DIA) is an emerging mass spectrometry (MS)-based technique for unbiased and reproducible measurement of protein mixtures. DIA tandem mass spectrometry spectra are often highly multiplexed, containing product ions from multiple cofragmenting precursors. Detecting peptides directly from DIA data is therefore challenging; most DIA data analyses require spectral libraries. Here we present PECAN (http://pecan.maccosslab.org), a library-free, peptide-centric tool that robustly and accurately detects peptides directly from DIA data. PECAN reports evidence of detection based on product ion scoring, which enables detection of low-abundance analytes with poor precursor ion signal. We demonstrate the chromatographic peak picking accuracy and peptide detection capability of PECAN, and we further validate its detection with data-dependent acquisition and targeted analyses. Lastly, we used PECAN to build a plasma proteome library from DIA data and to query known sequence variants.

Figure 1: Overview of PECAN workflow.
Figure 2: PECAN peak picking performance on the SIS data set.
Figure 3: Validation of PECAN detection with GST fusion proteins.
Figure 4: Deep proteome measurement with gas-phase fractionation.
Figure 5: Natural variants in the plasma library data.

Acknowledgements

The authors thank L. Käll, A.I. Nesvizhskii, N. Bandeira, and J.K. Eng for insightful discussions. This work was supported by the National Institutes of Health Grants P30 AG013280, R21 CA192983, P41 GM103533, and U54 HG008097. S.H.P. was supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, and Early Career Research Program.

Author information

Contributions

Y.S.T. and M.J.M. designed the experiments. Y.S.T. developed the algorithms with input from J.D.E., S.H.P., B.C.S., W.S.N., and M.J.M. Y.S.T. performed the analyses. Y.S.T. and J.G.B. acquired the data. Software was written by Y.S.T. with input from J.D.E. and B.C.S. The manuscript was written by Y.S.T. with substantial input from J.D.E., S.H.P., W.S.N., and M.J.M.

Corresponding author

Correspondence to Michael J MacCoss.

Ethics declarations

Competing interests

The MacCoss Lab at the University of Washington has a sponsored research agreement with Thermo Fisher Scientific, the manufacturer of the instrumentation used in this research. Additionally, M.J.M. is a paid consultant for Thermo Fisher Scientific.

Integrated supplementary information

Supplementary Figure 1 Retention time analysis for common peptides from Comet-DDA and PECAN-DIA.

Of the 5,182 peptides detected both by PECAN from 4xGPF DIA data and by Comet from 4xGPF DDA data, 27 were identified at retention times more than 2 minutes apart.
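The comparison behind this figure can be illustrated with a small sketch. It assumes each tool's output has already been reduced to a mapping from peptide sequence to retention time in minutes; the function name `rt_disagreements`, the dictionary layout, and the toy peptides are hypothetical and are not taken from PECAN or Comet.

```python
def rt_disagreements(pecan_rt, comet_rt, tolerance_min=2.0):
    """Compare retention times (minutes) for peptides found by both tools.

    Returns the number of shared peptides and a sorted list of those whose
    retention times differ by more than `tolerance_min`.
    """
    # Peptides detected by both tools (set intersection of the dict keys).
    common = pecan_rt.keys() & comet_rt.keys()
    # Shared peptides whose two retention times disagree beyond the tolerance.
    apart = sorted(
        p for p in common if abs(pecan_rt[p] - comet_rt[p]) > tolerance_min
    )
    return len(common), apart


# Toy data, invented purely for illustration.
pecan = {"PEPTIDEK": 10.0, "SAMPLER": 20.0, "ONLYDIAK": 5.0}
comet = {"PEPTIDEK": 10.5, "SAMPLER": 23.1, "ONLYDDAK": 7.0}
n_common, far_apart = rt_disagreements(pecan, comet)
```

In the legend's terms, `n_common` would correspond to the 5,182 shared peptides and `len(far_apart)` to the 27 that fall more than 2 minutes apart.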

Supplementary Figure 2 Dynamic range of DIA plasma library.

Relative concentration values of 248 plasma proteins are taken from the literature (source: Leigh Anderson, The Plasma Proteome Institute, Washington, DC, USA; modified from Mol. Cell. Proteomics 1, 845–847, 2002). The color of each dot represents the number of peptides in the DIA plasma library that are unique to the protein or shared only among its isoforms. Note that some literature values are measurements of a protein complex or of specific fragments of the protein (e.g., the values for prothrombin and fibrinogen alpha chain), so the intact protein concentration could be higher.

Supplementary Figure 3 Assessment of background score estimation with 1,000 random samplings.

(a) Boxplot showing the distribution of 2,185 CVs of the RSEs from 1,000 random samplings at each decoy size. (b) Estimated background scores from 2,000 charge 2 and 2,000 charge 3 decoys for 2,185 MS/MS spectra, plotted over retention time. Black lines trace the median of the decoy means from 1,000 estimations by random sampling, and the blue shading spans the 25th to 75th percentiles. (c) Bonferroni-corrected p-values from Wilcoxon rank-sum tests between the 1,000 estimations using either 2,000 charge 2 or 2,000 charge 3 decoys for each individual spectrum. Grey lines indicate that the p-value is smaller than 0.05, so the null hypothesis is rejected.
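The resampling summarized in panel (a) can be sketched as follows for a single spectrum. This is an illustration under stated assumptions, not PECAN's code: the legend does not expand its abbreviations, so RSE is taken here to be the relative standard error (standard error of the mean divided by the mean) and CV the coefficient of variation (standard deviation divided by the mean). Sampling without replacement, strictly positive decoy scores, and the names `relative_standard_error` and `cv_of_rse` are likewise assumptions.

```python
import random
from statistics import fmean, stdev


def relative_standard_error(sample):
    """Standard error of the mean divided by the mean (assumed meaning of RSE)."""
    return (stdev(sample) / len(sample) ** 0.5) / fmean(sample)


def cv_of_rse(decoy_scores, decoy_size, n_samplings=1000, seed=0):
    """Coefficient of variation of the RSE across repeated random decoy subsets.

    Each sampling draws `decoy_size` decoy scores without replacement,
    computes the RSE of that subset, and the CV summarizes how much the
    RSE varies from one sampling to the next.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rses = [
        relative_standard_error(rng.sample(decoy_scores, decoy_size))
        for _ in range(n_samplings)
    ]
    return stdev(rses) / fmean(rses)


# Synthetic decoy scores for one spectrum, invented for illustration only.
pool_rng = random.Random(1)
pool = [pool_rng.gauss(10.0, 2.0) for _ in range(5000)]
cv_small = cv_of_rse(pool, decoy_size=50, n_samplings=200)
cv_large = cv_of_rse(pool, decoy_size=2000, n_samplings=200)
```

Repeating this for every spectrum and every decoy size would give one CV per spectrum per size, which is the kind of distribution a boxplot like panel (a) displays. On this synthetic pool, the larger decoy set yields the more stable estimate.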

Supplementary Figure 4 Evidence qualifying procedure in PECAN.

An evidence of detection (abbreviated "evidence") for a query peptide p at time t is the average of the calibrated primary scores over a short retention time window (see Methods) centered at t. Following this flowchart, PECAN reports a user-defined number of qualified evidence(s), each calculated from primary scores that have not been used to calculate any other qualified evidence.
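The rule in this legend can be sketched in a few lines. The sketch assumes one calibrated primary score per MS/MS scan, a symmetric window of `half_width` scans on each side of t, and a greedy selection that takes the highest remaining evidence first. The flowchart itself is not reproduced in this text, so the selection order, the handling of edge scans, and the names `qualified_evidence`, `half_width`, and `n_report` are all assumptions rather than PECAN's actual procedure.

```python
from statistics import fmean


def qualified_evidence(primary_scores, half_width=2, n_report=3):
    """Return up to `n_report` (scan_index, evidence) pairs.

    Evidence at scan t is the mean of the primary scores in the window
    [t - half_width, t + half_width]. A candidate qualifies only if none
    of its scores were used by a previously reported evidence.
    """
    n = len(primary_scores)
    candidates = []
    # Only scans with a full window are considered (edge handling is assumed).
    for t in range(half_width, n - half_width):
        window = primary_scores[t - half_width : t + half_width + 1]
        candidates.append((fmean(window), t))

    # Highest evidence first; ties broken by earlier scan index.
    candidates.sort(key=lambda c: (-c[0], c[1]))

    used = set()  # scan indices whose scores already contributed
    reported = []
    for evidence, t in candidates:
        window_idx = set(range(t - half_width, t + half_width + 1))
        if window_idx & used:
            continue  # shares a primary score with an earlier evidence
        reported.append((t, evidence))
        used |= window_idx
        if len(reported) == n_report:
            break
    return reported


# Toy score trace with two chromatographic peaks, invented for illustration.
scores = [0, 0, 1, 5, 1, 0, 0, 2, 4, 2, 0, 0]
top_two = qualified_evidence(scores, half_width=1, n_report=2)
```

Because every reported evidence reserves its whole window, asking for more evidences than there are non-overlapping windows simply returns fewer results.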

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4, Supplementary Tables 1–3 and Supplementary Notes 1–6 (PDF 4050 kb)

Reporting Summary

Life Sciences Reporting Summary (PDF 129 kb)

About this article

Cite this article

Ting, Y., Egertson, J., Bollinger, J. et al. PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat Methods 14, 903–908 (2017). https://doi.org/10.1038/nmeth.4390

  • DOI: https://doi.org/10.1038/nmeth.4390
