Data-independent acquisition (DIA) is an emerging mass spectrometry (MS)-based technique for unbiased and reproducible measurement of protein mixtures. DIA tandem mass spectrometry spectra are often highly multiplexed, containing product ions from multiple cofragmenting precursors. Detecting peptides directly from DIA data is therefore challenging; most DIA data analyses require spectral libraries. Here we present PECAN (http://pecan.maccosslab.org), a library-free, peptide-centric tool that robustly and accurately detects peptides directly from DIA data. PECAN reports evidence of detection based on product ion scoring, which enables detection of low-abundance analytes with poor precursor ion signal. We demonstrate the chromatographic peak picking accuracy and peptide detection capability of PECAN, and we further validate its detection with data-dependent acquisition and targeted analyses. Lastly, we used PECAN to build a plasma proteome library from DIA data and to query known sequence variants.
- Supplementary Figure 1: Retention time analysis for common peptides from Comet-DDA and PECAN-DIA. (81 KB)
Of the 5,182 peptides commonly detected by PECAN from 4xGPF DIA data and Comet from 4xGPF DDA data, 27 peptides were identified more than 2 minutes apart.
- Supplementary Figure 2: Dynamic range of DIA plasma library. (383 KB)
Relative concentration values of 248 plasma proteins are taken from the literature. (Source: Leigh Anderson, The Plasma Proteome Institute, Washington, DC, USA, modified from ref Mol. Cell Proteomics 1, 845–847, 2002.) The color of each dot represents the number of peptides in the DIA plasma library that are unique to the protein or shared only among its isoforms. Note that some literature values are measurements of protein complexes or of specific protein fragments (e.g., the values for prothrombin and fibrinogen alpha chain), for which the intact protein concentration could be higher.
- Supplementary Figure 3: Assessment of background scores estimation with 1,000 random sampling. (261 KB)
(a) Boxplots show the distribution of 2,185 CVs of the RSEs from 1,000 random samplings at each decoy size. (b) The background scores estimated with 2,000 charge 2 and 2,000 charge 3 decoys for 2,185 MS/MS spectra, plotted over retention time. Black lines trace the median of the decoy means from 1,000 estimations by random sampling, and the blue shading spans the 25th to 75th percentiles. (c) Bonferroni-corrected p-values from Wilcoxon rank-sum tests between the 1,000 estimations using either 2,000 charge 2 or 2,000 charge 3 decoys for each individual spectrum. Grey lines indicate that the p-value is smaller than 0.05 and the null hypothesis is therefore rejected.
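The summary in panel (b) can be illustrated with a minimal sketch: draw a fixed number of decoy scores at random many times, take the mean of each draw, and summarize the means by their median and quartiles. This is not PECAN's code; the function name `resampled_background`, its parameters, and the flat list of decoy scores for a single spectrum are assumptions made for illustration.

```python
import random
from statistics import mean, median, quantiles


def resampled_background(decoy_scores, sample_size, n_rounds=1000, seed=0):
    """Summarize the decoy mean for one spectrum over repeated random sampling.

    Hypothetical helper: decoy_scores is assumed to be a list of scores
    for one MS/MS spectrum, one score per decoy peptide.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # One background estimate per round: the mean of a random subset of decoys.
    means = [mean(rng.sample(decoy_scores, sample_size)) for _ in range(n_rounds)]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(means, n=4)
    return {"median": median(means), "q1": q1, "q3": q3}
```

Plotting `median` as a line and the `q1` to `q3` interval as a shaded band for each spectrum, ordered by retention time, would give a trace of the kind described in the caption.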
- Supplementary Figure 4: Evidence qualifying procedure in PECAN. (50 KB)
An evidence of detection (abbr. evidence) for a query peptide p at time t is the average of the calibrated primary scores over a short retention-time window (see Methods) centered at t. Following this flowchart, PECAN reports a user-defined number of qualified evidences, each calculated from primary scores that have not been used to calculate any other qualified evidence.
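The qualifying idea in this caption can be sketched in a few lines. The flowchart itself is not reproduced here, so the selection order below (highest evidence first) and the window truncation at the ends of the run are assumptions, as are the names `windowed_evidence`, `qualified_evidences`, `half_width`, and `max_reported`. Scores are assumed to be one calibrated primary score per spectrum, ordered by retention time.

```python
from statistics import mean


def windowed_evidence(scores, half_width):
    """Average the scores in a window centered on each index.

    The window is truncated at the start and end of the run (an assumption).
    """
    n = len(scores)
    return [
        mean(scores[max(0, t - half_width): min(n, t + half_width + 1)])
        for t in range(n)
    ]


def qualified_evidences(scores, half_width, max_reported):
    """Pick up to max_reported evidences whose score windows never overlap."""
    evidence = windowed_evidence(scores, half_width)
    used = set()      # indices of primary scores already consumed
    reported = []     # (index, evidence) pairs, best first
    # Assumed order: visit candidate time points from highest evidence down.
    for t in sorted(range(len(scores)), key=lambda i: evidence[i], reverse=True):
        window = range(max(0, t - half_width), min(len(scores), t + half_width + 1))
        if any(i in used for i in window):
            continue  # shares a primary score with a qualified evidence
        used.update(window)
        reported.append((t, evidence[t]))
        if len(reported) == max_reported:
            break
    return reported
```

For example, `qualified_evidences([0, 1, 5, 1, 0, 0, 2, 4, 2, 0], half_width=1, max_reported=2)` selects the windows centered at indices 7 and 2, in that order. Every other candidate shares at least one score with one of those two windows, so asking for more evidences returns the same two.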