Abstract

We present a sequence-tag-based search engine, Open-pFind, to identify peptides in an ultra-large search space that includes coeluting peptides, unexpected modifications and digestions. Our method detects peptides with higher precision and speed than seven other search engines. Open-pFind identified 70–85% of the tandem mass spectra in four large-scale datasets and 14,064 proteins, each supported by at least two protein-unique peptides, in a human proteome dataset.

Acknowledgements

This work was supported by grants from the National Key Research and Development Program of China (No. 2016YFA0501300 to S.-M.H., 2017YFA0505100 and 2017YFC0906600 to P.X. and 2012CB316502 to P.-H.Z.), the Youth Innovation Promotion Association CAS (No. 2014091 to H.C.), the National Natural Science Foundation of China (31470805 to H.C., 31670834 to P.X., 31700727 to C.L., and 21475141 to S.-M.H.), the CAS Interdisciplinary Innovation Team (Y604061000 to S.-M.H.), the International Collaboration Program (2014DFB30020 to P.X.), and the Beijing Training Project for The Leading Talents in S&T (Z161100004916024 to P.X.).

Author information

Author notes

    These authors contributed equally to this work: Hao Chi, Chao Liu & Hao Yang.

Affiliations

  1. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China.

    Hao Chi, Chao Liu, Hao Yang, Wen-Feng Zeng, Long Wu, Wen-Jing Zhou, Rui-Min Wang, Xiu-Nan Niu, Zhao-Wei Wang, Zhen-Lin Chen, Rui-Xiang Sun, Tao Liu, Guang-Ming Tan, Pei-Heng Zhang & Si-Min He

  2. University of Chinese Academy of Sciences, Beijing, China.

    Hao Chi, Chao Liu, Hao Yang, Wen-Feng Zeng, Long Wu, Wen-Jing Zhou, Rui-Min Wang, Xiu-Nan Niu, Zhao-Wei Wang, Zhen-Lin Chen, Rui-Xiang Sun & Si-Min He

  3. National Institute of Biological Sciences, Beijing, Beijing, China.

    Yue-He Ding & Meng-Qiu Dong

  4. State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China.

    Yao Zhang & Ping Xu

  5. State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, College of Ecology and Evolution, Sun Yat-Sen University, Guangzhou, China.

    Yao Zhang

Contributions

H.C. developed the kernel algorithm and command-line tool of Open-pFind, analyzed the data and wrote the manuscript. C.L. developed the validation method using metabolically labeled datasets. H.Y. developed the post-processing tool pBuild and helped with data analysis. W.-F.Z. helped to develop the machine learning module. L.W. developed the pre-processing tool pParse. Y.-H.D. and M.-Q.D. provided the Dong-Ecoli-QE dataset. Y.Z. and P.X. provided the Xu-Yeast-QEHF dataset. W.-J.Z., R.-M.W., X.-N.N., Z.-W.W., Z.-L.C. and R.-X.S. helped with the development of interface and data analysis. T.L., G.-M.T. and P.-H.Z. helped with the performance test on the workstation. S.-M.H. coordinated the study. All of the authors helped to revise the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Hao Chi or Si-Min He.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–14

  2. Life Sciences Reporting Summary

  3. Supplementary Tables and Notes

    Supplementary Tables 1–10, Supplementary Notes 1–5

Excel files

  1. Supplementary Data 1

    The number of identified PSMs, distinct peptides and distinct peptide sequences for the four published datasets

  2. Supplementary Data 2

    The consistency of the identification results

  3. Supplementary Data 3

    Detailed information for highly abundant modifications of cysteine residues in Kim data

  4. Supplementary Data 4

    Detailed information for 694 semi-tryptic peptides verified by UniProt in Kim data

Executable files

  1. Supplementary Code

About this article

DOI

https://doi.org/10.1038/nbt.4236