Brief Communication | Published:

Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry

Abstract

We present DeepNovo-DIA, a de novo peptide-sequencing method for data-independent acquisition (DIA) mass spectrometry data. We use neural networks to capture precursor and fragment ions across m/z, retention-time, and intensity dimensions. The resulting features are then integrated with peptide sequence patterns to address the problem of highly multiplexed spectra. DIA coupled with de novo sequencing allowed us to identify novel peptides in human antibodies and antigens.
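To make the phrase "across m/z, retention-time, and intensity dimensions" concrete, here is a minimal sketch of one common way to represent DIA signal. It is not DeepNovo-DIA's implementation. It assumes the MS/MS scans from a single isolation window are already ordered by retention time, with each scan stored as a list of (m/z, intensity) peaks. For each candidate fragment m/z, it extracts an ion chromatogram (XIC) and then correlates that profile with the precursor's elution profile, which is a simple coelution check. The function names, the 0.02 m/z tolerance, and the idea of supplying the precursor profile directly are all illustrative assumptions.

```python
from typing import List, Tuple

# Assumed input layout (hypothetical): one isolation window's MS/MS scans in
# retention-time order, each scan a list of (m/z, intensity) peaks.
Spectrum = List[Tuple[float, float]]


def extract_xic(scans: List[Spectrum], target_mz: float, tol: float = 0.02) -> List[float]:
    """Extracted-ion chromatogram: per scan, summed intensity of peaks within +/- tol of target_mz."""
    return [float(sum(i for mz, i in scan if abs(mz - target_mz) <= tol)) for scan in scans]


def feature_matrix(scans: List[Spectrum], fragment_mzs: List[float], tol: float = 0.02) -> List[List[float]]:
    """Rows = candidate fragment m/z values, columns = retention time (scan index), values = intensity."""
    return [extract_xic(scans, mz, tol) for mz in fragment_mzs]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either profile is flat (no elution shape)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def coelution_scores(precursor_xic: List[float], scans: List[Spectrum],
                     fragment_mzs: List[float], tol: float = 0.02) -> List[float]:
    """Correlate each fragment's elution profile with the precursor's elution profile."""
    return [pearson(precursor_xic, row) for row in feature_matrix(scans, fragment_mzs, tol)]


if __name__ == "__main__":
    scans = [
        [(300.10, 10.0), (450.20, 5.0), (600.30, 1.0)],
        [(300.10, 40.0), (450.20, 20.0), (600.30, 1.0)],
        [(300.10, 80.0), (450.21, 41.0), (600.30, 1.0)],
        [(300.10, 30.0), (450.20, 16.0), (600.30, 1.0)],
    ]
    precursor_xic = [1.0, 4.0, 8.0, 3.0]  # assumed MS1 elution profile, supplied directly
    print(coelution_scores(precursor_xic, scans, [300.10, 450.20, 600.30]))
```

In this toy input, the fragments at 300.10 and 450.20 rise and fall with the precursor, so they score close to 1. The constant signal at 600.30 has no elution shape and scores 0. When one DIA spectrum contains several precursors, this kind of evidence helps decide which fragments belong to which precursor. A learned model can use the full matrix rather than a single correlation value.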


Data availability

Data and a pretrained model are publicly available in the MassIVE repository under accession number MSV000082368. Source data for Fig. 2 are available online.


Acknowledgements

This work was funded in part by NSERC (grant OGP0046506), China’s Research and Development Program (grants 2016YFB1000902 and 2018YFB1003202), the NSFC (grant 61832019), and the Canada Research Chair program for M.L. N.H.T. was supported by the Mitacs Elevate Fellowship. The authors thank N. Keshav, K.P. Choi, and K. Xiong for discussions and proofreading of the manuscript.

Author information

M.L., B.S., and N.H.T. conceived the research idea. N.H.T. designed the model, implemented the software, and analyzed the results. R.Q. and X.C. contributed to the model design, software development, and data analysis. M.L., B.S., and L.X. supervised the research project. C.L., X.Z., and A.G. contributed to the data analysis. N.H.T., M.L., and R.Q. wrote the manuscript.

Correspondence to Ming Li.

Ethics declarations

Competing interests

L.X., X.C., and B.S. are employees of Bioinformatics Solutions Inc., Waterloo, Ontario, Canada.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1

Comparison of unique peptides identified by DeepNovo, PECAN, and Spectronaut from the plasma dataset.

Note that the 3,268 de novo peptides reported here by DeepNovo have yet to be validated (see Supplementary Note 1).

Supplementary Figure 2

Distribution of de novo confidence score versus peptide abundance of all peptides reported by DeepNovo from the plasma dataset

Supplementary Figure 3

Distribution of retention times of 1,143 de novo peptides reported by DeepNovo and database peptides reported by PECAN and Spectronaut from the plasma dataset

Supplementary Figure 4

Distribution of amino acids of 1,143 de novo peptides reported by DeepNovo and database peptides reported by PECAN and Spectronaut from the plasma dataset

Supplementary Figure 5

Unique peptides identified by DeepNovo, PECAN, and Spectronaut from the plasma dataset.

(a) Original model trained with the urine dataset. (b) Model retrained with part of the plasma dataset. Note that we removed from the DeepNovo results the features that were used to retrain the model, so the numbers of DeepNovo peptides in a are smaller than those reported in Supplementary Fig. 1. DeepNovo peptides have not been filtered for sequencing errors or by augmented database search (see Supplementary Note 1).

Supplementary Figure 6

Example of three de novo peptides aligned to the variable region of a recently published human antibody for malaria vaccine design

Supplementary Figure 7

Unique peptides identified by DeepNovo, OpenSWATH, and Spectronaut from the dataset Jurkat-Oxford

Supplementary Figure 8

Abundance distribution of 130 de novo peptides versus 102 peptides identified by DeepNovo and OpenSWATH or Spectronaut from the dataset Jurkat-Oxford

Supplementary Figure 9

DeepNovo sequencing framework

Supplementary Figure 10

Ion-CNN model

Supplementary Figure 11

Spectrum-CNN model

Supplementary information

Supplementary Information

Supplementary Figures 1–11 and Supplementary Notes 1 and 2

Reporting Summary

Supplementary Protocol

Documentation for using DeepNovo

Supplementary Software

Scripts and Swiss-Prot database FASTA file

Supplementary Data 1

Examples of DIA spectra from the plasma dataset that contain multiple precursors with at least one low-abundance, novel peptide identified by DeepNovo but not by other database search tools

Supplementary Data 2

Twelve examples from the plasma dataset showing that the low-abundance, novel peptides identified by DeepNovo have better supporting fragment ions than the candidate sequences returned by the database search engine

Supplementary Data 3

Evidence of supporting fragment ions (left column), coelution profiles of fragment ions and precursor ion (right column), and antibody protein ID for 30 low-abundance, novel peptides identified by DeepNovo from the plasma dataset. The database search engine was not able to find any candidate sequences that matched these 30 precursors

Supplementary Data 4

Twelve examples of low-abundance, novel HLA peptides that were identified by DeepNovo but not by other database search tools

Supplementary Table 1

Summary of training and testing datasets in our study

Supplementary Table 2

List of 2,753 unique peptides predicted by DeepNovo from the plasma dataset

Supplementary Table 3

Novel peptides that were identified by DeepNovo from the plasma dataset and were found in variable regions of human immunoglobulin light chains

Supplementary Table 4

Novel peptides that were identified by DeepNovo from the plasma dataset and were found in variable regions of human immunoglobulin heavy chains

Supplementary Table 5

Novel peptides that were identified by DeepNovo from the plasma dataset and contained human natural variants

Supplementary Table 6

List of 304 unique peptides predicted by DeepNovo from the Jurkat-Oxford dataset
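Supplementary Data 2 and 3 compare candidate sequences by their supporting fragment ions. The sketch below illustrates, under stated assumptions, what that count involves. It is not the authors' scoring function. For a candidate peptide, it computes the theoretical singly charged b- and y-ion m/z values from standard monoisotopic residue masses. It then counts how many of those ions match an observed peak within a tolerance. Several simplifications are assumptions: only charge 1+ is considered, there are no neutral losses, cysteine is unmodified (no carbamidomethylation), and the 0.02 m/z tolerance and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses (Da); cysteine is left unmodified in this sketch.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def fragment_ions(peptide: str) -> List[Tuple[str, float]]:
    """Singly charged b- and y-ion m/z values for every backbone cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions: List[Tuple[str, float]] = []
    for i in range(1, len(masses)):
        b = sum(masses[:i]) + PROTON           # N-terminal fragment with i residues
        y = sum(masses[-i:]) + WATER + PROTON  # C-terminal fragment with i residues
        ions.append((f"b{i}", b))
        ions.append((f"y{i}", y))
    return ions


def count_supporting_ions(peptide: str, peaks: List[Tuple[float, float]], tol: float = 0.02) -> int:
    """Number of theoretical b/y ions that have at least one observed peak within +/- tol."""
    observed = [mz for mz, _ in peaks]
    return sum(
        1 for _, ion_mz in fragment_ions(peptide)
        if any(abs(ion_mz - mz) <= tol for mz in observed)
    )


if __name__ == "__main__":
    peaks = [(58.0287, 100.0), (90.055, 50.0), (500.0, 10.0)]
    print(fragment_ions("GA"))
    print(count_supporting_ions("GA", peaks))
```

Two candidate sequences for the same precursor can be ranked by this count. In practice, tools also weigh peak intensities, multiple charge states, and coelution of the matched ions, so a raw count like this is only a first approximation.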

Source data

Source Data, Figure 2


Figures

Fig. 1: The workflow of DeepNovo-DIA for de novo sequencing of DIA data.
Fig. 2: DeepNovo-DIA evaluation of three datasets: ovarian cyst (OC), urinary tract infection (UTI), and plasma.