Data analysis of assorted serum peptidome profiles


Discovering biomarker patterns with proteomic techniques requires examining large numbers of patient and control samples and then mining the molecular read-outs (e.g., mass spectra). Appropriate signal processing and statistical analysis are critical for extracting markers from these data sets. This protocol describes a data analysis pipeline designed specifically for MALDI-TOF-MS-based serum peptide profiling. It starts with the transfer of raw spectra, which are then interpreted with signal processing algorithms to define suitable features (i.e., peptides). We describe an algorithm for minimal entropy-based alignment of peaks across samples. The resulting peak lists, which contain all samples, all peptide features and their normalized MS-ion intensities, can be evaluated, and the results validated, using common statistical methods. We recommend visual inspection of the spectra to confirm all results, and have written freely available software for viewing and color-coding spectral overlays.
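The minimal entropy-based alignment mentioned above can be illustrated with a short Python sketch. It is not the authors' 'Entropycal'/Qcealignf code; it only shows the general idea under simplifying assumptions: each spectrum is reduced to a list of peak m/z values, the only error corrected is a single global ppm offset per spectrum, and that offset is chosen by grid search so that the pooled histogram of the spectrum's peaks plus a reference spectrum's peaks (bins of constant relative width) has the lowest Shannon entropy. When matching peaks from both runs fall into shared bins, probability mass is concentrated in fewer bins and the entropy drops. The function names, the use of the first sample as the reference, and the default bin width and search range are hypothetical choices for this illustration.

```python
import math
from collections import Counter
from statistics import median


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a histogram given as {bin: count}."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def ppm_bin(mz, bin_ppm):
    """Assign an m/z value to a bin whose width is (approximately) constant in ppm.

    1e6 * ln(mz) changes by ~1 unit per 1 ppm change in m/z, so flooring it
    in steps of bin_ppm gives bins of near-constant relative width.
    """
    return math.floor(1e6 * math.log(mz) / bin_ppm)


def pooled_entropy(ref_peaks, peaks, shift_ppm, bin_ppm):
    """Entropy of the pooled histogram: reference peaks plus shifted sample peaks."""
    scale = 1.0 + shift_ppm * 1e-6  # assumed model: one global ppm recalibration
    counts = Counter(ppm_bin(mz, bin_ppm) for mz in ref_peaks)
    counts.update(ppm_bin(mz * scale, bin_ppm) for mz in peaks)
    return shannon_entropy(counts)


def best_shift(ref_peaks, peaks, bin_ppm=50.0, max_ppm=500):
    """Grid-search integer ppm shifts in [-max_ppm, max_ppm] for minimal entropy.

    Histogram entropy is piecewise constant, so several shifts usually tie;
    the median of the tied shifts is returned as the estimate.
    """
    scores = [(pooled_entropy(ref_peaks, peaks, s, bin_ppm), s)
              for s in range(-max_ppm, max_ppm + 1)]
    h_min = min(h for h, _ in scores)
    tied = [s for h, s in scores if h - h_min < 1e-12]
    return median(tied)


def align_to_reference(samples, bin_ppm=50.0, max_ppm=500):
    """Return per-sample ppm corrections and recalibrated peak lists.

    samples[0] serves as the (hypothetical) reference run and is left unshifted.
    """
    ref = samples[0]
    shifts = [0] + [best_shift(ref, peaks, bin_ppm, max_ppm) for peaks in samples[1:]]
    corrected = [[mz * (1.0 + s * 1e-6) for mz in peaks]
                 for peaks, s in zip(samples, shifts)]
    return shifts, corrected


if __name__ == "__main__":
    ref = [1000.0, 2000.0, 3000.0]
    runs = [ref,
            [mz * (1 + 200e-6) for mz in ref],   # simulated +200 ppm miscalibration
            [mz * (1 - 100e-6) for mz in ref]]   # simulated -100 ppm miscalibration
    shifts, corrected = align_to_reference(runs)
    print(shifts)
```

In the demo, the estimated corrections land within one bin width (50 ppm) of -200 and +100 ppm. Because histogram entropy is piecewise constant, the offset is only resolved to roughly the bin width; smoothing the histogram or refining with narrower bins around the coarse optimum are possible refinements. Aligning each run to one reference, rather than minimizing the entropy of all samples jointly, is a further simplification made here for clarity.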


Figure 1: MSKCC data analysis pipeline for serum peptidomics.
Figure 2: Data folder structure and naming convention used by the MSKCC serum proteomics data analysis.
Figure 3: Parameter file for signal processing.
Figure 4: Qcealignf workflow.
Figure 5: Data import and interpretation for GeneSpring.
Figure 6: Unsupervised statistical analysis.
Figure 7: Supervised statistical analysis.
Figure 8: MSV.
Figure 9: Optimization of the singlet width.
Figure 10: Effects of the singlet-width parameter on ion intensity and resolution.
Figure 11: Effects of mass calibration and 'Entropycal'-based alignment on mass spectral overlays.
Figure 12



Author information



Corresponding author

Correspondence to Paul Tempst.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Method 1

Mass Spectra Analysis (PDF 447 kb)

Supplementary Method 2

Customized macro to convert Bruker files to ascii files (ZIP 2 kb)


About this article

Cite this article

Villanueva, J., Philip, J., DeNoyer, L. et al. Data analysis of assorted serum peptidome profiles. Nat Protoc 2, 588–602 (2007).
