Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis

Journal name: Nature Biotechnology

Replicate mass spectrometry (MS) measurements and the use of multiple analytical methods can greatly expand the comprehensiveness of shotgun proteomic profiling of biological samples (refs. 1–5). However, the inherent biases and variations in such data create computational and statistical challenges for quantitative comparative analysis (ref. 6). We developed and tested a normalized, label-free quantitative method termed the normalized spectral index (SIN), which combines three MS abundance features: peptide count, spectral count and fragment-ion (tandem MS or MS/MS) intensity. SIN largely eliminated variances between replicate MS measurements, permitting quantitative reproducibility and highly significant quantification of thousands of proteins detected in replicate MS measurements of the same and distinct samples. It accurately predicts protein abundance more often than the five other methods we tested. Comparative immunoblotting and densitometry further validate our method. Comparative quantification of complex data sets from multiple shotgun proteomics measurements is relevant for systems biology and biomarker discovery.
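The abstract names the three ingredients of SIN, but this page does not reproduce the equation. As a rough sketch only, the Python below assumes one plausible reading: a protein's spectral index (SI) is the summed fragment-ion intensity over every spectrum of every peptide assigned to it, so that peptide count and spectral count enter through the number of terms in the sum, and SIN divides SI by the total SI of the run and by protein length. The data layout, the function names and the length normalization are assumptions made for illustration; the Online Methods hold the authors' definition.

```python
from typing import Dict, List

# Hypothetical layout: protein -> peptide -> list of spectra,
# each spectrum given as its list of matched fragment-ion intensities.
Run = Dict[str, Dict[str, List[List[float]]]]


def spectral_index(peptides: Dict[str, List[List[float]]]) -> float:
    """Sum fragment-ion intensities over all spectra of all peptides."""
    return sum(
        sum(spectrum)
        for spectra in peptides.values()
        for spectrum in spectra
    )


def normalized_spectral_index(run: Run, lengths: Dict[str, int]) -> Dict[str, float]:
    """Divide each protein's SI by the run total and by protein length (assumed form)."""
    si = {protein: spectral_index(peps) for protein, peps in run.items()}
    total = sum(si.values())
    if total == 0:
        raise ValueError("run contains no fragment-ion intensity")
    return {protein: si[protein] / total / lengths[protein] for protein in si}


# Toy run: P1 has two peptides and three spectra, P2 has one peptide and one spectrum.
run = {
    "P1": {"PEPTIDEA": [[100.0, 50.0], [25.0]], "PEPTIDEB": [[25.0]]},
    "P2": {"PEPTIDEC": [[200.0]]},
}
lengths = {"P1": 100, "P2": 200}  # hypothetical protein lengths in residues
sin = normalized_spectral_index(run, lengths)
```

In this toy run both proteins carry the same raw SI, so under the assumed form the shorter protein ends up with the larger normalized value. Dividing by the run total is what would make values comparable between replicates with different overall signal.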

At a glance


  1. Figure 1: Statistical analysis of replicate MS measurement variation before and after normalization.

    (a–c) The mean and 95% confidence interval (CI) for the abundance features, PN (a), SC (b) and SI (c) were calculated for four MS replicate measurements of pooled endothelial cell plasma membrane isolated from liver and were plotted using the mean diamonds and comparison circles methods. If the CIs, as indicated by the diamonds, do not overlap, the groups are significantly different. For statistical analysis of difference in mean intensities or other features, between multiple replicate samples, one-way ANOVA was performed. Our null hypothesis was that all replicate samples were equal. If our null hypothesis is true then we expect the F-ratio to be ~1 (d.f. = 5,919). Our significance level was P < 0.05. The x axis represents each of the four replicate data sets, and the y axis represents the log of the abundance feature being examined (n = 5,923). (d–l) The indicated normalization methods were applied separately to the SI (d–k) or SC (l) data sets and tested for differences as described above. (m,n) We applied NSAF (ref. 20) and Rsc (ref. 21) methods to the replicate data sets (see Online Methods for equations) and tested for differences. (n) Graph of the comparison of F-ratios obtained from statistical testing of SIN, Rsc and RSI, where RSI is the Rsc equation with SI substituted for SC.
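The F-ratio test described in this caption can be sketched in a few lines. The Python below is a minimal illustration of a one-way ANOVA F-ratio (between-group mean square divided by within-group mean square); the function name and the toy values are assumptions, not the authors' scripts or data.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F-ratio (between-group / within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log-abundance values, one list per replicate measurement.
replicates = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]
f_ratio = one_way_anova_f(replicates)
```

Replicates whose means coincide give a between-group sum of squares of zero and hence an F-ratio of zero, while a systematic shift between replicates, as in the toy data, inflates it. Turning the ratio into a P value requires the F distribution; `scipy.stats.f_oneway` returns both the statistic and the P value if SciPy is available.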

  2. Figure 2: Correlation of SIN with protein abundance.

    (a) BSA was spiked with a protein mix containing 19 standard proteins spanning a wide dynamic range (0.5–50,000 fmol) (Online Methods), which was separated by SDS-PAGE, trypsin digested and analyzed by two-dimensional LC. SIN values for each spiked protein were calculated, averaged and plotted against the amount of the protein standard added. Because of the large range in protein abundance, many of the data points cluster close to the origin, so this region was magnified for ease of visualization. The R² correlation was 0.9239. (b–d) Statistical analysis comparing the quantification of proteins across replicate measurements using six quantification methods (relative to known value). The mean and 95% CI for protein abundance, as determined by various relative quantitative methods, were plotted for three representative proteins from a standard protein mixture (ref. 23) and compared to the actual loaded amount using ANOVA. Individual means were compared using the Tukey-Kramer honestly significant difference (HSD) method (refs. 33, 34). Quantitative methods that were not significantly different from the actual protein abundance (ANOVA, significance level α = 0.05) are highlighted in red.

  3. Figure 3: Statistical analysis of normalization methods applied to variable protein load and distinct sample data sets.

    The indicated normalization methods were each applied to the 40-μg and 150-μg MS data sets from normal lung endothelial cell plasma membranes. (a–e) Mean and 95% CI for a raw SI data set (a) and data sets normalized by the dilution factor (b), SIN (c), Rsc (d) and NSAF (e) were plotted using the mean diamonds and comparison circles. The x axis represents the two different protein loads, and the y axis represents the log of the normalized abundance feature (number of common proteins, n = 2,660). (f) T-ratios for statistical testing of SIN and Rsc are plotted as a function of peptide cut-off numbers (number of peptides and/or proteins commonly identified between the samples). The α = 0.05 significance line is plotted in gray. T-ratios above this line indicate that samples are different. (g) The SIN values converted to estimated nanogram amounts (based on initial sample load) for 2,660 proteins common between the 40-μg and 150-μg data sets were plotted against each other. The slope of the line is 3.72, R² = 0.94. (h) Two-way clustering of ~3,000 proteins identified in heart and kidney endothelial cell plasma membrane samples. Each column in the matrix represents a single two-dimensional LC-MS/MS run for either heart or kidney, based on the SIN normalized MS data. Proteins (rows) and tissues (columns) are clustered based on their similarities in protein intensity profile. Colors within the heatmap range from light blue (least prevalent) to dark red (most prevalent), illustrating the relative abundance of each protein within a particular sample.

  4. Figure 4: Comparative analysis of proteins quantified by SDS-PAGE and MS analysis.

    (a) Proteins in endothelial cell plasma membranes from rat lung were separated by SDS-PAGE, stained with Coomassie blue and cut into 51 slices. Each gel slice was subjected to densitometry and MS analysis. (b) The densitometry intensities for each slice were compared to the SI on the same axis, with the x axis being the gel slice number. (c) Sixty-four proteins found in both lung endothelial cell plasma membranes (P) and the entire lung homogenate (H) were analyzed by western blot to quantify protein signal by densitometry. The P/H ratio for each protein from the western analysis is plotted against its P/H ratio from the SIN values (multiple measurements). Spearman's rho correlation between western and SIN ratios is ρ = 0.86, and all the points fall within the 95% CI (red line). (d) The Bland-Altman plot for the two methods with 1 and 2 s.d. of the mean.
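The two agreement statistics named in the Figure 4 caption, Spearman's rho and the Bland-Altman summary, can be illustrated as follows. This is a minimal standard-library sketch on made-up P/H ratios; the function names and values are hypothetical and none of the paper's data are reproduced.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values starting at 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bland_altman(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and sample s.d. of the paired differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs), stdev(diffs)


# Hypothetical P/H ratios for four proteins measured by both methods.
western = [1.0, 2.0, 4.0, 8.0]
ms_based = [1.5, 2.5, 3.5, 8.5]
rho = spearman_rho(western, ms_based)
bias, sd = bland_altman(western, ms_based)
bands = [(bias - m * sd, bias + m * sd) for m in (1, 2)]  # 1 and 2 s.d. around the mean
```

Spearman's rho only asks whether the two methods order the proteins the same way, whereas the Bland-Altman summary reports how far the paired values differ on average and how widely those differences scatter, which is why the two are usually reported together.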


  1. Durr, E. et al. Direct proteomic mapping of the lung microvascular endothelial cell surface in vivo and in cell culture. Nat. Biotechnol. 22, 985–992 (2004).
  2. Li, Y. et al. Enhancing identifications of lipid-embedded proteins by mass spectrometry for improved mapping of endothelial plasma membranes in vivo. Mol. Cell. Proteomics 8, 1219–1235 (2009).
  3. Oh, P. et al. Subtractive proteomic mapping of the endothelial surface in lung and solid tumours for tissue-specific therapy. Nature 429, 629–635 (2004).
  4. Slebos, R.J. et al. Evaluation of strong cation exchange versus isoelectric focusing of peptides for multidimensional liquid chromatography-tandem mass spectrometry. J. Proteome Res. 7, 5286–5294 (2008).
  5. Kislinger, T., Gramolini, A.O., MacLennan, D.H. & Emili, A. Multidimensional protein identification technology (MudPIT): technical overview of a profiling method optimized for the comprehensive proteomic investigation of normal and diseased heart tissue. J. Am. Soc. Mass Spectrom. 16, 1207–1220 (2005).
  6. Wong, J.W., Sullivan, M.J. & Cagney, G. Computational methods for the comparative quantification of proteins in label-free LCn-MS experiments. Brief. Bioinform. 9, 156–165 (2008).
  7. Oh, P. et al. Live dynamic imaging of caveolae pumping targeted antibody rapidly and specifically across endothelium in the lung. Nat. Biotechnol. 25, 327–337 (2007).
  8. Shiio, Y. et al. Quantitative proteomic analysis of Myc oncoprotein function. EMBO J. 21, 5088–5096 (2002).
  9. Shiio, Y. et al. Quantitative proteomic analysis of myc-induced apoptosis: a direct role for Myc induction of the mitochondrial chloride ion channel, mtCLIC/CLIC4. J. Biol. Chem. 281, 2750–2756 (2006).
  10. Chiang, M.C. et al. Systematic uncovering of multiple pathways underlying the pathology of Huntington disease by an acid-cleavable isotope-coded affinity tag approach. Mol. Cell. Proteomics 6, 781–797 (2007).
  11. Service, R.F. Proteomics. Proteomics ponders prime time. Science 321, 1758–1761 (2008).
  12. Service, R.F. Proteomics. Will biomarkers take off at last? Science 321, 1760 (2008).
  13. Kolodziej, E.P., Gray, J.L. & Sedlak, D.L. Quantification of steroid hormones with pheromonal properties in municipal wastewater effluent. Environ. Toxicol. Chem. 22, 2622–2629 (2003).
  14. Wolf-Yadlin, A., Hautaniemi, S., Lauffenburger, D.A. & White, F.M. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proc. Natl. Acad. Sci. USA 104, 5860–5865 (2007).
  15. Kuhn, E. et al. Quantification of C-reactive protein in the serum of patients with rheumatoid arthritis using multiple reaction monitoring mass spectrometry and 13C-labeled peptide standards. Proteomics 4, 1175–1186 (2004).
  16. Ross, P.L. et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3, 1154–1169 (2004).
  17. Koziol, J.A., Feng, A.C. & Schnitzer, J.E. Application of capture-recapture models to estimation of protein count in MudPIT experiments. Anal. Chem. 78, 3203–3207 (2006).
  18. Ishihama, Y. et al. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics 4, 1265–1272 (2005).
  19. Rappsilber, J., Ryder, U., Lamond, A.I. & Mann, M. Large-scale proteomic analysis of the human spliceosome. Genome Res. 12, 1231–1245 (2002).
  20. Zybailov, B. et al. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J. Proteome Res. 5, 2339–2347 (2006).
  21. Old, W.M. et al. Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol. Cell. Proteomics 4, 1487–1502 (2005).
  22. Silva, J.C. et al. Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol. Cell. Proteomics 5, 144–156 (2006).
  23. Klimek, J. et al. The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools. J. Proteome Res. 7, 96–103 (2008).
  24. Bland, J.M. & Altman, D.G. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1, 307–310 (1986).
  25. Callister, S.J. et al. Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J. Proteome Res. 5, 277–286 (2006).
  26. Wang, W. et al. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 75, 4818–4826 (2003).
  27. Lukas, T.J. et al. Informatics-assisted protein profiling in a transgenic mouse model of amyotrophic lateral sclerosis. Mol. Cell. Proteomics 5, 1233–1244 (2006).
  28. Forner, F. et al. Quantitative proteomic comparison of rat mitochondria from muscle, heart, and liver. Mol. Cell. Proteomics 5, 608–619 (2006).
  29. Choi, H., Fermin, D. & Nesvizhskii, A.I. Significance analysis of spectral count data in label-free shotgun proteomics. Mol. Cell. Proteomics 7, 2373–2385 (2008).
  30. Baggerly, K.A. et al. A comprehensive approach to the analysis of matrix-assisted laser desorption/ionization-time of flight proteomics spectra from serum samples. Proteomics 3, 1667–1672 (2003).
  31. Wagner, M., Naik, D. & Pothen, A. Protocols for disease classification from mass spectrometry data. Proteomics 3, 1692–1698 (2003).
  32. Anderle, M. et al. Quantifying reproducibility for differential proteomics: noise analysis for protein liquid chromatography-mass spectrometry of human serum. Bioinformatics 20, 3575–3582 (2004).
  33. Kramer, C.Y. Extension of multiple range tests to group means with unequal numbers of replications. Biometrics 12, 309–310 (1956).
  34. Tukey, J.W. Some selected quick and easy methods of statistical analysis. Trans. N.Y. Acad. Sci. 16, 88–97 (1953).
  35. Oh, P. & Schnitzer, J.E. Isolation and subfractionation of plasma membranes to purify caveolae separately from glycosyl-phosphatidylinositol-anchored protein microdomain. in Cell Biology: A Laboratory Handbook (ed. C.J.) 34–36 (Academic Press, Orlando, FL, USA, 1998).
  36. Schnitzer, J.E. et al. Separation of caveolae from associated microdomains of GPI-anchored proteins. Science 269, 1435–1439 (1995).
  37. Beissbarth, T. et al. Statistical modeling of sequencing errors in SAGE libraries. Bioinformatics 20 Suppl 1, i31–i39 (2004).
  38. Kendall, M. Multivariate Analysis, edn. 2 (Macmillan, New York, 1980).
  39. Mirkin, B. Mathematical Classification and Clustering (Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996).
  40. Cheng, Y. & Church, G.M. in Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology 93–103, August 19–23, 2000 (AAAI Press, Menlo Park, CA, 2000).
  41. Hartigan, J. Direct clustering of a data matrix. J. Amer. Stat. Assoc. 67, 123–129 (1972).


Author information


  1. Proteogenomics Research Institute for Systems Medicine, San Diego, California, USA.

    • Noelle M Griffin,
    • Jingyi Yu,
    • Fred Long,
    • Phil Oh,
    • Sabrina Shore,
    • Yan Li &
    • Jan E Schnitzer
  2. Sidney Kimmel Cancer Center, San Diego, California, USA.

    • Noelle M Griffin,
    • Jingyi Yu,
    • Fred Long,
    • Phil Oh,
    • Sabrina Shore,
    • Yan Li &
    • Jan E Schnitzer
  3. The Scripps Research Institute, La Jolla, California, USA.

    • Jim A Koziol


N.M.G. designed, developed and analyzed the methods, provided some of the mass spectrometry data, performed the spiking experiments and analysis and wrote the manuscript; J.Y. initiated the project and designed, tested and implemented the methods; F.L. developed the scripts for data extraction; P.O. performed western blot analysis and densitometry; S.S. performed western blot analysis; Y.L. provided key mass spectrometry data; J.A.K. provided direction for statistical analysis; J.E.S. supervised the project, designed specific tests and helped to write the manuscript. All authors have read and agreed to all the content in this manuscript.


Supplementary information

PDF files

  1. Supplementary Text and Figures (188K)

    Supplementary Figs. 1–7, Supplementary Table 1, Supplementary Notes, Supplementary Methods, Supplementary Data and Supplementary Discussion
