TRIC: an automated alignment strategy for reproducible protein quantification in targeted proteomics

Journal name: Nature Methods
Volume: 13
Pages: 777–783
DOI: 10.1038/nmeth.3954

Abstract

Next-generation mass spectrometric (MS) techniques such as SWATH-MS have substantially increased the throughput and reproducibility of proteomic analysis, but ensuring consistent quantification of thousands of peptide analytes across multiple liquid chromatography–tandem MS (LC-MS/MS) runs remains a challenging and laborious manual process. To produce highly consistent and quantitatively accurate proteomics data matrices in an automated fashion, we developed TRIC (http://proteomics.ethz.ch/tric/), a software tool that utilizes fragment-ion data to perform cross-run alignment, consistent peak-picking and quantification for high-throughput targeted proteomics. TRIC reduced the identification error compared to a state-of-the-art SWATH-MS analysis without alignment by more than threefold at constant recall while correcting for highly nonlinear chromatographic effects. On a pulsed-SILAC experiment performed on human induced pluripotent stem cells, TRIC was able to automatically align and quantify thousands of light and heavy isotopic peak groups. Thus, TRIC fills a gap in the pipeline for automated analysis of massively parallel targeted proteomics data sets.
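
The cross-run alignment and consistent peak-picking summarized above can be illustrated with a minimal sketch. It assumes that peptides confidently identified in two runs serve as anchor points, that the retention time (RT) mapping between the runs can be approximated by piecewise-linear interpolation through those anchors (a simple stand-in, not the smoother used by TRIC), and that each candidate peak group carries a score where higher is better. The function names, input formats and the tolerance parameter are hypothetical and are not taken from the TRIC software.

```python
from bisect import bisect_left


def build_rt_map(anchors):
    """Return a function mapping an RT in run A to the expected RT in run B.

    anchors: list of (rt_a, rt_b) pairs for peptides confidently identified
    in both runs (hypothetical input format).
    """
    pts = sorted(anchors)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    if len(xs) < 2:
        raise ValueError("need at least two anchor points")

    def rt_map(rt_a):
        # Find the anchor segment containing rt_a; clamp the index so that
        # values outside the anchor range are extrapolated from the
        # outermost segment.
        i = bisect_left(xs, rt_a)
        i = min(max(i, 1), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        if x1 == x0:
            # Two anchors at the same RT in run A: use their mean in run B.
            return (y0 + y1) / 2.0
        return y0 + (y1 - y0) * (rt_a - x0) / (x1 - x0)

    return rt_map


def pick_peak(candidates, expected_rt, tolerance):
    """Pick the best-scoring candidate within a narrow RT window.

    candidates: list of (rt, score) tuples for one peptide in the target run.
    Returns the chosen tuple, or None if no candidate lies in the window.
    """
    in_window = [c for c in candidates if abs(c[0] - expected_rt) <= tolerance]
    if not in_window:
        return None
    return max(in_window, key=lambda c: c[1])


# Usage: transfer a peak found at RT 15.0 in run A to run B.
anchors = [(10.0, 12.0), (20.0, 25.0), (30.0, 33.0)]
rt_map = build_rt_map(anchors)
expected = rt_map(15.0)
chosen = pick_peak([(18.0, 0.4), (18.9, 0.9), (24.0, 0.99)], expected, 1.0)
```

In this toy example the highest-scoring candidate overall (RT 24.0) is rejected because it lies outside the aligned window, which is the kind of error that per-run score cutoffs alone cannot catch.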

Figures

  1. TRIC: alignment algorithm for targeted proteomics data.

    (a) Schematic of a typical targeted proteomics experiment in which runs are analyzed individually, giving rise to multiple putative peak groups per run that may not be directly mappable owing to chromatographic shifts. (b) The TRIC algorithm selects a set of high-confidence peptide 'anchor points' (black crosses) for pairwise nonlinear alignment and chromatographic distance estimation. (c) TRIC algorithm steps. First (1), an optimal guidance tree is computed on the basis of the chromatographic distances estimated in b (nodes represent runs; edges represent pairwise alignments). Next (2), the algorithm uses a starting point (A) to transfer identification confidence to nearby runs (3, iterations B and C) using the guidance tree. In an optional last step (4), runs without suitable peak groups are revisited to perform noise requantification (all fragment-ion signal at the aligned position is integrated; orange circles). (d) The confidence transfer step uses a starting peak group (top) to select a narrow region in a neighboring run, from which a peak is selected. This procedure is repeated across all runs to identify the correct peak or establish peak boundaries in runs without any analyte signal (bottom). In a real application, the alignment order may not be linear but may instead follow the guidance tree.

  2. Identification and alignment accuracy of TRIC on a validation data set of more than 7,000 manually annotated peak groups.

    (a) Recall rate versus FDR plot to compare the performance of TRIC and the naive approach (fixed q-value cutoff applied to each run individually). As misclassified peaks cannot be recovered even at high score cutoffs, a recall of 100% cannot be reached. (b) Error rates at reported FDR cutoffs of 1% for the naive approach and TRIC without RT alignment (none), linear alignment (TRIC linear) and nonlinear k-nearest neighbor alignment (TRIC LLD). (c) Error of reported RTs plotted without (top) and with (bottom) nonlinear alignment on a sample run. (d) Cumulative fraction of peaks having less than a given error in RT plotted for TRIC with k-nearest neighbor smoothing (LLD), linear alignment and no alignment.

  3. Analysis of a data set of 12 runs of S. pyogenes exposed to human plasma using TRIC.

    (a) Comparison of data matrix occupancy using TRIC alignment versus no alignment. (b) The computed guidance tree captures information orthogonal to injection order (r.m.s. deviation between runs is indicated for each edge). Control samples are circled in blue, and plasma-exposed samples in red. The tree is substantially different from injection order, as samples were shot in three batches (R2, R3 and R4) of two biological replicates (Repl1 and Repl2). (c,d) Number of precursors appearing in a specific number of runs before (c; N = 95,685) and after (d; N = 120,348) TRIC. (e,f) Cumulative number of peptides (solid line) quantified using a fixed 0.01 q-value cutoff without alignment (e) and after applying TRIC and a minimal q-value cutoff of 0.0015 as computed by TRIC (f).

  4. Pulsed-SILAC experiment on human iPSCs.

    (a) The RT difference between the light and heavy signal as a function of the intensity (top, distribution displaying values <10^4 in intensity). (b) Experimental design. A human iPSC line was exposed to a pulse of heavy amino acids (AA) and sampled at four time points in duplicate. (c) s.d. of the RT difference between heavy and light pairs (H–L) with and without TRIC alignment. For the analysis without alignment, a simple FDR cutoff was applied (naive approach). Pairs from both replicates are aggregated. No heavy–light pairs are expected at t = 0, as heavy amino acids were added afterwards. (d) Comparison of the number of isotopic SILAC pairs quantified per sample using TRIC alignment versus no alignment. For each time point, data are mean ± s.d. of two replicates.

  5. Protein turnover rates in human iPSCs.

    (a) RIA for an example protein, importin-α, with five peptides (dashed lines) with measured heavy–light ratios at three time points (open circles). Solid line represents the median of all decay curves fitted through 1.0 at t = 0 for all peptides and estimates protein-level k_loss. (b) Estimation of global protein turnover rates after correction for protein dilution. (c) Turnover of proteins in the GO category 'cell adhesion' (P < 10^−7). Only proteins with two or more peptides are shown (box indicates first and third quartiles; center line indicates median; whiskers extend to the most extreme data point that is no more than 1.5 times the length of the box away from the box).
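
The decay-curve fit described for Figure 5a can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes that RIA denotes the light fraction L/(L+H), that each peptide follows a single-exponential decay RIA(t) = exp(−k_loss · t) constrained to 1.0 at t = 0, that the fit is done by least squares on log-transformed values, and that the protein-level rate is the median of the peptide-level rates. The function names, time units and example values are hypothetical.

```python
import math
from statistics import median


def ria_from_ratio(heavy_to_light):
    """Convert a heavy/light ratio H/L into the light fraction L / (L + H)."""
    return 1.0 / (1.0 + heavy_to_light)


def fit_k_loss(times, rias):
    """Fit RIA(t) = exp(-k * t), which passes through 1.0 at t = 0.

    Taking logs gives ln(RIA) = -k * t, a line through the origin, so the
    least-squares slope is sum(t * ln(RIA)) / sum(t^2).
    """
    den = sum(t * t for t in times)
    if den == 0:
        raise ValueError("need at least one time point with t > 0")
    num = sum(t * math.log(r) for t, r in zip(times, rias))
    return -num / den


def protein_k_loss(peptide_series):
    """Median of peptide-level rates.

    peptide_series: list of (times, rias) tuples, one per peptide.
    Because exp(-k * t) is monotonic in k, the curve with the median k is
    also the pointwise median of the fitted peptide curves.
    """
    return median(fit_k_loss(t, r) for t, r in peptide_series)


# Usage with three hypothetical peptides measured at three time points.
times = [1.0, 2.0, 4.0]
peptides = [
    (times, [math.exp(-k * t) for t in times])
    for k in (0.05, 0.10, 0.20)
]
k_protein = protein_k_loss(peptides)
```

Taking the median rather than the mean keeps a single outlying peptide from dominating the protein-level estimate.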

Author information

Affiliations

  1. Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland.

    • Hannes L Röst,
    • Yansheng Liu,
    • Pedro Navarro,
    • George Rosenberger,
    • Ben C Collins,
    • Ludovic Gillet,
    • Lars Malmström &
    • Ruedi Aebersold
  2. Department of Genetics, Stanford University, Stanford, California, USA.

    • Hannes L Röst
  3. Department of Experimental Oncology, European Institute of Oncology, Milan, Italy.

    • Giuseppe D'Agostino,
    • Matteo Zanella &
    • Giuseppe Testa
  4. Institute for Immunology, University Medical Center of the Johannes Gutenberg University of Mainz, Mainz, Germany.

    • Pedro Navarro
  5. PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland.

    • George Rosenberger
  6. Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy.

    • Giuseppe Testa
  7. S3IT, University of Zurich, Zurich, Switzerland.

    • Lars Malmström
  8. Faculty of Science, University of Zurich, Zurich, Switzerland.

    • Ruedi Aebersold

Contributions

H.L.R. designed and wrote code, performed data analysis and produced the figures. Y.L., G.D. and M.Z. performed iPSC experiments and acquired MS data. P.N. and G.R. contributed to code and provided an initial prototype of the implementation. B.C.C. and L.G. acquired MS data and gave critical input. G.T., L.M. and R.A. designed and supervised the study. All authors contributed to writing the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2,802 KB)

    Supplementary Notes 1–7

Zip files

  1. Supplementary Software (1,023 KB)

    msproteomicstools 0.4.3

Other

  1. Supplementary Table 1 (5,888 KB)

    Result table describing the manually picked peptides with retention times and peak boundaries.

  2. Supplementary Table 2 (140 KB)

    Significant proteins from the S. pyogenes analysis without any alignment.

  3. Supplementary Table 3 (166 KB)

    Significant proteins from the S. pyogenes analysis with TRIC alignment.

  4. Supplementary Table 4 (569 KB)

    Degradation rates for the iPSCs as determined by SWATH-MS analysis on the peptide level.

  5. Supplementary Table 5 (89 KB)

    Degradation rates for the iPSCs as determined by SWATH-MS analysis on the protein level.

Excel files

  1. Supplementary Table 6 (32 KB)

    GO-enrichment analysis using GOrilla.
