PRIDE Inspector: a tool to visualize and validate MS proteomics data

Wang, Rui; Fabregat, Antonio; Ríos, Daniel; Ovelleiro, David; Foster, Joseph M; Côté, Richard G; Griss, Johannes; Csordas, Attila; Perez-Riverol, Yasset; Reisinger, Florian; Hermjakob, Henning; Martens, Lennart; Vizcaíno, Juan Antonio

doi:10.1038/nbt.2112

Correspondence
Published: 08 February 2012


Nature Biotechnology volume 30, pages 135–137 (2012)


To the Editor:

Your editorial “Credit where credit is overdue”¹ aptly summarized the current situation in the proteomics field, where full data disclosure remains very much a work in progress. Importantly, it also correctly pointed out that “the software provided by the public repositories for searching and analyzing proteomics data is not as efficient and as user friendly as it could be.” In this context, we introduce PRIDE Inspector (http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector), a user-friendly, freely available, open-source software tool for efficiently browsing and visualizing mass spectrometry (MS) proteomics data. A key feature of PRIDE Inspector is that it allows the user to carry out an initial assessment of data quality and reliability. PRIDE Inspector can thus be used by researchers before they submit their data, by journal editors and peer reviewers during manuscript review, and by any interested user in the field after the data are publicly released in PRIDE (the PRoteomics IDEntifications database; Fig. 1).

**Figure 1: PRIDE Inspector helps to perform every stage of the PRIDE submission workflow.**

Despite the increasing popularity of MS-based proteomics and the overall trend in the life sciences toward open sharing of biological data, relatively few proteomics data sets are currently available in the public domain. This situation is changing, however, thanks to stricter data-sharing guidelines from scientific journals and funding agencies. Some proteomics journals (e.g., Proteomics and Molecular and Cellular Proteomics, MCP) recommend, and in some specific cases mandate, public deposition of the MS data supporting a manuscript. Journals from the Nature group also strongly recommend submitting proteomics data to repositories such as PRIDE², PeptideAtlas³ and Tranche⁴ (http://www.nature.com/authors/policies/availability.html).

Nevertheless, in practical terms, such a data-sharing policy can succeed only if reliable and user-friendly software tools exist to streamline the submission task. To this end, the PRIDE Converter⁵ application (http://code.google.com/p/pride-converter) was developed for submitting data to the PRIDE database². Not only has PRIDE Converter rapidly become the most popular submission route for PRIDE (accounting for 77% of all PRIDE experiments submitted since January 2009), but its release also coincided with the start of a very substantial increase in the amount of data deposited in PRIDE (Supplementary Fig. 1). Of course, making data available in public repositories is only a first step. The interpretation and validation of proteomics data remain controversial, especially when proteins are identified on the basis of a single unique peptide-to-spectrum match or when post-translational modifications (PTMs) are reported. The ability to inspect and validate reported results during the review process, as well as after publication, is therefore of paramount importance. Given the amount of data involved, such inspections can be carried out efficiently only with suitable software tools that combine ease of access with effective visualizations.

Although viewers for MS proteomics data are already available^6,7, they tend to have various limitations. They may be built around a single, often proprietary, data format; fail to handle the very large files that are routinely produced; offer only limited visualization and analysis functionality; or be too costly to license for smaller groups or individuals. We therefore developed PRIDE Inspector as a user-friendly, freely available tool to browse, inspect and analyze proteomics data from the PRIDE repository or other data provided in standard formats.

PRIDE Inspector is a stand-alone graphical user interface (GUI) application written in Java. It is released under the Apache 2.0 open-source license and can be downloaded free of charge. PRIDE Inspector can also be started through a direct web link from the PRIDE homepage (http://www.ebi.ac.uk/pride). The main features of PRIDE Inspector, a description of its overall software architecture and other technical details are given in the Supplementary Notes.

PRIDE Inspector supports fast loading of PRIDE XML and mzML⁸ (the community data standard for MS data) files, and it provides access to all public PRIDE data through a direct MySQL database connection. Moreover, it includes an automated download capability for private PRIDE experiments, which allows journal editors and peer reviewers with the correct log-in credentials to assess the relevant experiment(s) during peer review. In addition, the Web Start version available at the PRIDE homepage makes it possible to launch the application and open a particular data set through a simple URL.

PRIDE Inspector presents different views to the user, each focusing on a specific aspect of the data (Fig. 2). Depending on the type of information available for a given file format or PRIDE data set, some views may remain inactive (Supplementary Fig. 2). For that reason, an 'Experiment Summary' overview window is available in the bottom-left part of the GUI. A context-sensitive 'Help' function is also included, providing documentation tailored to the current view. Six views are currently available in PRIDE Inspector. The first is the 'Overview' tab, which presents the experimental metadata in an easily readable, uniform way. The precise information displayed can vary slightly depending on the file format, and it is split into three sub-views: 'Experiment General', 'Sample and Protocol' and 'Instrument and Processing' (Supplementary Figs. 3–6).

**Figure 2: Screenshots showing some of the graphical features of PRIDE Inspector.**

The second view concerns proteins (Supplementary Figs. 7 and 8) and is probably the most interesting view for biologists. For each identified protein, peptides, PTMs and corresponding spectra are displayed concisely. Metadata related to protein identification (e.g., search engine or search database) are also provided here. A powerful spectrum viewer is available as well, including automatic annotation of the spectra based on the submitted fragment ions. Amino acid combinations (of up to three residues) that match the mass difference between consecutive peaks are indicated next to that difference (Supplementary Figs. 7 and 9).
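The peak-gap annotation described above can be sketched as follows. This is a minimal Python illustration, not PRIDE Inspector's Java implementation: the function names `build_combo_table` and `annotate_gaps`, the 0.02 Da tolerance and the assumption that consecutive fragment peaks are singly charged are all hypothetical choices made for this example. Standard monoisotopic residue masses are used, with isobaric leucine and isoleucine merged under "L".

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of the standard amino acids.
# I and L are isobaric, so both are represented by "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_combo_table(max_len=3):
    """List (residues, summed mass) for every 1- to max_len-residue combination.

    Combinations are unordered, because a gap mass cannot tell residue order apart.
    """
    table = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            table.append(("".join(combo), sum(RESIDUE_MASS[aa] for aa in combo)))
    return table


def annotate_gaps(peak_mzs, tolerance=0.02, max_len=3):
    """For each pair of consecutive peaks, return (low, high, diff, matches),
    where matches are the combinations whose mass lies within tolerance
    of the m/z difference (singly charged fragments assumed)."""
    table = build_combo_table(max_len)
    mzs = sorted(peak_mzs)
    gaps = []
    for low, high in zip(mzs, mzs[1:]):
        diff = high - low
        matches = [combo for combo, mass in table if abs(mass - diff) <= tolerance]
        gaps.append((low, high, round(diff, 4), matches))
    return gaps
```

Because some combinations are nearly isobaric (GG and N both sum to about 114.043 Da), a single gap can receive several labels: `annotate_gaps([100.0, 214.04293])` reports both "N" and "GG" for that gap, which is why such annotations support, rather than replace, expert inspection.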

PRIDE Inspector also accesses some of the most popular protein databases (UniProtKB, UniParc, IPI (International Protein Index), Ensembl and the NCBI nr database) through a web service to retrieve the most up-to-date protein sequences and names for the reported identifiers. Using the PRIDE Inspector sequence viewer (Supplementary Figs. 8 and 11), it is possible to highlight different features in the protein sequence, such as identified peptides and PTMs. The current status of the protein identifier in the database (active, deleted, changed, unknown, merged or demerged; see Supplementary Notes) is also shown, because it can affect the reliability of the protein identification. For example, it is possible to find peptides that originally matched the sequence of the identified protein but no longer match the most recent version of that sequence in the database.
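A simplified version of the sequence check described above is sketched below in Python for illustration only. Given the reported peptides and the latest sequence retrieved for a protein, it lists peptides that no longer occur and marks covered residues for display. The helper names and the upper-case/lower-case highlighting convention are hypothetical; the web-service client is not shown, matching is exact (so I/L ambiguity and PTM positions are ignored), and the example sequences are invented.

```python
def find_positions(sequence, peptide):
    """Return all 0-based start positions of peptide in sequence (overlaps allowed)."""
    positions = []
    start = sequence.find(peptide)
    while start != -1:
        positions.append(start)
        start = sequence.find(peptide, start + 1)
    return positions


def unmatched_peptides(sequence, peptides):
    """Peptides that no longer occur anywhere in the (updated) protein sequence."""
    return [p for p in peptides if not find_positions(sequence, p)]


def coverage_mask(sequence, peptides):
    """Boolean per residue: True if covered by at least one peptide occurrence."""
    covered = [False] * len(sequence)
    for peptide in peptides:
        for start in find_positions(sequence, peptide):
            for i in range(start, start + len(peptide)):
                covered[i] = True
    return covered


def highlight(sequence, peptides):
    """Upper-case covered residues and lower-case the rest, for a text-only view."""
    mask = coverage_mask(sequence, peptides)
    return "".join(aa.upper() if hit else aa.lower() for aa, hit in zip(sequence, mask))


def sequence_coverage(sequence, peptides):
    """Fraction of residues covered by the peptides that still match."""
    mask = coverage_mask(sequence, peptides)
    return sum(mask) / len(sequence) if sequence else 0.0


if __name__ == "__main__":
    # Hypothetical updated entry in which a K -> R change breaks "QISFVK".
    updated = "MKTAYIAKQRQISFVRSHFSRQ"
    reported = ["TAYIAK", "QISFVK", "SHFSR"]
    print(unmatched_peptides(updated, reported))
    print(highlight(updated, reported))
    print(sequence_coverage(updated, reported))
```

In this invented example, one of the three reported peptides no longer maps to the updated sequence, so the identification rests on less evidence than originally reported.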

The third view focuses on the peptide identifications themselves. Metadata such as the peptide score (specific to the search engine used) and observed PTMs are displayed for each peptide (Supplementary Figs. 10 and 11). In both the protein and peptide views, the difference between the experimental and theoretical mass-to-charge ratio (delta m/z) is calculated for each peptide precursor and highlighted in the application, which can help reveal errors or inconsistencies. In both views, it is also possible to filter out decoy matches, and a straightforward estimate of the peptide false-discovery rate is provided.
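The two calculations mentioned above can be sketched as follows, assuming monoisotopic masses, positively charged [M+zH]z+ precursors and either unmodified peptides or a known total PTM mass shift. The FDR function uses the simple ratio of decoy to target matches at or above a score threshold, which is one of several target-decoy estimators in use; it is not necessarily the formula PRIDE Inspector applies, and all function names here are hypothetical.

```python
PROTON_MASS = 1.007276  # Da
WATER_MASS = 18.010565  # Da, monoisotopic H2O added for the peptide termini

# Monoisotopic residue masses (Da); I and L share the same mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def theoretical_mz(sequence, charge, modification_mass=0.0):
    """m/z of [M + zH]^z+; modification_mass is the summed PTM mass shift (assumed known)."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS + modification_mass
    return (neutral + charge * PROTON_MASS) / charge


def delta_mz(experimental_mz, sequence, charge, modification_mass=0.0):
    """Experimental minus theoretical precursor m/z; large values flag possible errors."""
    return experimental_mz - theoretical_mz(sequence, charge, modification_mass)


def target_psms(psms):
    """Drop decoy matches; each PSM is a (score, is_decoy) tuple."""
    return [psm for psm in psms if not psm[1]]


def decoy_fdr(psms, score_threshold):
    """Estimate FDR as decoys / targets among PSMs scoring >= threshold
    (higher scores are assumed to be better)."""
    passing = [is_decoy for score, is_decoy in psms if score >= score_threshold]
    decoys = sum(1 for is_decoy in passing if is_decoy)
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0
```

For example, `theoretical_mz("PEPTIDE", 2)` is about 400.6872, so an observed precursor at 400.6900 gives a delta m/z of roughly 0.0028. With PSMs scored 90, 85 and 75 (targets) and 80 and 70 (decoys), a threshold of 72 keeps three targets and one decoy, giving an estimated FDR of 1/3.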

The fourth view is aimed at accessing and visualizing all spectra in the data set, not only the identified ones (Supplementary Fig. 12). For mzML files, chromatograms are displayed here as well (Supplementary Fig. 13). Submitted metadata (e.g., precursor m/z and intensity) are shown for each entry, along with calculated information, such as the number of peaks or the total peak intensity. Manual annotation of spectra is supported as well for quick de novo sequencing.
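The per-spectrum values mentioned above are simple to derive from the peak list. The sketch below is an illustration under stated assumptions: the `Spectrum` class and `spectrum_summary` function are hypothetical, peaks are held as parallel m/z and intensity lists, and the base-peak m/z is an extra example value not named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Spectrum:
    """Hypothetical minimal spectrum record with parallel peak lists."""
    spectrum_id: str
    mz: List[float]
    intensity: List[float]
    precursor_mz: Optional[float] = None
    precursor_intensity: Optional[float] = None


def spectrum_summary(spectrum):
    """Combine submitted metadata with values calculated from the peak list."""
    n_peaks = len(spectrum.mz)
    base_index = (
        max(range(n_peaks), key=lambda i: spectrum.intensity[i]) if n_peaks else None
    )
    return {
        "id": spectrum.spectrum_id,
        "precursor_mz": spectrum.precursor_mz,  # submitted
        "precursor_intensity": spectrum.precursor_intensity,  # submitted
        "number_of_peaks": n_peaks,  # calculated
        "total_intensity": sum(spectrum.intensity),  # calculated
        "base_peak_mz": spectrum.mz[base_index] if base_index is not None else None,
    }
```

Computing these values on the fly, rather than storing them, keeps them consistent with whatever peak list was actually submitted.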

In its fifth view, PRIDE Inspector provides a collection of summary charts for assessing the overall properties of the data set. At the time of writing, up to eight different charts can be generated per data set, depending on the information available (Supplementary Figs. 14–18). These simple, easily understandable charts can provide a quick overview of data quality and reliability. Importantly, the spectrum-related charts can show information for identified, unidentified or all spectra. Each chart is documented thoroughly in the Supplementary Information.

Finally, a sixth tab focuses on quantification information, where available (Supplementary Fig. 19). This kind of data is currently present in only a small number of PRIDE submissions, but it is expected to become increasingly common. Apart from visualizing the quantification values for both proteins and peptides, it is also possible to generate histograms comparing the expression values of up to ten proteins. Sample metadata for each reagent can also be easily visualized. Ratios can be recalculated at any time if the user decides to change the control sample.
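Recalculating ratios against a different control amounts to dividing every reagent's value by that of the newly chosen reference. The sketch below assumes that per-reagent abundance values (not ratios) are available for each protein; the reagent labels "114" to "117" and the protein name "P1" are hypothetical examples, as are the function names.

```python
def recalculate_ratios(reagent_values, control):
    """Divide every reagent's value by the value of the chosen control reagent."""
    if control not in reagent_values:
        raise KeyError(f"unknown control reagent: {control}")
    reference = reagent_values[control]
    if reference == 0:
        raise ValueError("control value is zero; ratios are undefined")
    return {label: value / reference for label, value in reagent_values.items()}


def recalculate_all(protein_values, control):
    """Apply the same control to every protein: {protein: {reagent: value}}."""
    return {
        protein: recalculate_ratios(values, control)
        for protein, values in protein_values.items()
    }


if __name__ == "__main__":
    # Hypothetical abundances for one protein across four reagents.
    data = {"P1": {"114": 2.0, "115": 4.0, "116": 1.0, "117": 8.0}}
    print(recalculate_all(data, "114"))  # ratios relative to reagent 114
    print(recalculate_all(data, "116"))  # same data, new control
```

Keeping the underlying abundances and deriving ratios on demand is what makes switching the control cheap; if only ratios were stored, a change of reference would require dividing by the stored ratio of the new control instead.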

Apart from the six main tabs, the 'Search PRIDE' panel gives access to all public data in PRIDE, making it easy to search for particular experiments by filtering on different types of metadata (Supplementary Figs. 20 and 21). In addition to its visualization and analysis functionality, PRIDE Inspector provides various data export options (Supplementary Fig. 22). First, all spectra can be exported to Mascot Generic Format (mgf) files. Second, details of all protein and/or peptide identifications (including PTMs), as well as the peptide-to-protein mappings, can be exported as tab-delimited tables. Finally, spectra and chromatograms (including annotations) can be saved as images in various formats.
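The two text-based export formats mentioned above can be sketched with the Python standard library. The MGF writer emits the common `BEGIN IONS` / `TITLE` / `PEPMASS` / `CHARGE` / `END IONS` layout with one "m/z intensity" pair per line; the dictionary keys, positive-mode charge notation, column names and the accession "P12345" are assumptions for illustration, not PRIDE Inspector's exact output.

```python
import csv
import io


def to_mgf(spectra):
    """Serialize spectra (dicts with 'title', 'mz', 'intensity' and optional
    'precursor_mz' / 'charge') as Mascot Generic Format text."""
    lines = []
    for spectrum in spectra:
        lines.append("BEGIN IONS")
        lines.append(f"TITLE={spectrum['title']}")
        if spectrum.get("precursor_mz") is not None:
            lines.append(f"PEPMASS={spectrum['precursor_mz']}")
        if spectrum.get("charge"):
            lines.append(f"CHARGE={spectrum['charge']}+")  # positive mode assumed
        for mz, intensity in zip(spectrum["mz"], spectrum["intensity"]):
            lines.append(f"{mz} {intensity}")
        lines.append("END IONS")
        lines.append("")  # blank line separating entries
    return "\n".join(lines)


def to_tsv(rows, columns):
    """Write identification rows (dicts) as a tab-delimited table with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_mgf([{"title": "spectrum 1", "precursor_mz": 400.69, "charge": 2,
                   "mz": [100.0, 200.5], "intensity": [10.0, 20.0]}]))
    print(to_tsv([{"protein": "P12345", "peptide": "PEPTIDE"}], ["protein", "peptide"]))
```

Using `csv.DictWriter` rather than joining strings by hand takes care of quoting if a field (for example, a PTM description) ever contains a tab or newline.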

PRIDE Inspector is fully supported and maintained by the PRIDE team. Moreover, it provides additional application programming interfaces (APIs) and libraries that the scientific community can reuse independently: the PRIDE XML JAXB (Java Architecture for XML Binding) library (for rapid and memory-efficient reading of PRIDE XML files) and the PRIDE mzGraph Browser library (for visualizing and annotating spectra and chromatograms). These libraries are described in the Supplementary Notes. In addition, new features can easily be added to PRIDE Inspector thanks to its modular software architecture and permissive open-source license. Extensions currently under way include full support for version 1.1 of the mzIdentML community standard for peptide and protein identifications⁹; this support is still in progress because the format has only just reached stability (v1.1 was released in September 2011). Once mzIdentML is fully supported, it will also be possible to examine issues related to protein inference¹⁰ thoroughly. Until then, researchers need to be aware of this limitation when interpreting protein identifications based on ambiguous (or shared) peptides, because the PRIDE XML format usually reports only one of the possible peptide-protein mappings.
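To illustrate why reporting a single peptide-protein mapping limits protein inference, the sketch below takes a complete peptide-to-protein mapping (of the kind a format such as mzIdentML can express), flags shared peptides and finds proteins supported only by shared peptides. It then collapses each peptide to one protein to mimic a single reported mapping. The peptide and protein names are invented, and choosing the alphabetically first protein is an arbitrary stand-in, not how any particular PRIDE XML exporter selects the mapping.

```python
from collections import defaultdict


def proteins_to_peptides(mappings):
    """Invert {peptide: set(proteins)} into {protein: set(peptides)}."""
    inverted = defaultdict(set)
    for peptide, proteins in mappings.items():
        for protein in proteins:
            inverted[protein].add(peptide)
    return dict(inverted)


def shared_peptides(mappings):
    """Peptides that map to more than one protein."""
    return {peptide for peptide, proteins in mappings.items() if len(proteins) > 1}


def proteins_without_unique_peptides(mappings):
    """Proteins whose every supporting peptide is shared with another protein."""
    shared = shared_peptides(mappings)
    return {
        protein
        for protein, peptides in proteins_to_peptides(mappings).items()
        if peptides <= shared
    }


def keep_one_mapping(mappings):
    """Mimic reporting a single protein per peptide (here, arbitrarily the first
    in alphabetical order)."""
    return {peptide: {min(proteins)} for peptide, proteins in mappings.items()}


if __name__ == "__main__":
    full = {"PEPA": {"P1"}, "PEPB": {"P1", "P2"}, "PEPC": {"P2", "P3"}}
    print(proteins_without_unique_peptides(full))  # P2 and P3 rest on shared evidence
    reduced = keep_one_mapping(full)
    print(shared_peptides(reduced))  # the ambiguity has disappeared
    print(proteins_to_peptides(reduced))  # P3 is gone; P2 now looks uniquely supported
```

With the full mapping, P2 and P3 are supported only by shared peptides; after the reduction, P3 vanishes and P2 appears to have unique evidence, which is exactly the kind of ambiguity a reader cannot detect from a single reported mapping.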

PRIDE Inspector thus provides a user-friendly, comprehensive tool for the browsing, inspection and evaluation of data in the PRIDE database, or in a compatible standard file format. As such, we believe that PRIDE Inspector will substantially increase the ability of researchers, editors and peer reviewers to explore, review, evaluate and reuse proteomics data.

Author contributions

R.W. did most of the programming of the core components and the GUI. A.F. was mainly responsible for the chart component. D.R. was the main developer behind the access component of the PRIDE MySQL instance. D.O., J.M.F., R.G.C., J.G., A.C., Y.P.-R. and F.R. contributed to multiple areas during the development of the tool and also participated in the writing of the documentation and testing process. L.M. had the original idea and started the project. H.H. and J.A.V. supervised the whole process. J.A.V. and L.M. wrote the manuscript. All authors have agreed to all the content in the manuscript, including the data as presented.

References

1. Anonymous. Nat. Biotechnol. 27, 579 (2009).
2. Vizcaino, J.A. et al. Nucleic Acids Res. 38, D736–D742 (2010).
3. Deutsch, E.W., Lam, H. & Aebersold, R. EMBO Rep. 9, 429–434 (2008).
4. Hill, J.A., Smith, B.E., Papoulias, P.G. & Andrews, P.C. J. Proteome Res. 9, 2809–2811 (2010).
5. Barsnes, H., Vizcaino, J.A., Eidhammer, I. & Martens, L. Nat. Biotechnol. 27, 598–599 (2009).
6. Searle, B.C. Proteomics 10, 1265–1269 (2010).
7. Medina-Aunon, J.A., Carazo, J.M. & Albar, J.P. Proteomics 11, 334–337 (2011).
8. Martens, L. et al. Mol. Cell. Proteomics 10, R110.000133 (2011).
9. Eisenacher, M. Methods Mol. Biol. 696, 161–177 (2011).
10. Nesvizhskii, A.I. & Aebersold, R. Mol. Cell. Proteomics 4, 1419–1440 (2005).


Acknowledgements

This work was supported by the Wellcome Trust (grant number WT085949MA) and EMBL core funding. R.G.C. is supported by EU FP7 grant SLING (grant number 226073). J.A.V. is supported by the EU FP7 grants LipidomicNet (grant number 202272) and ProteomeXchange (grant number 260558). A.F. was partially supported by the Spanish network COMBIOMED (RD07/0067/0006, ISCIII-FIS). L.M. would like to acknowledge support from the EU FP7 PRIME-XS grant (grant number 262067).

Author information

Authors and Affiliations

EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Rui Wang, Antonio Fabregat, Daniel Ríos, David Ovelleiro, Joseph M Foster, Richard G Côté, Johannes Griss, Attila Csordas, Yasset Perez-Riverol, Florian Reisinger, Henning Hermjakob & Juan Antonio Vizcaíno
Department of Medicine I, Medical University of Vienna, Vienna, Austria
Johannes Griss
Department of Proteomics, Center for Genetic Engineering and Biotechnology, Cubanacán, Playa, Ciudad de la Habana, Cuba
Yasset Perez-Riverol
Department of Medical Protein Research, Ghent, Belgium
Lennart Martens
Department of Biochemistry, Ghent University, Ghent, Belgium
Lennart Martens


Corresponding author

Correspondence to Juan Antonio Vizcaíno.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–24 and Supplementary Notes (PDF 5343 kb)


About this article

Cite this article

Wang, R., Fabregat, A., Ríos, D. et al. PRIDE Inspector: a tool to visualize and validate MS proteomics data. Nat Biotechnol 30, 135–137 (2012). https://doi.org/10.1038/nbt.2112

Issue Date: February 2012
