A general pipeline for quality and statistical assessment of protein interaction data using R and Bioconductor

Chiang, Tony; Scholtens, Denise

doi:10.1038/nprot.2009.26

Protocol
Published: 26 March 2009

Tony Chiang¹,² & Denise Scholtens³

Nature Protocols volume 4, pages 535–546 (2009)

Abstract

The systematic mapping of protein interactions by bait–prey techniques, including affinity purification-mass spectrometry or the yeast two-hybrid system, contributes a unique and relevant perspective on the comprehensive picture of cellular machines. We describe here a protocol for statistical analysis of node-and-edge graph representations of these data using R and Bioconductor, recognizing that steps may be added or omitted depending on the data set at hand. The fundamental purpose of such analyses is feature estimation, defined here as the estimation of data-type-specific biological features, such as protein complex composition and the physical interaction integrity of known or estimated complexes. In preparation for feature estimation tasks, we outline a progression through three analytic components common to all bait–prey data types: preliminary setup, exploratory analysis and quality assessment. The end result is a collection of descriptive and inferred characteristics of the data, ready for biological interpretation in a computationally tractable form.
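
The node-and-edge representation described above can be sketched in base R. This is a minimal illustration rather than the protocol's own code: the protein labels and the seven (bait, prey) pairs are invented placeholders, and treating a viable bait/prey (VBP) as a protein with both nonzero out-degree and nonzero in-degree is an assumption about the terminology used in the figure captions.

```r
# Hypothetical (bait, prey) pairs; labels are placeholders, not protocol data.
pairs <- data.frame(
  bait = c("A", "A", "B", "B", "C", "C", "D"),
  prey = c("B", "C", "A", "D", "A", "E", "B"),
  stringsAsFactors = FALSE
)

proteins <- sort(unique(c(pairs$bait, pairs$prey)))

# Directed adjacency matrix: rows index baits, columns index prey.
adj <- matrix(0L, nrow = length(proteins), ncol = length(proteins),
              dimnames = list(bait = proteins, prey = proteins))
adj[cbind(pairs$bait, pairs$prey)] <- 1L

# Out-degree: number of prey a protein detected when used as a bait.
# In-degree: number of baits that detected the protein as a prey.
out_degree <- rowSums(adj)
in_degree  <- colSums(adj)

# Assumed definition: a VBP has at least one outgoing and one incoming edge.
vbp <- proteins[out_degree > 0 & in_degree > 0]

# Split the edges among VBPs into reciprocated and unreciprocated ones.
sub <- adj[vbp, vbp]
n_reciprocated   <- sum(sub == 1L & t(sub) == 1L) / 2  # unordered pairs
n_unreciprocated <- sum(sub == 1L & t(sub) == 0L)      # one-way edges

print(data.frame(protein = proteins, out_degree, in_degree))
print(vbp)
print(c(reciprocated = n_reciprocated, unreciprocated = n_unreciprocated))
```

A plain adjacency matrix keeps the sketch free of package dependencies. For real data sets, a dedicated graph class such as those in the Bioconductor graph package would likely be the more natural container.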

**Figure 1: Seven ordered pairs representing protein interaction (bait, prey) pairs.**

**Figure 2: Flowchart describing the outline of analyses presented in this protocol.**

**Figure 3: Viable protein levels encapsulated in bar charts for the AP-MS experiments.**

**Figure 4: A screenshot of the html page that provides a user-friendly interface for the output of the hypergeometric test for functional category overrepresentation.**

**Figure 5: Scatter plots corresponding to the in- and out-degree for each VBP of Gavin06.**

**Figure 6: Estimate of P_TP and P_FP for Gavin06 AP-MS data using the GS ScISIC reference data set and a manifold of possible solutions.**

**Figure 7: This graph renders all the proteins found to interact with the preselected bait protein LSM5 (YER146W) within the Gavin06 data set.**

**Figure 8: Two examples of complex estimates using the *apComplex* algorithm.**

**Figure 9: An *apComplex* estimate with overlaid Y2H data from Ito *et al*.³ and Uetz *et al*.¹ experiments.**

References

1. Uetz, P. et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000).
2. Newman, J.R., Wolf, E. & Kim, P.S. A computationally directed screen identifying interacting coiled coils from Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 97, 13203–13208 (2000).
3. Ito, T. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98, 4569–4574 (2001).
4. Tong, A.H.-Y. et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295, 321–324 (2002).
5. Risseeuw, E.P. et al. Protein interaction analysis of SCF ubiquitin E3 ligase subunits from Arabidopsis. Plant J. 34, 753–767 (2003).
6. Hazbun, T. et al. Assigning function to yeast proteins by integration of technologies. Mol. Cell 6, 1353–1365 (2003).
7. Millson, S.H. et al. A two-hybrid screen of the yeast proteome for Hsp90 interactors uncovers a novel Hsp90 chaperone requirement in the activity of a stress-activated mitogen-activated protein kinase Slt2p (Mpk1p). Eukaryot. Cell 4, 849–860 (2005).
8. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002).
9. Gavin, A.-C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002).
10. Gavin, A.-C. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636 (2006).
11. Krogan, N.J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006).
12. Ohi, M.D. et al. Proteomics analysis reveals stable multiprotein complexes in both fission and budding yeasts containing Myb-related Cdc5p/Cef1p, novel pre-mRNA splicing factors, and snRNAs. Mol. Cell. Biol. 7, 2011–2024 (2002).
13. Grandi, P. et al. 90S pre-ribosomes include the 35S pre-rRNA, the U3 snoRNP, and 40S subunit processing factors but predominantly lack 60S synthesis factors. Mol. Cell 10, 105–115 (2002).
14. Zhao, R. et al. Navigating the chaperone network: an integrative map of physical and genetic interactions mediated by the Hsp90 chaperone. Cell 120, 715–727 (2005).
15. Graumann, J. et al. Applicability of tandem affinity purification MudPIT to pathway proteomics in yeast. Mol. Cell. Proteomics 3, 226–237 (2004).
16. Chiang, T., Scholtens, D., Sarkar, D., Gentleman, R. & Huber, W. Coverage and error models of protein–protein interaction data by directed graph analysis. Genome Biol. 8, R186 (2007).
17. Scholtens, D., Chiang, T., Huber, W. & Gentleman, R. Estimating node degree in bait–prey graphs. Bioinformatics 24, 218–224 (2008).
18. Zhang, B., Park, B.H., Karpinets, T. & Samatova, N.F. From pull-down data to protein interaction networks and complexes with biological relevance. Bioinformatics 24, 979–986 (2008).
19. Scholtens, D., Vidal, M. & Gentleman, R. Local dynamic modeling of global interactome networks. Bioinformatics 21, 3548–3557 (2005).
20. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008).
21. Gentleman, R. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
22. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
23. Cline, M. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).
24. Milenkovic, T., Lai, J. & Przulj, N. GraphCrunch: a tool for large network analyses. BMC Bioinformatics 9, 70 (2008).
25. Royer, L., Reimann, M., Andreopoulos, B. & Schroeder, M. Unraveling protein networks with power graph analysis. PLoS Comput. Biol. 4, e1000108 (2008).
26. Tutte, W.T. Graph Theory. Cambridge Mathematical Library, New York (2001).
27. Stanley, R.P. Enumerative Combinatorics I. Cambridge University Press, Cambridge (1997).
28. Carey, V.J., Gentry, J., Whalen, E. & Gentleman, R. Network structures and algorithms in Bioconductor. Bioinformatics 21, 135–136 (2005).
29. Huber, W., Carey, V.J., Long, L., Falcon, S. & Gentleman, R. Graphs in molecular biology. BMC Bioinformatics 8 (Suppl. 6), S8 (2007).
30. Chiang, T. et al. Rintact: enabling computational analysis of molecular interaction data from the IntAct repository. Bioinformatics 24, 1100–1101 (2008).
31. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
32. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
33. Kerrien, S. et al. Broadening the horizon—level 2.5 of the HUPO-PSI format for molecular interactions. BMC Biol. 5, 44 (2007).
34. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).
35. Salwinski, L. et al. The database of interacting proteins. Nucleic Acids Res. 32, D449–D451 (2004).
36. Mishra, G. et al. Human protein reference database—2006 update. Nucleic Acids Res. 34, D411–D414 (2006).
37. Kerrien, S. et al. IntAct—open source resource for molecular interaction data. Nucleic Acids Res. 35, D561–D565 (2007).
38. Chatr-aryamontri, A. et al. MINT: the Molecular INTeraction database. Nucleic Acids Res. 35, D572–D574 (2007).
39. Orchard, S. et al. The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat. Biotechnol. 25, 894–898 (2007).
40. Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 2, 257–258 (2006).
41. Stevens, S. & Abelson, J. Purification of the yeast U4/U6.U5 small nuclear ribonucleoprotein particle and identification of its proteins. Proc. Natl. Acad. Sci. USA 96, 7226–7231 (1999).
42. Chowdhury, A., Mukhopadhyay, J. & Tharun, S. The decapping activator Lsm1p-7p-Pat1p complex has the intrinsic ability to distinguish between oligoadenylated and polyadenylated RNAs. RNA 13, 998–1016 (2007).

Acknowledgements

We thank the reviewers for their insightful and useful critique of this manuscript. We also thank Wolfgang Huber, Robert Gentleman, Simon Anders and especially Sandra Orchard for their comments and suggestions as this manuscript evolved. T.C. has been supported by the Human Frontiers Science Program Grant (HFSP/LIP0442/2005) as well as the Ferris Fund from King's College, Cambridge University.

Author information

Authors and Affiliations

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Tony Chiang
Department of Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
Tony Chiang
Department of Preventive Medicine, Northwestern University, Chicago, Illinois, USA
Denise Scholtens

Corresponding author

Correspondence to Denise Scholtens.

Supplementary information

41596_2009_BFnprot200926_MOESM427_ESM.pdf

Supplementary Fig. 1: directedGraph.pdf is a figure file cited in the vignette-style document ppiProtocol.Rnw. (PDF 181 kb)

Supplementary Fig. 2: flowChart.pdf is a figure file cited in the vignette-style document ppiProtocol.Rnw. (PDF 197 kb)

41596_2009_BFnprot200926_MOESM429_ESM.jpg

Supplementary Fig. 3: Gavin2006Screenshot.png is a figure file cited in the vignette-style document ppiProtocol.Rnw. (JPG 163 kb)

41596_2009_BFnprot200926_MOESM430_ESM.zip

Supplementary Method 1: ppiProtocol.Rnw is a vignette-style document file weaving together the text and code contained in this protocol so that users can reproduce the analysis results. (ZIP 18 kb)

41596_2009_BFnprot200926_MOESM431_ESM.zip

Supplementary Method 2: ppiProtocol.R contains the R code used in the analyses presented in this protocol without the accompanying text. (ZIP 3 kb)

41596_2009_BFnprot200926_MOESM432_ESM.zip

Supplementary Method 3: ppiProtocol.bib is a Bibtex file containing the references cited in this document and used in ppiProtocol.Rnw. (ZIP 4 kb)

41596_2009_BFnprot200926_MOESM433_ESM.zip

Supplementary Data 1: ppiProtocolData_1.1.0.tar.gz is an R data package containing all data sets used in this protocol. (ZIP 12073 kb)

About this article

Cite this article

Chiang, T., Scholtens, D. A general pipeline for quality and statistical assessment of protein interaction data using R and Bioconductor. Nat Protoc 4, 535–546 (2009). https://doi.org/10.1038/nprot.2009.26

Published: 26 March 2009
Issue Date: April 2009
DOI: https://doi.org/10.1038/nprot.2009.26
