  • Protocol

A general pipeline for quality and statistical assessment of protein interaction data using R and Bioconductor

Abstract

The systematic mapping of protein interactions by bait–prey techniques, including affinity purification-mass spectrometry or the yeast two-hybrid system, contributes a unique and relevant perspective on the comprehensive picture of cellular machines. We describe here a protocol for statistical analysis of node-and-edge graph representations of these data using R and Bioconductor, recognizing that steps may be added or omitted depending on the data set at hand. The fundamental purpose of such analyses is feature estimation, defined here as the estimation of data-type-specific biological features, such as protein complex composition and the physical interaction integrity of known or estimated complexes. In preparation for feature estimation tasks, we outline a progression through three analytic components common to all bait–prey data types: preliminary setup, exploratory analysis and quality assessment. The end result is a collection of descriptive and inferred characteristics of the data, ready for biological interpretation in a computationally tractable form.
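The bait–prey graphs described above are directed: an edge runs from each bait to every prey it detected. The Python sketch below is only a language-neutral illustration (the protocol itself works in R and Bioconductor). It stores a set of invented (bait, prey) pairs as an adjacency map and computes each protein's out-degree (preys found as a bait) and in-degree (baits that found it), the quantities plotted in Figure 5. The function names are hypothetical. The rule used for a "viable" bait, a bait that detected at least one prey other than itself, is an assumption made for illustration, as is the choice to ignore self-loops when counting degrees.

```python
from collections import defaultdict

# Hypothetical (bait, prey) pairs, invented for illustration only.
PAIRS = [
    ("A", "B"), ("A", "C"), ("B", "A"),
    ("B", "D"), ("C", "D"), ("E", "A"), ("E", "E"),
]


def build_bait_prey_graph(pairs):
    """Return a directed adjacency map: bait -> set of preys (duplicates collapse)."""
    graph = defaultdict(set)
    for bait, prey in pairs:
        graph[bait].add(prey)
    return dict(graph)


def degrees(graph):
    """Out-degree = preys detected per bait; in-degree = baits detecting each protein.

    Self-loops (a bait recovering itself) are skipped; this is an assumption.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for bait, preys in graph.items():
        for prey in preys:
            if prey == bait:
                continue
            out_deg[bait] += 1
            in_deg[prey] += 1
    return dict(out_deg), dict(in_deg)


def viable_baits(graph):
    """Assumed definition: baits that detected at least one prey other than themselves."""
    return {bait for bait, preys in graph.items() if preys - {bait}}


if __name__ == "__main__":
    g = build_bait_prey_graph(PAIRS)
    out_deg, in_deg = degrees(g)
    print(sorted(viable_baits(g)))
    print(out_deg)
    print(in_deg)
```

Keeping the graph as a bait-keyed map of sets preserves the direction of each observation, so a pair observed as A→B and B→A contributes to both proteins' in- and out-degrees rather than being merged into a single undirected edge.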

Figure 1: Seven ordered (bait, prey) pairs representing protein interactions.
Figure 2: Flowchart describing the outline of analyses presented in this protocol.
Figure 3: Viable protein levels encapsulated in bar charts for the AP-MS experiments.
Figure 4: A screenshot of the HTML page that provides a user-friendly interface to the output of the hypergeometric test for functional category overrepresentation.
Figure 5: Scatter plots of the in- and out-degree of each viable bait protein (VBP) in the Gavin06 data set.
Figure 6: Estimate of PTP and PFP for Gavin06 AP-MS data using the GS ScISIC reference data set and a manifold of possible solutions.
Figure 7: This graph renders all the proteins found to interact with the preselected bait protein LSM5 (YER146W) within the Gavin06 data set.
Figure 8: Two examples of complex estimates using the apComplex algorithm.
Figure 9: An apComplex estimate with overlaid Y2H data from the Ito et al. (ref. 3) and Uetz et al. (ref. 1) experiments.
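Figure 4 refers to a hypergeometric test for functional category overrepresentation; the protocol carries this out in R, and GOstats (ref. 40) is among the cited tools. As a hedged sketch of the underlying calculation only, the Python function below computes the upper-tail probability P(X ≥ k) for a hypergeometric variable: a universe of N proteins, K of which are annotated to the category, and a list of n proteins (for example, an estimated complex), k of which carry the annotation. The function and argument names are hypothetical. GOstats also handles gene-to-category mapping and offers a conditional test that accounts for the Gene Ontology graph structure, which this sketch does not attempt.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Return P(X >= k) for X ~ Hypergeometric(N total, K in category, n drawn).

    k: proteins in the list that carry the annotation
    n: size of the protein list being tested
    K: proteins in the universe that carry the annotation
    N: size of the protein universe
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    # Sum the exact probabilities of observing i = k, k+1, ..., min(n, K) annotated
    # proteins. math.comb returns 0 when the lower index exceeds the upper, so
    # impossible outcomes contribute nothing.
    start = max(k, 0)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(start, upper + 1))
    return hits / total
```

For example, if all 3 proteins in an estimated complex fall into a category that covers 3 of 10 proteins in the universe, `hypergeom_upper_tail(3, 3, 3, 10)` gives 1/120. Summing exact binomial-coefficient terms keeps the sketch dependency-free; for large universes a library routine such as a hypergeometric survival function would be the more practical choice.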

References

  1. Uetz, P. et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000).

  2. Newman, J.R., Wolf, E. & Kim, P.S. A computationally directed screen identifying interacting coiled coils from Saccharomyces cerevisiae . Proc. Natl. Acad. Sci. USA 97, 13203–13208 (2000).

  3. Ito, T. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 98, 4569–4574 (2001).

  4. Tong, A.H.-Y. et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295, 321–324 (2002).

  5. Risseeuw, E.P. et al. Protein interaction analysis of SCF ubiquitin E3 ligase subunits from Arabidopsis. Plant J. 34, 753–767 (2003).

  6. Hazbun, T.R. et al. Assigning function to yeast proteins by integration of technologies. Mol. Cell 12, 1353–1365 (2003).

  7. Millson, S.H. et al. A two-hybrid screen of the yeast proteome for Hsp90 interactors uncovers a novel Hsp90 chaperone requirement in the activity of a stress-activated mitogen-activated protein kinase Slt2p (Mpk1p). Eukaryot. Cell 4, 849–860 (2005).

  8. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002).

  9. Gavin, A.-C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002).

  10. Gavin, A.-C. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636 (2006).

  11. Krogan, N.J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006).

  12. Ohi, M.D. et al. Proteomics analysis reveals stable multiprotein complexes in both fission and budding yeasts containing Myb-related Cdc5p/Cef1p, novel pre-mRNA splicing factors, and snRNAs. Mol. Cell. Biol. 22, 2011–2024 (2002).

  13. Grandi, P. et al. 90S pre-ribosomes include the 35S pre-rRNA, the U3 snoRNP, and 40S subunit processing factors but predominantly lack 60S synthesis factors. Mol. Cell 10, 105–115 (2002).

  14. Zhao, R. et al. Navigating the chaperone network: an integrative map of physical and genetic interactions mediated by the Hsp90 chaperone. Cell 120, 715–727 (2005).

  15. Graumann, J. et al. Applicability of tandem affinity purification MudPIT to pathway proteomics in yeast. Mol. Cell. Proteomics 3, 226–237 (2004).

  16. Chiang, T., Scholtens, D., Sarkar, D., Gentleman, R. & Huber, W. Coverage and error models of protein–protein interaction data by directed graph analysis. Genome Biol. 8, R186 (2007).

  17. Scholtens, D., Chiang, T., Huber, W. & Gentleman, R. Estimating node degree in bait–prey graphs. Bioinformatics 24, 218–224 (2008).

  18. Zhang, B., Park, B.H., Karpinets, T. & Samatova, N.F. From pull-down data to protein interaction networks and complexes with biological relevance. Bioinformatics 24, 979–986 (2008).

  19. Scholtens, D., Vidal, M. & Gentleman, R. Local dynamic modeling of global interactome networks. Bioinformatics 21, 3548–3557 (2005).

  20. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008).

  21. Gentleman, R. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).

  22. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  23. Cline, M. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).

  24. Milenkovic, T., Lai, J. & Przulj, N. GraphCrunch: a tool for large network analyses. BMC Bioinformatics 9, 70 (2008).

  25. Royer, L., Reimann, M., Andreopoulos, B. & Schroeder, M. Unraveling protein networks with power graph analysis. PLoS Comput. Biol. 4, e1000108 (2008).

  26. Tutte, W.T. Graph Theory. Cambridge Mathematical Library. Cambridge University Press, New York (2001).

  27. Stanley, R.P. Enumerative Combinatorics I. Cambridge University Press, Cambridge (1997).

  28. Carey, V.J., Gentry, J., Whalen, E. & Gentleman, R. Network structures and algorithms in Bioconductor. Bioinformatics 21, 135–136 (2005).

  29. Huber, W., Carey, V.J., Long, L., Falcon, S. & Gentleman, R. Graphs in molecular biology. BMC Bioinformatics 8 (Suppl. 6), S8 (2007).

  30. Chiang, T. et al. Rintact: enabling computational analysis of molecular interaction data from the IntAct repository. Bioinformatics 24, 1100–1101 (2008).

  31. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).

  32. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

  33. Kerrien, S. et al. Broadening the horizon—level 2.5 of the HUPO-PSI format for molecular interactions. BMC Biol. 5, 44 (2007).

  34. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–D539 (2006).

  35. Salwinski, L. et al. The database of interacting proteins. Nucleic Acids Res. 32, D449–D451 (2004).

  36. Mishra, G. et al. Human protein reference database—2006 update. Nucleic Acids Res. 34, D411–D414 (2006).

  37. Kerrien, S. et al. IntAct—open source resource for molecular interaction data. Nucleic Acids Res. 35, D561–D565 (2007).

  38. Chatr-aryamontri, A. et al. MINT: the Molecular INTeraction database. Nucleic Acids Res. 35, D572–D574 (2007).

  39. Orchard, S. et al. The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat. Biotechnol. 25, 894–898 (2007).

  40. Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).

  41. Stevens, S. & Abelson, J. Purification of the yeast U4/U6.U5 small nuclear ribonucleoprotein particle and identification of its proteins. Proc. Natl. Acad. Sci. USA 96, 7226–7231 (1999).

  42. Chowdhury, A., Mukhopadhyay, J. & Tharun, S. The decapping activator Lsm1p-7p-Pat1p complex has the intrinsic ability to distinguish between oligoadenylated and polyadenylated RNAs. RNA 13, 998–1016 (2007).

Acknowledgements

We thank the reviewers for their insightful and useful critique of this manuscript. We also thank Wolfgang Huber, Robert Gentleman, Simon Anders and especially Sandra Orchard for their comments and suggestions as this manuscript evolved. T.C. has been supported by a Human Frontier Science Program grant (HFSP/LIP0442/2005) as well as the Ferris Fund from King's College, Cambridge University.

Author information

Corresponding author

Correspondence to Denise Scholtens.

Supplementary information

Supplementary Fig. 1: directedGraph.pdf is a figure file cited in the vignette-style document ppiProtocol.Rnw. (PDF 181 kb)

Supplementary Fig. 2: flowChart.pdf is a figure file cited in the vignette-style document ppiProtocol.Rnw. (PDF 197 kb)

Supplementary Fig. 3: Gavin2006Screenshot.png is a figure file cited in the vignette-style document ppiProtocol.Rnw. (JPG 163 kb)

Supplementary Method 1: ppiProtocol.Rnw is a vignette-style document file weaving together the text and code contained in this protocol so that users can reproduce the analysis results. (ZIP 18 kb)

Supplementary Method 2: ppiProtocol.R contains the R code used in the analyses presented in this protocol without the accompanying text. (ZIP 3 kb)

Supplementary Method 3: ppiProtocol.bib is a BibTeX file containing the references cited in this document and used in ppiProtocol.Rnw. (ZIP 4 kb)

Supplementary Data 1: ppiProtocolData_1.1.0.tar.gz is an R data package containing all data sets used in this protocol. (ZIP 12073 kb)

About this article

Cite this article

Chiang, T., Scholtens, D. A general pipeline for quality and statistical assessment of protein interaction data using R and Bioconductor. Nat Protoc 4, 535–546 (2009). https://doi.org/10.1038/nprot.2009.26

  • DOI: https://doi.org/10.1038/nprot.2009.26
