Abstract
The systematic mapping of protein interactions by bait–prey techniques, such as affinity purification–mass spectrometry and the yeast two-hybrid system, contributes a distinct and relevant perspective to the overall picture of cellular machinery. We describe here a protocol for the statistical analysis of node-and-edge graph representations of these data using R and Bioconductor; steps may be added or omitted depending on the data set at hand. The fundamental purpose of such analyses is feature estimation, defined here as the estimation of data-type-specific biological features, such as protein complex composition and the physical interaction integrity of known or estimated complexes. In preparation for feature estimation, we outline a progression through three analytic components common to all bait–prey data types: preliminary setup, exploratory analysis and quality assessment. The end result is a collection of descriptive and inferred characteristics of the data, ready for biological interpretation in a computationally tractable form.
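To make the "node-and-edge graph representation" concrete, here is a minimal sketch of the kind of descriptive summary that exploratory analysis and quality assessment start from. It is not the protocol's R/Bioconductor code; it is written in Python purely for illustration.

The function name `summarize_bait_prey` is hypothetical, and so are the categories it reports:

- viable baits that detected at least one prey;
- failed baits that detected none;
- prey-only proteins never used as baits;
- reciprocated versus unreciprocated edges between pairs of viable baits.

The sketch assumes each observation is a directed `(bait, prey)` pair and that the full list of tested baits, including failed ones, is known.

```python
from collections import defaultdict


def summarize_bait_prey(pairs, baits_tested):
    """Hypothetical descriptive summary of bait->prey data as a directed graph.

    pairs        -- iterable of (bait, prey) observations
    baits_tested -- every protein used as a bait, including baits that
                    detected no prey at all (an assumed input)
    """
    baits_tested = set(baits_tested)

    # Adjacency list: each bait maps to the set of distinct preys it detected.
    out_edges = defaultdict(set)
    for bait, prey in pairs:
        if bait != prey:  # drop self-detections of the bait
            out_edges[bait].add(prey)

    # Keys are only created when an edge is added, so every key is viable.
    viable = set(out_edges)
    failed = baits_tested - viable

    all_preys = set()
    for hits in out_edges.values():
        all_preys |= hits
    prey_only = all_preys - baits_tested

    # An edge between two viable baits could be seen from either side:
    # classify each such unordered pair by whether both directions occurred.
    reciprocated, unreciprocated = set(), set()
    for a in viable:
        for b in out_edges[a]:
            if b in viable:
                pair = frozenset((a, b))
                if a in out_edges[b]:
                    reciprocated.add(pair)
                else:
                    unreciprocated.add(pair)

    return {
        "baits_tested": len(baits_tested),
        "viable_baits": len(viable),
        "failed_baits": len(failed),
        "prey_only": len(prey_only),
        "edges": sum(len(hits) for hits in out_edges.values()),
        "reciprocated_pairs": len(reciprocated),
        "unreciprocated_pairs": len(unreciprocated),
    }


if __name__ == "__main__":
    # Toy data (invented for illustration only).
    pairs = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "D"), ("D", "E"), ("A", "A")]
    print(summarize_bait_prey(pairs, {"A", "B", "D", "F"}))
```

On the toy data, A and B detect each other (one reciprocated pair). B detects D but not the reverse (one unreciprocated pair). F is a failed bait, and C and E are prey-only nodes. A high share of unreciprocated edges between viable baits is the kind of signal a quality-assessment step might flag for closer inspection.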
Acknowledgements
We thank the reviewers for their insightful and useful critique of this manuscript. We also thank Wolfgang Huber, Robert Gentleman, Simon Anders and especially Sandra Orchard for their comments and suggestions as this manuscript evolved. T.C. has been supported by the Human Frontiers Science Program Grant (HFSP/LIP0442/2005) as well as the Ferris Fund from King's College, Cambridge University.
Supplementary information
Supplementary Fig. 1: directedGraph.pdf is a figure file cited in the vignette-style document ppiProtocol.Rnw. (PDF 181 kb)
Supplementary Fig. 3: Gavin2006Screenshot.png is a figure file cited in the vignette-style document ppiProtocol.Rnw. (JPG 163 kb)
Supplementary Method 1: ppiProtocol.Rnw is a vignette-style document that weaves together the text and code of this protocol so that users can reproduce the analysis results. (ZIP 18 kb)
Supplementary Method 2: ppiProtocol.R contains the R code used in the analyses presented in this protocol, without the accompanying text. (ZIP 3 kb)
Supplementary Method 3: ppiProtocol.bib is a BibTeX file containing the references cited in this protocol and used in ppiProtocol.Rnw. (ZIP 4 kb)
Supplementary Data 1: ppiProtocolData_1.1.0.tar.gz is an R data package containing all data sets used in this protocol. (ZIP 12073 kb)
About this article
Cite this article
Chiang, T., Scholtens, D. A general pipeline for quality and statistical assessment of protein interaction data using R and Bioconductor. Nat Protoc 4, 535–546 (2009). https://doi.org/10.1038/nprot.2009.26
DOI: https://doi.org/10.1038/nprot.2009.26