Review

Nature Reviews Molecular Cell Biology 8, 645-654 (August 2007) | doi:10.1038/nrm2208

Analysis of protein complexes using mass spectrometry

Anne-Claude Gingras1, Matthias Gstaiger2, Brian Raught3 & Ruedi Aebersold2,4


The versatile combination of affinity purification and mass spectrometry (AP–MS) has recently been applied to the detailed characterization of many protein complexes and large protein-interaction networks. The combination of AP–MS with other techniques, such as biochemical fractionation, intact mass measurement and chemical crosslinking, can help to decipher the supramolecular organization of protein complexes. AP–MS can also be combined with quantitative proteomics approaches to better understand the dynamics of protein–complex assembly.

The past decade has seen the complete sequencing of several eukaryotic genomes, providing a comprehensive inventory of predicted proteins for many different species. However, such protein lists are not sufficient to describe biological processes. Vital cellular functions such as DNA replication, transcription and mRNA translation require the coordinated action of a large number of proteins that are assembled into an array of multiprotein complexes of distinct composition and structure. Similarly, biological processes are orchestrated and regulated by dynamic signalling networks of interacting proteins that link chemical or physical stimuli to specific effector molecules. The analysis of protein complexes and protein–protein interaction networks — and the dynamic behaviour of these networks as a function of time and cell state — are therefore of central importance in biological research. Various large-scale efforts have thus attempted to define protein interactomes in several organisms, including Saccharomyces cerevisiae1, 2, 3, 4, 5, 6, Drosophila melanogaster7, 8, 9, Caenorhabditis elegans10 and Homo sapiens11, 12, 13.

Different approaches have been used to characterize protein complexes and protein–protein interaction networks. The first interactome maps were obtained using the yeast two-hybrid system (the yeast two-hybrid approach and its strengths and weaknesses are described elsewhere14, 15, 16). More recently, a combination of affinity purification and mass spectrometry (AP–MS) has been used to greatly advance our understanding of protein-complex composition. With the AP–MS method, multiprotein complexes are isolated directly from cell lysates through one or more AP steps. Complex components are then identified by MS (Fig. 1). In contrast to yeast two-hybrid and related methods, AP–MS can be performed under near physiological conditions, in the relevant organism and cell type. AP–MS does not typically perturb relevant post-translational modifications, which are often crucial for the organization and/or activity of complexes. These post-translational modifications can also be identified by MS. Another advantage of AP–MS is that it can be used to probe dynamic changes in the composition of protein complexes, especially when used in combination with quantitative proteomics techniques17, 18, 19, 20, 21, 22.

Figure 1 | General overview of an affinity purification and mass spectrometry experiment.

a | The protein of interest (often epitope tagged; blue) is purified from a cell lysate together with its binding partners (orange and green). Contaminants (red) can also be present. b | In an optional step, proteins in the complex can be separated by SDS–PAGE (followed by silver or coomassie staining) or by some type of liquid chromatography. Although analysis of gel-purified proteins has been used most often so far, gel-free approaches allow for a more rapid and generic analysis and are increasingly used. c | Proteins are subjected to proteolysis (usually with trypsin). d | Mass spectrometry (MS) analysis of peptides. In most cases, this involves peptide separation by reversed-phase liquid chromatography followed by two MS events: in the first scan, the mass/charge ratio (m/z) of the intact peptide is measured. The most abundant peptides are then specifically selected and subjected to fragmentation, yielding a tandem MS (MS/MS) spectrum (a simplified MS/MS scan is shown for one of the peptides). e | Database searching and statistical software are used to interpret the MS data to yield a list of proteins that were present in the initial sample, including the tagged protein, its interacting partners and contaminants.


Two recent reports2, 4 describe high-confidence AP–MS data that connect an estimated 60% of the yeast proteome, and these studies demonstrate that large-scale protein-interaction mapping by MS is feasible. So, it is timely to reflect on AP–MS approaches and to outline some emerging strategies that are designed to better exploit this powerful technique. In this review, we briefly summarize experimental design for AP–MS, and we discuss how AP–MS data can be used to characterize the components of protein complexes. We then describe how AP–MS can be combined with other strategies (such as biochemical fractionation, co-elution profiles and chemical crosslinking) to reveal protein-complex stoichiometry and structural organization. Last, we discuss the use of quantitative proteomics to study the dynamics of protein-complex composition.

AP–MS: the technique

Several recent advances have enabled researchers to successfully apply AP–MS techniques to high-throughput acquisition of interaction data, especially in S. cerevisiae, in which epitope-tagged proteins (or baits) can be routinely generated for many open reading frames (ORFs). These advances also include the isolation of protein complexes using generic protocols and powerful MS methods for the identification and quantification of isolated proteins.

The first step — affinity purification. Generic approaches that use affinity-tagged recombinant proteins have allowed for parallel sample preparation without the need to optimize the purification protocol for each protein complex. Proteins of interest are simply expressed in-frame with an epitope tag (at either the N or C terminus), which is then used as an affinity handle to purify the tagged protein (the bait) along with its interacting partners (the prey). Although several different tags or tag combinations have been successfully used in many low-throughput studies (see Ref. 23), high-throughput studies have primarily used either the flag tag3, 23 or the tandem affinity purification (TAP) tag2, 4, 5 system (Box 1).

In the flag-tag approach, as used by Ho et al., C-terminally flag-tagged proteins were expressed under the control of a GAL-inducible promoter and isolated in a single step using an anti-flag antibody resin3. Tagging 10% of the yeast ORFs, the authors were able to connect 25% of the yeast proteome. In the TAP-tag approach, as used by Gavin et al.4, 5 and Krogan et al.2, yeast genes for the proteins of interest were fused to a C-terminal dual-epitope tag via homologous recombination, such that the proteins were expressed under their own promoters. Protein purification was carried out in two steps, first via the protein A moiety in the TAP tag (which binds immunoglobulin G (IgG)–sepharose), and then via the calmodulin-binding peptide (which exhibits high affinity to calmodulin–sepharose; Box 1). Further discussion on false positives and false negatives in AP–MS experiments can be found in Box 2.

The second step — mass spectrometry. Two main strategies to ionize peptides, electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI), and their implementation on several types of tandem mass spectrometers24, have allowed for efficient sequencing of peptides derived from proteolytic digests of protein complexes (an in-depth discussion of MS is beyond the scope of this review; we refer the reader to excellent recent publications25, 26, 27, 28, 29).

MS is currently the method of choice for peptide sequencing because it is sensitive; it routinely allows for the identification of peptides that are present at femtomole levels. MS is also rapid; sequencing of individual peptides can be achieved within hundreds of milliseconds, and thousands of peptides can therefore be identified in a single MS run. Last, MS is compatible with high-throughput strategies and is easily automated. It also allows for the characterization of peptide modifications (including naturally occurring post-translational modifications, such as phosphorylation, and exogenously added modifications, such as chemical crosslinkers). MS can also be adapted to quantitatively measure peptide abundance and does not require pre-existing knowledge of the proteins to be analysed. Advances in sample processing and instrumentation have gone hand-in-hand with the development of software tools that automatically retrieve sequence information from acquired mass spectra and provide statistical validation of the accuracy of the determined sequences30, 31.

Although individual research groups have used different methods for the analysis of protein complexes by MS, the basic principles are essentially the same (Fig. 1).

From interactors to complexes

The AP–MS technique only generates a list of proteins detected in a given sample, and does not necessarily reveal the composition of individual protein complexes. The data from a single AP–MS experiment represent an average of binding partners and protein complexes. If the bait protein is a component of multiple alternative complexes, a single AP–MS analysis cannot be used to decipher this multiplicity of associations. This is an important limitation because proteins can have dramatically different roles as components of different types of complexes.

Identifying PP2A-interacting partners. The acquisition of high-density interaction data sets in which each component (or multiple components) of a particular complex is affinity tagged and purified can greatly assist in deciphering the association of a given protein with multiple alternative complexes. For example, TAP-tag AP–MS was used to identify the interacting partners for the catalytic (C) subunit of the human serine/threonine phosphatase PP2A (Fig. 2; A.-C.G., B.R. and R.A., unpublished observations). PP2A is an important serine/threonine phosphatase in eukaryotes and generally functions as a trimer in which the C subunit is associated with a regulatory protein (B, B' or B''; 14 genes in humans) through an adaptor (A) molecule32, 33. Mutually exclusive interaction of the C subunit with a protein known as alpha4 is also observed34, 35, 36. A single TAP-tag AP–MS of the PP2A C subunit revealed a large list of interactors without shedding light on individual protein–protein complexes. However, reciprocal TAP analyses of each of the proteins that were identified in the initial experiment recapitulated the known PP2A supramolecular architecture. Consistent with all previous biochemical data, TAP-alpha4 pulldowns contained only the C subunit of PP2A, without adaptor and regulatory proteins. On the other hand, the association of one regulatory subunit (for example, the B subunit) was mutually exclusive to that of another (B') (A.-C.G. and R.A., unpublished observations).

Figure 2 | High-density data acquisition to elucidate complex composition.

a | Three different biochemically defined complexes that contain the catalytic (C) subunit of the serine/threonine phosphatase PP2A (note that this is not an exhaustive list). b | Results from an affinity purification and mass spectrometry (AP–MS) analysis of the tandem AP (TAP)-tagged PP2A C subunit. These data represent an average of all PP2A-containing complexes; the composition of each of the different PP2A-containing complexes cannot be resolved. c | After tagging each component and repeating the purification procedure, the AP–MS data reveal several distinct PP2A-containing complexes, as depicted in a (indicated by the arrows).


Gavin et al. also generated a high-density map of the interactions of the yeast orthologues of PP2A and recapitulated the mutually exclusive association of the B (Cdc55) and B' (Rts1) PP2A subunits with the PP2A C (Pph21 and Pph22) and A (Tpd3) proteins5. We also recently applied this strategy to the organization of the PP2A-related phosphatase PP4 and demonstrated that the PP4 C subunit assembles into several different alternative complexes37. High-density data can thus be used to identify mutually exclusive and/or cooperative protein assemblies.
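The logic of reciprocal purification can be sketched in a few lines of Python. This is a minimal illustration and not the analysis used in the studies cited above: the function name, the dictionary layout and the toy prey sets are assumptions, chosen only to mirror the PP2A architecture described in the text (C subunit, A adaptor, B and B' regulatory subunits, and alpha4).

```python
def mutually_exclusive(pulldowns, a, b):
    """Return True if a and b never co-purify but share at least one partner.

    `pulldowns` maps each bait to the set of preys recovered with it.
    """
    a_preys = pulldowns.get(a, set())
    b_preys = pulldowns.get(b, set())
    # Either direction of recovery counts as evidence of co-purification.
    co_purify = b in a_preys or a in b_preys
    # Partners recovered with both baits (excluding the baits themselves).
    shared = (a_preys & b_preys) - {a, b}
    return not co_purify and bool(shared)


# Toy reciprocal data set (illustrative values, not measured data).
pulldowns = {
    "C": {"A", "B", "B'", "alpha4"},
    "A": {"C", "B", "B'"},
    "B": {"A", "C"},
    "B'": {"A", "C"},
    "alpha4": {"C"},
}

for pair in [("B", "B'"), ("B", "alpha4"), ("A", "B")]:
    print(pair, mutually_exclusive(pulldowns, *pair))
```

A single purification of the C subunit alone would list all four partners together; only the reciprocal baits show that B and B' share the A and C subunits without ever recovering each other.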

High-density data in high-throughput studies. The use of high-density data to characterize alternative protein complexes can also be extended to large interaction networks. Two recent S. cerevisiae AP–MS studies2, 4 have made serious and promising attempts to dissect protein complexes from moderately dense interaction data. Although the two reports differ in the methodologies that were used to extract data and identify complexes, both make use of internally generated data sets that comprise roughly 2,000 successful TAP-tag purifications.

Krogan et al.2 describe the identification of protein complexes on the basis of a graph-clustering algorithm that identifies highly connected modules in protein–protein interaction networks. The method was applied to the interaction data after the removal of common contaminants, and values for the clustering parameters were selected to optimize overlap with hand-curated protein complexes (available from Munich Information Center for Protein Sciences (MIPS)). This method identified 547 distinct heteromeric protein complexes, about half of which were novel2. However, although this method excels at identifying interacting partners, it does not necessarily separate complexes that contain shared subunits into different entities, but instead often groups them into a single larger complex. As such, it is difficult to directly compare the 'complexes' identified in this approach to bona fide biochemically stable protein assemblies. For example, although four separate yeast proteins (Tif4631, Tif4632, Caf20 and Eap1) share a common binding site for the translation-initiation factor eIF4E (Cdc33; ref. 38), such that only one of these proteins can interact with eIF4E at any given time, these four proteins were reported as part of a single 'complex'.

Gavin et al. analysed the propensity of proteins to associate directly from raw data (without removal of contaminants), computing a composite index (the socio-affinity index) that represents the frequency with which two proteins were observed together as compared to their respective frequency of identification in the data set. In an attempt to capture the association of a given protein into multiple alternative assemblies, they performed matrix clustering to generate an initial list of complexes. After subtracting a penalty from the original values, clustering was repeated with the assumption that tight associations would not be drastically affected by the penalty, whereas weaker associations would be gradually lost and could be replaced by interactions not initially detected. After varying clustering parameters (including the number of iterations and the penalty value) and comparing the resulting hits to a manually curated group of known complexes, the best parameters were chosen, yielding 491 protein complexes, of which about half were previously unknown4.
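The idea of scoring a protein pair by how often it is seen together, relative to how often each protein is seen at all, can be illustrated with a simple log-odds calculation. The sketch below is a deliberately simplified stand-in and is not the published socio-affinity index; the function name, the expected-count model (independent occurrence across purifications) and the toy purification lists are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_log_odds(purifications):
    """Score each observed protein pair as log(observed / expected co-occurrence).

    `purifications` is a list of protein lists, one per purification.
    The expected count assumes the two proteins occur independently.
    """
    n = len(purifications)
    protein_counts = Counter()
    pair_counts = Counter()
    for proteins in purifications:
        unique = sorted(set(proteins))
        protein_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), observed in pair_counts.items():
        expected = protein_counts[a] * protein_counts[b] / n
        scores[(a, b)] = math.log(observed / expected)
    return scores


# Toy data (illustrative only): a frequently detected protein, labelled
# "contaminant", appears in unrelated purifications.
purifications = [
    ["Pph21", "Tpd3", "Cdc55"],
    ["Pph21", "Tpd3", "Rts1"],
    ["Tpd3", "Cdc55", "Pph21"],
    ["contaminant", "Pph21"],
    ["contaminant", "Rts1"],
    ["contaminant", "Tif4631"],
]

scores = cooccurrence_log_odds(purifications)
for pair in sorted(scores, key=scores.get, reverse=True):
    print(pair, round(scores[pair], 2))
```

Pairs that co-purify more often than their individual frequencies predict receive positive scores, whereas a protein that appears in many unrelated purifications is penalized without any explicit contaminant filter.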

An important difference from the Krogan et al. approach is that Gavin et al. identified 'complex isoforms' by comparing these 491 protein complexes to similar complexes detected (albeit with slightly poorer accuracy or coverage) from clustering with different parameters. Inclusion of 'complex isoforms' increased the coverage of components of known complexes, but also allowed the authors to partition proteins in complexes into two types: core components, which are present in most complexes, and attachments, which are present in only some. In addition, the authors also defined protein 'modules', in which two or more proteins are always found together, and these modules can be found with various other partners in multiple complexes. By doing so, Gavin et al. could in theory resolve the issue of a protein belonging to two complexes, as well as distinguish between essential and accessory components of a protein assembly. Although this approach was successful for the PP2A module, for which the density of information was sufficient, it was not successful for all complexes; for instance, both of the eIF4G isoforms (Tif4631 and Tif4632) were assigned to the same core complex with eIF4E, even though these associations are mutually exclusive.

It is to be expected that as these large data sets are combined and re-analysed (and as new data become available), the density of the data will increase and the sensitivity and accuracy of complex determination will improve. Collins et al. recently combined the Gavin and Krogan data sets and applied a new probabilistic method to filter out false positives in each of the data sets39. They found 9,074 high confidence interactions among 1,622 individual proteins, with an apparent error rate similar to the low-throughput data from MIPS. The Collins et al. scoring function has three components: direct evidence for an interaction following recovery of protein Y when protein X is a bait; indirect evidence of co-association, as was used by Gavin et al.; and evidence of non-association, when protein Y does not come down when protein X is used as bait. Combining the high-throughput data with this improved scoring function should help resolve protein complexes; however, this is easier said than done, and some mutually exclusive interactions (for example, those of Tif4631 and Tif4632 with eIF4E) still remain annotated as part of the same complex in the data set. In spite of these limitations, these studies have significantly improved our understanding of protein–protein interactions.

AP–MS and biochemical fractionation

As discussed above, the structure of multiprotein complexes can only be revealed indirectly through high-density AP–MS approaches. However, as described below, isolating an intact protein complex with defined biochemical properties allows its composition to be determined directly.

Biochemical fractionation in protein-complex analysis. Size fractionation (via gel filtration or density gradient), selective precipitation and ion-exchange chromatography have been used widely for the separation and enrichment of protein complexes40. AP of at least one of the sample components, using for example an inhibitor or a ligand, has also frequently been included in biochemical purification schemes to significantly increase enrichment factors. Depending on the nature of the particular protein complex, a combination of these separation methods can yield pure preparations. The composition of the semi-purified or enriched complex can then be determined by MS41, 42, 43, 44. For example, Moyer et al.43 purified a high molecular mass complex that contains the D. melanogaster DNA-replication protein CDC45 by performing serial chromatography steps, including gel filtration (Superdex), ion exchange (DEAE sepharose, Mono S and Mono Q) and affinity chromatography (heparin and anti-CDC45). The presence of CDC45 in each fraction was monitored by immunoblotting. Silver staining revealed 10 CDC45 interactors, which were identified by MS and immunoblotting to be components of the minichromosome maintenance-2 (MCM2)–7 and GINS complexes.

Although fractionation approaches have been used successfully for the characterization of the composition of numerous biologically relevant protein assemblies, they are not generic and must be tailored to a particular complex of interest. This limitation prevents their application to genome-wide studies. However, combining one or more of such biochemical fractionation techniques with a generic AP protocol (such as an epitope tag) can provide a surrogate for a complete biochemical isolation of a protein complex. For example, a combination of TAP or flag purification with standard gel filtration or ion exchange has allowed for a better characterization of several nuclear complexes45, 46, 47; this type of approach should be scalable to high-throughput projects. Emerging techniques such as free-flow electrophoresis (FFE)48 and blue native gel electrophoresis49, 50, 51 also offer great promise for the purification of intact multiprotein assemblies. In particular, blue native gels have been increasingly used to isolate protein complexes (including complexes from membranes51, 52, 53), and could easily be incorporated into an AP–MS strategy.

Guilt by association. Another strategy for the analysis of large multiprotein assemblies (or organelles) is to monitor co-fractionation profiles using quantitative MS and then to compare the acquired profiles with those of known components of the protein complex or organelle of interest. This can be accomplished by monitoring the number and intensity of the peptide signals for each detected protein across adjacent fractions (for example, throughout a sucrose or glycerol gradient). Using such an approach, Mann and colleagues identified novel components of the centrosome54 and have recently extended this approach to the description of components of multiple organelles from mouse liver55.
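Comparing co-fractionation profiles amounts to asking how similar two intensity curves are across the gradient. A minimal sketch, assuming Pearson correlation as the similarity measure (the cited studies may use other metrics) and using hypothetical protein names and intensities:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_by_profile_similarity(profiles, marker):
    """Rank all other proteins by correlation with a known marker protein."""
    reference = profiles[marker]
    scores = {
        name: pearson(reference, profile)
        for name, profile in profiles.items()
        if name != marker
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Summed peptide intensities across six gradient fractions (toy values).
profiles = {
    "known_marker": [0, 1, 5, 9, 4, 1],
    "candidate_a": [0, 2, 6, 10, 5, 1],
    "candidate_b": [8, 6, 3, 1, 0, 0],
}

ranked = rank_by_profile_similarity(profiles, "known_marker")
for name, r in ranked:
    print(name, round(r, 3))
```

Proteins whose profiles peak in the same fractions as the marker rank first; those that peak elsewhere in the gradient receive low or negative correlations.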

A related strategy for the identification of organellar components combines classical biochemical fractionation and affinity enrichment with stable-isotope-based quantitative proteomics56 (described in more detail below). The use of stable isotopes allows for the measurement of enrichment ratios, which define the likelihood of a protein being in a particular complex or organelle. In addition to identifying new components of previously known complexes or organelles, groups of co-eluting proteins can identify novel organelles or protein complexes when no known component is available (for example, when the generated profiles are subjected to unsupervised clustering).

Crosslinking of protein complexes

Stabilization of interactions with crosslinkers. A problem that is encountered during the isolation of intact native protein complexes from cells or tissue is that only protein–protein interactions that are resistant to the lysis and purification conditions will survive to be detected by MS. Several different strategies have thus been devised to freeze transient or labile protein interactions by using chemical crosslinking reagents (Box 3). Crosslinkers possess at least two reactive groups that form covalent bonds with target molecules. These reactive groups are separated by a spacer arm of a defined length (usually in the range of 5–15 Å) that determines the maximal distance between two molecules. This confers some degree of specificity to the crosslinking process: molecules in close proximity are more likely to be crosslinked than distant species. However, protein–protein crosslinking techniques present multiple experimental and analytical challenges. The choice of crosslinker is crucial, as crosslinkers vary in cell-permeability, reactivity and spacer arm length. Crosslinking reaction conditions must also be closely monitored, such that bona fide protein–protein interactions are stabilized and undesired crosslinks (to contaminating proteins) are minimized.

Although many chemical crosslinkers can be used to stabilize complexes in theory, only a few have been successfully used for in vivo crosslinking followed by MS analysis. Formaldehyde and di-thiobis-succinimidyl-propionate (DSP; an amine-reactive, homobifunctional, thiol-cleavable and membrane-permeable crosslinker) have been used most often to identify novel interacting partners57, 58, 59, 60. Crosslinkers are also particularly attractive for revealing interactions that involve membrane proteins (or microsomes), as the detergent concentrations used to solubilize the membranes and extract the proteins typically also disrupt protein–protein interactions61. Mild formaldehyde crosslinking has also been performed in animals and has allowed for the stringent immunopurification of an intact gamma-secretase complex as well as the identification of protein interactors for the cellular prion protein PrPc (ref. 62). A combination of techniques integrating a modified TAP strategy, in vivo crosslinking with formaldehyde and an isotope-labelling strategy (Box 4; Fig. 3) was used to identify novel proteasome interactors63, 64. Improved variations of combined crosslinking, quantitative proteomics and AP–MS approaches will probably continue to be developed.

Figure 3 | Incorporation of stable isotopes into proteins.

a | Metabolic labelling. In the stable-isotope labelling with amino acids in cell culture (SILAC) technique, one population of cells is maintained for several generations in growth media in which one or more naturally occurring 'light' amino acids have been replaced by their isotopically 'heavy' (usually containing 13C or 15N) counterparts. Heavy- and light-labelled samples are combined, and the proteins of interest are isolated using standard biochemical procedures. Protein mixtures are proteolyzed and subjected to mass spectrometry (MS). b | Chemical labelling. Chemical labelling is used after isolation, and often following proteolysis, of proteins of interest. Chemical labelling can therefore be used when SILAC-type labelling is difficult or impossible (for example, in human tissue biopsies). Chemically labelled peptides can be combined and then analysed by MS (often following an optional purification step).


Crosslinkers and structural studies. Several new strategies have also been developed to measure distance constraints through intra- and intermolecular crosslinking data (reviewed in ref. 65). As a low-resolution approach, crosslinking–MS is rapid, requires little material and allows structural analysis of complexes that are isolated under native conditions. A main challenge to the application of crosslinkers to structural determination is the identification of those pairs of peptides that are specifically crosslinked. This issue has recently been addressed by ingenious experimental designs and novel data analysis solutions. For example, several groups have devised approaches that use isotope-labelled crosslinkers66, 67, 68. The mass of a pair of peptides crosslinked with a heavy-isotope-labelled crosslinking reagent is larger than that of the same peptides crosslinked with the corresponding light-isotope-labelled compound. Peptides modified by light- and heavy-isotope-coded crosslinkers appear as doublets in mass spectra, whereas unmodified peptides appear as singlets; this difference facilitates the selection of the crosslinked peptides for further analysis.
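The doublet signature of isotope-coded crosslinkers lends itself to a simple peak-pairing search. The sketch below is an illustration under stated assumptions: the function name, the 4.0 Da label mass shift, the charge state and the peak list are hypothetical, and a real analysis would also consider several charge states and peak intensities.

```python
def find_doublets(mz_values, label_mass_shift, charge, tol=0.01):
    """Return (light, heavy) m/z pairs separated by the expected label spacing.

    The spacing on the m/z axis is the mass shift divided by the charge.
    """
    expected = label_mass_shift / charge
    peaks = sorted(mz_values)
    pairs = []
    for i, light in enumerate(peaks):
        for heavy in peaks[i + 1:]:
            diff = heavy - light
            if diff > expected + tol:
                break  # peaks are sorted, so later ones are even further away
            if abs(diff - expected) <= tol:
                pairs.append((light, heavy))
    return pairs


# Toy survey scan (hypothetical m/z values); label shift assumed to be 4.0 Da.
survey_scan = [500.25, 612.80, 614.80, 750.40]
doublets = find_doublets(survey_scan, label_mass_shift=4.0, charge=2)
print(doublets)
```

Peaks without a partner at the expected spacing are treated as unmodified singlets and can be excluded from fragmentation.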

Seebacher et al. introduced an additional step to distinguish monolinks from crosslinks68. The protein sample is prepared in a mixture of heavy and light water. The non-reacted end of a monolink (Box 3) is hydrolyzed and incorporates 18O/16O from the solvent, resulting in a splitting of the peaks68, and differentiating them from intra- or intermolecularly crosslinked peptides. This type of approach can be combined with data-analysis strategies to identify the crosslinked pairs and can establish distance constraints for structural modelling.

A new generation of crosslinkers, referred to as protein-interaction reporters (PIR), contain two labile bonds in the spacer domain. These bonds are broken by low-energy MS2 to yield a reporter ion, which facilitates the identification of spectra from crosslinked peptides; spectra that contain non-crosslinked peptides do not contain the reporter ion. Another advantage of the PIR approach is that the two crosslinked peptides are released following the cleavage of the labile crosslinker and can be easily identified using standard database-search algorithms69. Future use of such reagents holds great promise for obtaining structural information of proteins in a complex.

Complex stoichiometry

Another issue that has not yet been addressed in high-throughput studies concerns the stoichiometry of complex components; this is important information for understanding the structural organization of a protein assembly. In recent years, mass spectrometers have been modified for the analysis of protein assemblies in the MDa range, and conditions have been developed to maintain intact protein complexes in the gas phase70. Measurement of the intact mass of a protein complex, as well as the exact masses of each of its components, can thus be used to determine the stoichiometry of each component. This strategy was used to decipher the heptameric nature of the stalk complex dissociated from bacterial ribosomes71 as well as interactions in complicated multiprotein structures, such as the proteasome 19S lid72 and the yeast exosome73. Importantly, such an approach is compatible with AP by TAP tagging73. Because this approach is compatible with many purification strategies and because the parameters of operation of the mass spectrometer can be manipulated to partially disrupt the protein complex, this technique also has the potential to shed light on subcomplexes or weakly associated binding partners (reviewed in Ref. 74).
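Deriving stoichiometry from an intact mass is, at its simplest, a search for integer copy numbers whose summed subunit masses match the measured mass of the assembly. The following sketch assumes a brute-force enumeration; the function name, subunit names, masses and tolerance are hypothetical and do not correspond to any of the complexes cited above.

```python
from itertools import product


def candidate_stoichiometries(subunit_masses, intact_mass, tol, max_copies=8):
    """List copy-number combinations whose total mass matches the intact mass.

    `subunit_masses` maps subunit name to mass (Da); `tol` is the allowed
    absolute deviation (Da) between computed and measured intact mass.
    """
    names = list(subunit_masses)
    hits = []
    for copies in product(range(max_copies + 1), repeat=len(names)):
        total = sum(n * subunit_masses[name] for n, name in zip(copies, names))
        if abs(total - intact_mass) <= tol:
            hits.append(dict(zip(names, copies)))
    return hits


# Hypothetical two-subunit complex (masses in Da are illustrative).
subunit_masses = {"X": 15000.0, "Y": 13000.0}
hits = candidate_stoichiometries(subunit_masses, intact_mass=67000.0, tol=50.0)
print(hits)
```

When several combinations fall within the tolerance, the masses of subcomplexes generated by partial disruption in the mass spectrometer can serve as additional constraints.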

Another promising strategy to determine protein stoichiometry in a complex is to combine complex isolation with isotope-based absolute quantitative proteomics (Box 4). If all of the components of a complex are known, synthetic tryptic peptides can be generated to monitor the abundance of each of the proteins in the complex. These peptides can be synthesized with heavy isotopes and then mixed with an unlabelled sample (as in the AQUA approach75 or the QCAT strategy76) or labelled in parallel to the samples (for example by reaction with isobaric tags for relative and absolute quantification (iTRAQ)77 or other amino-reactive reagents). This approach was successfully used to confirm the 1:1:1 stoichiometry of the human spliceosomal U1 small nuclear ribonucleoprotein78. It should be noted that this method is dependent on the isolation of a single homogeneous complex, which can be difficult to achieve using common isolation methods.
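The arithmetic behind isotope-based absolute quantification is short: the amount of each endogenous peptide is the light-to-heavy signal ratio multiplied by the known amount of spiked heavy standard, and stoichiometry follows from the ratios of these amounts. The sketch below assumes one reporter peptide per protein; the function names, protein names, intensities and spike amounts are hypothetical.

```python
def absolute_amounts(measurements):
    """Convert (light, heavy, spiked_fmol) triples into absolute amounts (fmol).

    light/heavy are MS signal intensities of the endogenous peptide and its
    isotope-labelled synthetic standard; spiked_fmol is the standard added.
    """
    return {
        protein: light / heavy * spiked_fmol
        for protein, (light, heavy, spiked_fmol) in measurements.items()
    }


def stoichiometry(amounts):
    """Express each amount relative to the least abundant component."""
    reference = min(amounts.values())
    return {protein: amount / reference for protein, amount in amounts.items()}


# Toy measurements for a three-protein complex (illustrative values only).
measurements = {
    "protein_1": (2.0e6, 1.0e6, 50.0),
    "protein_2": (1.0e6, 1.0e6, 100.0),
    "protein_3": (4.0e6, 2.0e6, 50.0),
}

amounts = absolute_amounts(measurements)
ratios = stoichiometry(amounts)
print(amounts)
print(ratios)
```

In practice, several peptides per protein would be averaged, and the result is only meaningful if the preparation contains a single homogeneous complex.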

Complex dynamics

Most of the high-throughput AP–MS data generated so far represent a static view of protein complexes. Owing to the dynamically changing composition of many complexes, particularly those involved in cell signalling, it has been crucial to develop strategies to capture regulated interactions. Quantitative proteomics approaches are ideally suited for this task. One effective way to generate quantitative information is to incorporate stable isotopes into peptides. This is typically accomplished either through metabolic labelling, in which isotopically 'heavy' compounds replace the natural 'light' isotopes in growth medium, or through the addition of an isotopic label after lysis (and often after purification) using chemical or enzymatic reactions79, 80 (Fig. 3).

Metabolic labelling with SILAC. In the stable-isotope labelling with amino acids in cell culture (SILAC) procedure81, cells are grown in culture for several generations in the presence of an isotopically heavy amino acid, thereby replacing essentially all of the naturally occurring light amino acid. Peptides derived from labelled cells behave identically to the corresponding peptides from unlabelled cells throughout any biochemical enrichment or purification steps. However, the difference in mass between the light and heavy peptides can be measured in the survey scan of the mass spectrometer. The relative intensities of the light and heavy MS peaks are proportional to the abundance of the peptides in the two samples. SILAC has been successfully used in several recent studies (see Refs 17,18,81–83), including for the profiling of the dynamic association of proteins isolated via an anti-phosphotyrosine resin following stimulation with growth factors, such as epidermal growth factor (EGF) or platelet-derived growth factor (PDGF), in mesenchymal stem cells17.
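Turning SILAC peak pairs into a protein-level measurement can be sketched as follows. This is a minimal illustration, assuming that the heavy channel is the stimulated sample and that the median of peptide ratios is used as the protein ratio (a common but not universal choice); the function name, protein names and intensities are hypothetical.

```python
import math
import statistics


def protein_log2_ratios(peptide_intensities):
    """Summarize heavy/light peptide ratios as one log2 ratio per protein.

    `peptide_intensities` maps protein -> list of (light, heavy) intensities
    read from the survey scan for each quantified peptide.
    """
    result = {}
    for protein, pairs in peptide_intensities.items():
        ratios = [heavy / light for light, heavy in pairs]
        result[protein] = math.log2(statistics.median(ratios))
    return result


# Toy data: one protein enriched after stimulation, one unchanged.
peptide_intensities = {
    "recruited_protein": [(1.0e5, 4.0e5), (2.0e5, 8.2e5), (1.5e5, 5.7e5)],
    "background_protein": [(3.0e5, 3.0e5), (1.0e5, 1.1e5), (2.0e5, 1.9e5)],
}

log2_ratios = protein_log2_ratios(peptide_intensities)
for protein, value in log2_ratios.items():
    print(protein, round(value, 2))
```

Proteins that bind non-specifically to the resin are expected to give ratios close to 1 (log2 ratio near 0), which helps to separate regulated interactors from background.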

Chemical labelling with ICAT reagents. Quantitative approaches based on isotopic labelling of proteins (or peptides) after lysis using chemical reactions can also be used for detecting changes in complex composition. Chemical labelling approaches are complementary to SILAC and are particularly useful for measuring dynamic changes in complexes isolated from tissues or organisms that cannot be metabolically labelled. Isotope-coded affinity tags84 (ICAT) are typical of this type of approach and are composed of an affinity handle (often biotin), an isotopic linker (often cleavable) and a protein-reactive group (for example, a thiol- or amine-reactive group). ICAT reagents have been used in many quantitative proteomics studies (reviewed in refs 79,85). For example, the ICAT approach uncovered a novel tenth subunit of the TFIIH transcription complex; mutation of this subunit is responsible for the DNA-repair syndrome trichothiodystrophy group A (refs 20,86). ICAT has also been applied to the study of dynamically regulated associations, for example to determine changes in transcription-factor complexes during erythroid-cell differentiation22. In addition to the original ICAT reagents, a variety of chemical isotope-labelling reagents that target different reactive groups have now been developed.

Multiplex isobaric tags. A newer generation of labelling reagent, iTRAQ77, is particularly well suited for resolving complex dynamics because it allows multiplexed quantitative analysis of four different samples (Fig. 4). iTRAQ reagents are amine reactive and are therefore covalently linked to the N terminus of every peptide (as well as to the ε-amine of lysine residues). In contrast to SILAC and ICAT labelling, in which the heavy and light labels alter the parent peptide masses, iTRAQ reagents are isobaric and are therefore not quantified in the survey scan. Instead, iTRAQ reagents fragment during the tandem MS (MS/MS) step, each yielding a single low-mass reporter ion (at 114, 115, 116 or 117 Da for the four-plexed reagent set). Intensities of each of the reporter ions are used to calculate the relative abundance of each of the peptides in the samples. The remaining peaks in the MS/MS spectrum are used for sequence identification. A single MS/MS spectrum can therefore both quantify and identify a peptide. Temporal profiles can be reconstructed by purifying complexes at different time points following a specific treatment and labelling the peptides derived from each time point with a different iTRAQ reagent. iTRAQ technology has been used to analyse dynamically regulated tyrosine phosphorylation87 and to characterize the carbon-source dependency of the Snf1 kinase complex in Candida albicans88.
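
Reporter-ion quantification from a single MS/MS spectrum can be sketched as below. This is a simplified illustration, assuming that the spectrum is available as a list of (m/z, intensity) pairs and that a window of ±0.5 around each nominal reporter mass is adequate; the function names, the window width and the peak list are hypothetical.

```python
def reporter_intensities(peaks, channels=(114, 115, 116, 117), tol=0.5):
    """Sum the intensity of peaks lying within tol of each nominal reporter mass."""
    totals = {channel: 0.0 for channel in channels}
    for mz, intensity in peaks:
        for channel in channels:
            if abs(mz - channel) < tol:
                totals[channel] += intensity
    return totals


def relative_abundance(totals):
    """Fraction of the summed reporter signal contributed by each channel."""
    total = sum(totals.values())
    if total == 0:
        return {channel: 0.0 for channel in totals}
    return {channel: value / total for channel, value in totals.items()}


# Hypothetical MS/MS peak list: four reporter ions plus two sequence ions.
spectrum = [(114.11, 100.0), (115.11, 800.0), (116.11, 900.0),
            (117.11, 200.0), (175.12, 500.0), (432.25, 350.0)]

totals = reporter_intensities(spectrum)
fractions = relative_abundance(totals)
print(fractions)
```

If each channel were assigned to one time point, the four fractions would give a temporal profile for the peptide; the peaks outside the reporter region are ignored here because they serve sequence identification rather than quantification.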

Figure 4 | Isobaric tags to elucidate complex formation dynamics.

a | Desired treatment of cells is followed by isolation of protein complexes and proteolysis. Isobaric tags (iTRAQ) are chemically added to the N terminus of every peptide (as well as to lysine ε-amine groups). Samples from multiple treatment time points are combined and subjected to analysis. b | A peptide labelled with the iTRAQ 114 and iTRAQ 117 reagents. iTRAQ is isobaric, such that addition of the 114-Da or 117-Da mass tag alters the mass of a given peptide by the same amount. To maintain a constant mass, the reporter moiety (for example, of mass 114) is separated from the peptide by a balancer group. The reporter and balancer groups fragment in the collision cell of the mass spectrometer during the tandem mass spectrometry (MS/MS) event, and the intensity of the reporter ions is monitored. c | Analysis of an iTRAQ experiment. MS/MS analysis of a labelled peptide generates a fragmentation spectrum that yields the sequence of the peptide. The iTRAQ reagent is fragmented in the same step and reporter ions are quantified by magnifying the low mass range (114–117) area. In the example shown, protein B associates with protein A (the bait) after 30 and 60 minutes of stimulation, but not after 120 minutes of treatment. m/z, mass/charge ratio.


Which method to choose? We have highlighted only a subset of the quantitative proteomics techniques currently available; each of the techniques has strengths and limitations, and the choice of the proper technique depends on the question being asked. The first choice to be made concerns the use of metabolic versus chemical labelling. SILAC can be used only with cells grown in culture for several generations, but has the advantage of introducing the isotopic label before purification, thereby reducing errors due to sample handling. Chemical labelling (ICAT and iTRAQ) is not as limited in applicability, but the efficiency of the labelling reaction can be problematic in some cases.

A second decision point regards the use of standard isotopic versus isobaric labelling reagents: combining SILAC or ICAT samples prior to MS analysis results in an increase in sample complexity (proportional to the number of isotopic variants used), a problem that does not occur in iTRAQ experiments. However, whereas iTRAQ quantification is extremely powerful for direct comparison of up to four samples in a single experiment, comparison across different experiments requires that the peptide of interest be fragmented in each data set. The same is not the case in SILAC- or ICAT-based experiments, in which quantification at the MS level can be obtained even if the peptide was not selected for fragmentation.

Although isotopic labelling approaches have clearly been at the forefront of the quantitative proteomics field, label-free approaches are also being developed, and these might change the way researchers perform quantitative experiments. These represent a promising alternative in situations in which metabolic labelling is not feasible, and changes in protein-complex components must be measured with high sensitivity. For example, Rinner et al.89 recently applied a label-free strategy on the basis of the computational alignment of MS spectra across different samples (MasterMap) to capture insulin-mediated changes in the formation of human FOXO3A complexes.

Conclusions

We have presented the general principles of the AP–MS approach and have highlighted some recent successes in deciphering the composition of protein complexes. Although AP–MS was traditionally considered a low-throughput method, ground-breaking studies in S. cerevisiae have demonstrated the power of this approach in high-throughput interactome studies. One crucial issue regarding the interpretation of AP–MS data is that a single protein can be present in multiple distinct complexes. Purification of a single protein can thus result in a mixture of different unique protein–protein assemblies. We have discussed methods to characterize individual protein complexes by AP–MS either indirectly, through high-density data acquisition, or directly, through the coupling of AP–MS with biochemical fractionation.

AP–MS can also be combined with a number of different approaches to dramatically increase information content. For example, the use of crosslinkers in AP–MS design was shown to have two advantages: stabilizing weak or transient interactions and providing important clues regarding the architecture of the protein complex. Structural studies of complexes will also be facilitated by new strategies to determine the stoichiometry of each of the components of a complex, either through intact mass measurements (of the complex and its constituents) or through absolute quantitative proteomics. Last, the modern generation of quantitative proteomics techniques truly opens the door for studies of protein-complex dynamics. This is an exciting area of research: all high-throughput maps generated so far have been static, but quantitative proteomics methods will make it possible to generate dynamic protein–protein interaction maps, revealing how proteins are reorganized following intracellular signalling events.

Acknowledgements

Funding was provided in part by the Terry Fox Foundation to A.-C.G., the Canadian Institutes of Health Research to B.R. and the National Heart, Lung, and Blood Institute, National Institutes of Health contract N01-HV-28179 to R.A. A.-C.G. and B.R. are recipients of Canada Research Chairs, Canada Foundation for Innovation and Ontario Research Fund grants.

Competing interests statement

The authors declare no competing financial interests.

References

  1. Ito, T. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA 98, 4569–4574 (2001).

  2. Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006).
    One of the two most comprehensive AP–MS studies in yeast to date (see REF. 4) using TAP to generate a high-confidence interaction network and define protein complexes.

  3. Ho, Y. et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183 (2002).

  4. Gavin, A. C. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636 (2006).
    One of the two most comprehensive AP–MS studies in yeast (see REF. 2). It introduced the notion of the socio-affinity index to define the propensity of proteins to associate into specific complexes.

  5. Gavin, A. C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415, 141–147 (2002).

  6. Uetz, P. et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000).

  7. Formstecher, E. et al. Protein interaction mapping: a Drosophila case study. Genome Res. 15, 376–384 (2005).

  8. Giot, L. et al. A protein interaction map of Drosophila melanogaster. Science 302, 1727–1736 (2003).

  9. Stanyon, C. A. et al. A Drosophila protein-interaction map centered on cell-cycle regulators. Genome Biol. 5, R96 (2004).

  10. Li, S. et al. A map of the interactome network of the metazoan C. elegans. Science 303, 540–543 (2004).

  11. Stelzl, U. et al. A human protein–protein interaction network: a resource for annotating the proteome. Cell 122, 957–968 (2005).

  12. Rual, J. F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 437, 1173–1178 (2005).

  13. Gandhi, T. K. et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nature Genet. 38, 285–293 (2006).

  14. Fields, S. & Sternglanz, R. The two-hybrid system: an assay for protein–protein interactions. Trends Genet. 10, 286–292 (1994).

  15. von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002).

  16. Parrish, J. R., Gulyas, K. D. & Finley, R. L. Jr. Yeast two-hybrid contributions to interactome mapping. Curr. Opin. Biotechnol. 17, 387–393 (2006).

  17. Kratchmarova, I., Blagoev, B., Haack-Sorensen, M., Kassem, M. & Mann, M. Mechanism of divergent growth factor effects in mesenchymal stem cell differentiation. Science 308, 1472–1477 (2005).

  18. Blagoev, B. et al. A proteomics strategy to elucidate functional protein–protein interactions applied to EGF signaling. Nature Biotechnol. 21, 315–318 (2003).

  19. Himeda, C. L. et al. Quantitative proteomic identification of six4 as the trex-binding factor in the muscle creatine kinase enhancer. Mol. Cell. Biol. 24, 2132–2143 (2004).

  20. Ranish, J. A. et al. Identification of TFB5, a new component of general transcription and DNA repair factor IIH. Nature Genet. 36, 707–713 (2004).

  21. Ranish, J. A. et al. The study of macromolecular complexes by quantitative proteomics. Nature Genet. 33, 349–355 (2003).

  22. Brand, M. et al. Dynamic changes in transcription factor complexes during erythroid differentiation revealed by quantitative proteomics. Nature Struct. Mol. Biol. 11, 73–80 (2004).

  23. Terpe, K. Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl. Microbiol. Biotechnol. 60, 523–533 (2003).

  24. Domon, B. & Aebersold, R. Mass spectrometry and protein analysis. Science 312, 212–217 (2006).

  25. Nesvizhskii, A. I. Protein identification by tandem mass spectrometry and sequence database searching. Methods Mol. Biol. 367, 87–120 (2006).

  26. Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).

  27. Steen, H. & Mann, M. The ABC's (and XYZ's) of peptide sequencing. Nature Rev. Mol. Cell Biol. 5, 699–711 (2004).

  28. Domon, B. & Aebersold, R. Challenges and opportunities in proteomic data analysis. Mol. Cell. Proteomics 5, 1921–1926 (2006).

  29. Mikesh, L. M. et al. The utility of ETD mass spectrometry in proteomic analysis. Biochim. Biophys. Acta 1764, 1811–1822 (2006).

  30. Nesvizhskii, A. I. et al. Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol. Cell. Proteomics 5, 652–670 (2006).

  31. Pedrioli, P. G. et al. Automated identification of SUMOylation sites using mass spectrometry and SUMmOn pattern recognition software. Nature Methods 3, 533–539 (2006).

  32. Janssens, V., Goris, J. & Van Hoof, C. PP2A: the expected tumor suppressor. Curr. Opin. Genet. Dev. 15, 34–41 (2005).

  33. Goldberg, Y. Protein phosphatase 2A: who shall regulate the regulator? Biochem. Pharmacol. 57, 321–328 (1999).

  34. Kong, M. et al. The PP2A-associated protein alpha4 is an essential inhibitor of apoptosis. Science 306, 695–698 (2004).

  35. Chen, J., Peterson, R. T. & Schreiber, S. L. alpha4 associates with protein phosphatases 2A, 4, and 6. Biochem. Biophys. Res. Commun. 247, 827–832 (1998).

  36. Inui, S. et al. Ig receptor binding protein 1 (alpha4) is associated with a rapamycin-sensitive signal transduction in lymphocytes through direct binding to the catalytic subunit of protein phosphatase 2A. Blood 92, 539–546 (1998).

  37. Gingras, A. C. et al. A novel, evolutionarily conserved protein phosphatase complex involved in cisplatin sensitivity. Mol. Cell. Proteomics 4, 1725–1740 (2005).

  38. Cosentino, G. P. et al. Eap1p, a novel eukaryotic translation initiation factor 4E-associated protein in Saccharomyces cerevisiae. Mol. Cell. Biol. 20, 4604–4613 (2000).

  39. Collins, S. R. et al. Towards a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteomics 6, 439–450 (2007).

  40. Abelson, J. N. & Simon, M. I. (eds) Guide to protein purification (Academic Press, 1990).

  41. Weiner, O. D. et al. Hem-1 complexes are essential for Rac activation, actin polymerization, and myosin regulation during neutrophil chemotaxis. PLoS Biol. 4, e38 (2006).

  42. Panigrahi, A. K. et al. Association of two novel proteins, TbMP52 and TbMP48, with the Trypanosoma brucei RNA editing complex. Mol. Cell. Biol. 21, 380–389 (2001).

  43. Moyer, S. E., Lewis, P. W. & Botchan, M. R. Isolation of the Cdc45/Mcm2–7/GINS (CMG) complex, a candidate for the eukaryotic DNA replication fork helicase. Proc. Natl Acad. Sci. USA 103, 10236–10241 (2006).
    A nice example of classical biochemical fractionation combined with AP–MS. CDC45 interactors were identified as components of the Mcm2–7 and GINS complexes.

  44. Poot, R. A. et al. HuCHRAC, a human ISWI chromatin remodelling complex contains hACF1 and two novel histone-fold proteins. EMBO J. 19, 3377–3387 (2000).

  45. Mueller, C. L. & Jaehning, J. A. Ctr9, Rtf1, and Leo1 are components of the Paf1/RNA polymerase II complex. Mol. Cell. Biol. 22, 1971–1980 (2002).

  46. Lindstrom, D. L. et al. Dual roles for Spt5 in pre-mRNA processing and transcription elongation revealed by identification of Spt5-associated proteins. Mol. Cell. Biol. 23, 1368–1378 (2003).

  47. Ducut Sigala, J. L. et al. Activation of transcription factor NF-kappaB requires ELKS, an IkappaB kinase regulatory subunit. Science 304, 1963–1967 (2004).

  48. Weber, G. & Bocek, P. Recent developments in preparative free flow isoelectric focusing. Electrophoresis 19, 1649–1653 (1998).

  49. Lasserre, J. P. et al. A complexomic study of Escherichia coli using two-dimensional blue native/SDS polyacrylamide gel electrophoresis. Electrophoresis 27, 3306–3321 (2006).

  50. Schagger, H. & von Jagow, G. Blue native electrophoresis for isolation of membrane protein complexes in enzymatically active form. Anal. Biochem. 199, 223–231 (1991).

  51. Camacho-Carvajal, M. M., Wollscheid, B., Aebersold, R., Steimle, V. & Schamel, W. W. Two-dimensional Blue native/SDS gel electrophoresis of multi-protein complexes from whole cellular lysates: a proteomics approach. Mol. Cell. Proteomics 3, 176–182 (2004).
    Example of the use of blue native electrophoresis combined with MS for the study of multiprotein complexes.

  52. Nijtmans, L. G., Henderson, N. S. & Holt, I. J. Blue native electrophoresis to study mitochondrial and other protein complexes. Methods 26, 327–334 (2002).

  53. Fandino, A. S. et al. LC-nanospray–MS/MS analysis of hydrophobic proteins from membrane protein complexes isolated by blue-native electrophoresis. J. Mass Spectrom. 40, 1223–1231 (2005).

  54. Andersen, J. S. et al. Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426, 570–574 (2003).
    Description of the protein correlation profiling strategy and its use in the identification of novel components of the centrosome. This approach can also be applied to other organelles.

  55. Foster, L. J. et al. A mammalian organelle map by protein correlation profiling. Cell 125, 187–199 (2006).

  56. Marelli, M. et al. Quantitative mass spectrometry reveals a role for the GTPase Rho1p in actin organization on the peroxisome membrane. J. Cell Biol. 167, 1099–1112 (2004).

  57. Kim, D. H. et al. mTOR interacts with raptor to form a nutrient-sensitive complex that signals to the cell growth machinery. Cell 110, 163–175 (2002).

  58. Vasilescu, J., Guo, X. & Kast, J. Identification of protein–protein interactions using in vivo cross-linking and mass spectrometry. Proteomics 4, 3845–3854 (2004).
    Example of the use of crosslinking (with formaldehyde) coupled to AP–MS to identify protein interactors for an activated form of myc-tagged Ras.

  59. Meunier, L., Usherwood, Y. K., Chung, K. T. & Hendershot, L. M. A subset of chaperones and folding enzymes form multiprotein complexes in endoplasmic reticulum to bind nascent proteins. Mol. Biol. Cell 13, 4456–4469 (2002).