Main

Proteomics is defined as the study of the proteome. As a term analogous to the genome, the proteome is defined as the total protein complement of an organelle, cell, tissue or an entire organism.1, 2, 3 Proteomics is a multi-faceted endeavor to study protein expression and post-translational modification, and protein interactions, organization and function at a global level. Why study the proteome in disease? While the cause of a disease aberration may be at the genetic level, the functional consequences of such an aberration are expressed at the protein level. Hence, disruption of protein structure, function or interaction is the underlying mechanism of the majority of diseases. Further, several questions relating to the fundamental basis of diseases are best addressed at the protein level because they occur as post-translational events, or relate to abnormal regulation of protein function, or aberrant protein–protein or protein–DNA interactions. Additionally, covalent modification of proteins (undetectable by genomic studies) could confer abnormal activities to proteins critical to cell growth, survival, proliferation and apoptosis.4, 5

Although the field has progressed rapidly in recent years, significant technical challenges still limit the routine use of mass spectrometry by the average pathologist–investigator. Despite these challenges, proteomic studies have yielded unparalleled information and understanding of the cellular biology of diseased states. The application of mass spectrometry to the study of diseases is expected to lead to the identification of biomarkers that are critical for the detection, diagnosis, monitoring, prognosis and treatment of specific disease entities. In this review, we provide an overview of the technical aspects involved in mass spectrometry-based proteomics, and illustrate some applications of these methods and technologies to the study of two distinct categories of lymphoid neoplasia.

Principles and tools for proteomics

Samples for Proteomic Analysis

Protein isolation

Most proteomic studies require proteins extracted from fresh or snap-frozen samples, for example, cells, tissue samples or biological fluids such as serum and urine. Owing to cross-linking and degradation of proteins, formalin-fixed paraffin-embedded tissues are not ideal sources for proteomic studies. Nevertheless, proteins extracted from ethanol-fixed paraffin-embedded tissues can be utilized for two-dimensional gel electrophoresis–mass spectrometry (2D-MS) analysis.6 Cellular proteins have to be isolated from samples containing other biological molecules including carbohydrates, lipids and nucleic acids. Thus, protein extraction protocols entail the homogenization of cells and tissues followed by application of detergents such as 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS),7 Tween and sodium dodecyl sulfate (SDS), which help to dissolve the proteins and separate them from the lipid components; reducing agents such as dithiothreitol (DTT); denaturing agents such as urea, which disrupt the bonds that are responsible for the formation of secondary and tertiary conformational structure; and enzymes such as DNAses and RNAses, which degrade nucleic acids. Tissues obtained from laser capture microdissection of human colon cancer8 and hepatocellular carcinoma have been successfully used for 2D gel electrophoresis followed by MALDI-TOF mass spectrometry analysis and 2D liquid chromatography tandem mass spectrometry.9

Separation of complex proteins into simpler components

Several techniques are utilized for the analytical separation of proteins. Figure 1 illustrates different modalities commonly used for separation of proteins from complex mixtures. These include one-dimensional gel electrophoresis (1D-GE), which achieves resolution of cellular proteins based on molecular weight; two-dimensional gel electrophoresis (2D-GE), which involves initial separation of proteins based on isoelectric point (pI) followed by subsequent separation based on size; high-performance liquid chromatography (HPLC); ion exchange chromatography; and different types of affinity chromatography.10, 11 The most powerful strategy entails the integration of the different protein and peptide separation methods as multidimensional combinations. In this regard, ion-exchange liquid chromatography (LC) in tandem with reverse-phase (RP)-HPLC is a powerful tool for resolving complex peptide mixtures, and has been automated to maximize efficiency.12, 13, 14, 15 When used with a 100 μm 2D strong cation exchange-reverse-phase (SCX-RP) packed microcapillary, the MS system is capable of achieving a detection limit of 10 fmol for identifying a single tryptically digested protein.16 The sensitivity of this technique (multidimensional protein identification technology, MudPIT, see Figure 2) is very attractive for the study of complex proteomes such as mammalian cellular samples. MudPIT has been applied to large-scale protein characterization and identification of up to 1484 proteins from yeast in a single experiment.13

Figure 1

Overview of experimental design for mass spectrometry-based proteomic studies. Proteins are extracted from biologic samples and fractionated by a variety of separation methods. In 1D gel electrophoresis, proteins are separated by size. In 2D gel electrophoresis, proteins are separated based on isoelectric point (pI) and size. In multidimensional liquid chromatography, digested proteins are fractionated by 2D (strong cationic exchange (SCX) and reverse phase (RP)) or 3D (strong cationic exchange, avidin and reverse-phase) liquid chromatography. In the fourth method, proteins are separated based on functional properties according to physical, chemical or biochemical properties in surface-enhanced laser-desorption ionization (SELDI) technology.

Figure 2

Multidimensional protein identification technology (MudPIT). Complex peptide mixtures from fractions obtained from whole cell lysates are separated on a microcapillary column packed with strong cation exchange (SCX) and reverse-phase (RP) resins. Peptides are eluted directly into the tandem mass spectrometer through direct interfacing with the microcapillary high-pressure liquid chromatography (HPLC) column. The coupling of separation techniques allows for high-throughput, on-line analysis of peptide or protein mixtures.

Proteins isolated from gels or individual chromatography fractions can be subjected to proteolytic cleavage by enzymes with specific cleavage sites, such as trypsin (which cleaves at the carboxy-terminal side of lysine or arginine residues), or by enzymes with nonspecific cleavage activity, such as elastase or subtilisin.3, 15
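The tryptic cleavage rule can be illustrated with a short Python sketch. This is a minimal illustration rather than a validated tool: the function name and the example sequence are hypothetical, and the sketch deliberately ignores refinements such as missed cleavages or the reduced cleavage observed before proline.

```python
def tryptic_digest(sequence):
    """Split a protein sequence C-terminal to every lysine (K) or arginine (R).

    Simplified rule: missed cleavages and the proline exception are ignored.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R (protein C-terminus).
        peptides.append(sequence[start:])
    return peptides


# Hypothetical example sequence, chosen only to show the cleavage pattern.
example = "MKWVTFISLLLLFSSAYSRGVFRR"
peptides = tryptic_digest(example)
print(peptides)
```

Joining the resulting peptides in order reproduces the input sequence, which is a convenient sanity check for any in silico digestion routine.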

General Strategy for Mass Spectrometry-Based Proteomics

Proteomic studies require the simplification of a complex mixture of proteins into less complex components that are more amenable for analysis. In this regard, intact proteins with different biophysical characteristics such as molecular weight, hydrophobicity, and post-translational modifications may be present within a complex mixture intended for analysis. In ‘top-down’ proteomics, intact proteins are analyzed. In ‘bottom-up’ proteomics, the proteins are proteolytically cleaved using enzymes with or without cleavage specificity.

The development of sensitive instruments capable of analyzing larger biologic molecules such as proteins has greatly facilitated the analysis of the total complement of proteins in cells and tissues. Mass spectrometers measure the mass of the smallest of molecules with very high accuracy, and hence mass spectrometry can be considered as the smallest weighing scale. In parallel with the technological advancements in mass spectrometers, technological improvements in ionization methods have also enhanced the ability to analyze complex biologic molecules by mass spectrometry.4, 17 The final component that has greatly impacted the ability to conduct proteomic studies is development of translated genomic databases and specialized software algorithms that rapidly search mass spectrometric data against known or predicted proteins within the databases.4

In general, the measurement of fragmented peptide masses by mass spectrometry is more accurate than measuring the mass of intact proteins. Thus in ‘bottom-up’ proteomics, the typical work-flow involves initial simplification of a complex protein mixture followed by digestion into peptides which are subjected to mass spectrometric analysis. The mass spectrometric data are then analyzed using specialized software algorithms that identify the proteins from which the peptide sequences are derived. The ability to accurately determine the mass of a unique peptide that originates from a particular protein greatly facilitates the identification of that protein. In essence, protein identification centers on the fact that a peptide sequence, composed of six amino-acid residues or greater, provides a unique opportunity for identification of the parent protein. This is because the probability that any one amino acid would occupy a particular position within a peptide sequence is 1/20. For a sequence of six amino-acid residues, the theoretical probability is 1 in $20^6$ = 1 in 64 000 000. For a number of reasons, even these odds may be insufficient to unequivocally identify a protein from a single peptide. Identification of overlapping or longer peptide sequences with multiple ‘hits’ to a particular protein provides an even greater degree of certainty in the identification of the protein. Thus in many cases, it is possible to utilize database searches to identify a protein from only a few peptides.
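The arithmetic behind this argument can be checked with a brief Python sketch. It assumes, as the text does, that all 20 amino acids are equally likely at every position, which real proteomes do not satisfy. The database size of 10 million residues is a hypothetical figure chosen only to show why a single six-residue match may still be insufficient.

```python
AMINO_ACIDS = 20


def sequence_space(length):
    """Number of distinct peptide sequences of the given length."""
    return AMINO_ACIDS ** length


def expected_chance_matches(length, database_residues):
    """Expected number of random occurrences of one specific sequence
    in a database, assuming uniform and independent residues."""
    positions = max(database_residues - length + 1, 0)
    return positions / sequence_space(length)


print(sequence_space(6))  # 64000000
# Hypothetical database of 10 million residues.
print(expected_chance_matches(6, 10_000_000))
print(expected_chance_matches(10, 10_000_000))
```

Under these assumptions a specific six-residue sequence is expected to occur by chance roughly 0.16 times in such a database, whereas a ten-residue sequence is expected far less than once in a million searches, which is in line with the greater certainty attributed to longer peptides.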

Mass Spectrometers

A mass spectrometer is typically composed of three components: an ionization source, the mass analyzer and the detector. The ionization source creates ions from the sample to be analyzed. The mass analyzer resolves the ions by their mass-to-charge ratio (m/z). The detector records the resolved ions, so that the abundance of ions at each m/z value can be determined. The most frequently utilized ionization sources and mass analyzers are discussed below. See Figure 3 for a schematic representation of mass spectrometers.

Figure 3

Schematic representation of common protein ionization methods used for mass spectrometry-based proteomics. Two common ionization technologies are currently available for analysis of proteins. Electrospray ionization (ESI) volatilizes and ionizes peptides and proteins in solution and is coupled to two mass analyzers resulting in tandem MS. Matrix-assisted laser-desorption ionization (MALDI) uses analytes which are cocrystallized in a matrix composed of organic acid on a solid support. A pulse of ultraviolet laser is then used to evaporate the matrix and the analyte into gas phase, resulting in generation of singly charged ions.

Ionization Source

MALDI

The recent development of so-called ‘soft’ (low-energy) ionization techniques such as matrix-assisted laser desorption/ionization (MALDI)18, 19 and electrospray ionization (ESI)17 has dramatically enhanced the ability to analyze larger biomolecules in general, and proteins in particular, by mass spectrometry. In MALDI, the sample intended for analysis is incorporated into a chemical matrix containing compounds such as 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid). Irradiation of the target by a laser within the ion source leads to the release of peptide/protein ions from the target into the gas phase. More recently, a variation on the MALDI concept has been introduced, namely, surface-enhanced laser desorption/ionization (SELDI).20, 21 This latter format is embodied in the Ciphergen Chip™ system and is composed of several chip matrices, which exploit the differing biophysical and chromatographic characteristics of the different proteins for their preferential selection. The different surfaces include, among others, a hydrophobic surface, a strong anionic exchange surface, and an immobilized metallic ion surface with a strong affinity for phosphorylated proteins.

Electrospray ionization

In contrast to MALDI, wherein ions are generated from a solid matrix, electrospray ionization involves the generation of peptide ions from aqueous solution.17 The solution containing the sample is passed through a needle subjected to a high voltage. The solution stream is ejected from the needle orifice as a spray of droplets. The solvent is eliminated from the droplets by a heated capillary or an inert gas. Solutions with acidic pH favor protonation of the N-terminal amines and histidine nitrogens, and peptide fragmentation is facilitated when the peptide ions are positively charged. Thus, ESI protocols commonly include acidification steps prior to peptide ion analysis in the mass analyzer.

Mass Analyzers

Mass analyzers measure the mass-to-charge (m/z) ratio of gas-phase ions generated from the ionization source. Several types of mass analyzers are utilized for proteomic analysis. Examples include quadrupoles, ion traps, time-of-flight and Fourier transform ion cyclotron resonance (FTICR) type analyzers. In addition, hybrid instruments incorporating one type of analyzer with another (eg quadrupole-TOF) have also been utilized with great success.

Important performance characteristics that are relevant to mass analyzers are mass accuracy, mass resolution and mass range. Mass accuracy refers to the extent to which a mass analyzer reflects the ‘true’ m/z values, and is expressed in atomic mass units (amu), parts per million (ppm; eg 500 ppm) or percent accuracy (eg 0.05%). By comparison, mass resolution indicates the ability of the instrument to discriminate between ions with different m/z values. It is defined by the equation M/ΔM, where M is the m/z ratio of a mass peak and ΔM is the full width of the peak at half its maximum height. The mass resolution of an instrument often correlates with its accuracy. High-end mass spectrometers incorporate mass analyzers that employ a variety of designs, including reflectron time-of-flight and FTICR, which exhibit exceptional mass accuracy and resolution. Mass range refers to the optimal m/z range at which a mass analyzer operates, and varies with the type of mass analyzer. For example, quadrupole mass analyzers exhibit a mass range of up to 4000 m/z, and time-of-flight analyzers may have mass ranges extending up to 100 000 and beyond. Below, we briefly discuss the principles behind the operation of common analyzers that are utilized for proteomic mass spectrometry (illustrated in Figure 4).
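The two definitions can be made concrete with a small worked example in Python. The function names and the numerical values below are illustrative assumptions, not measurements from any instrument.

```python
def ppm_error(measured_mz, true_mz):
    """Mass accuracy expressed in parts per million (ppm)."""
    return (measured_mz - true_mz) / true_mz * 1e6


def mass_resolution(peak_mz, fwhm):
    """Resolution M / delta-M, where delta-M is the full width of the
    peak at half its maximum height (FWHM)."""
    return peak_mz / fwhm


# A peak whose true m/z is 1000.0 but is reported at 1000.5 is off by
# 500 ppm, which is the same as 0.05%.
error = ppm_error(1000.5, 1000.0)
print(round(error, 3), "ppm =", round(error / 1e4, 4), "%")

# A peak at m/z 1000 that is 0.1 m/z units wide at half height
# corresponds to a resolution of about 10 000.
print(round(mass_resolution(1000.0, 0.1)))
```

Because ppm error scales with m/z, the same 500 ppm figure corresponds to an absolute error of 0.5 at m/z 1000 but 2.0 at m/z 4000.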

Figure 4

Common types of mass analyzers used for proteomic studies. In quadrupole mass analyzers, ions pass between four parallel poles or rods to which fixed direct current and alternating radiofrequency potentials are applied (a). In time-of-flight mass analyzers (b), the ions are accelerated linearly down a flight tube until they impact a detector at the opposite end. The lighter ions will travel faster and reach the detector ahead of the heavier, slower ions. The ion-trap mass analyzers (c) consist of the ring electrode, the entrance endcap electrode and the exit endcap electrode. They are unique in their ability to trap ions in a three-dimensional electrical field. The electrodes form a cavity in which ions are trapped and analyzed.

Quadrupoles

Quadrupole mass analyzers utilize electrical fields to separate ions based on their m/z values. A quadrupole is composed of four parallel poles or rods which have a fixed direct current and alternating radiofrequency potentials applied to them. Ions are either deflected into the poles or focused on the detector depending on the electrical field generated. Altering the frequencies and strengths of the electrical field leads to the detection of different ions. The mass range and resolution that are achievable using the quadrupole assembly are determined principally by the physical dimensions (length and diameter) of the poles (Figure 4a). Quadrupole mass analyzers interface very well with ESI because quadrupole analyzers are tolerant of the relatively high pressures that are produced by ESI sources. Additionally, ESI generates multiply charged ions which widen the range of the m/z values of the ions to be detected by the mass analyzer. Since quadrupoles exhibit a wide mass range of detection, they are especially well-suited to the detection of ions produced by ESI. This complementary relationship has been exploited for the design of several instruments, wherein ESI and quadrupole mass analyzers have been combined in commercial mass spectrometers. The two kinds of quadrupole mass spectrometers that are made for analytical work are single-stage and triple quadrupoles. The single-stage instruments have only limited application in proteomics studies because of limited mass resolution and lack of tandem mass spectrometry capability. By comparison, the triple quadrupole instruments, comprising three quadrupole analyzers in tandem, are capable of true tandem mass spectrometry, and have been successfully utilized for a number of proteomics applications including product ion, precursor ion and neutral loss scanning. These features, particularly the latter, are very useful in the analysis of post-translational modifications.22, 23, 24, 25
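The effect of multiple charging on the observed m/z values can be sketched numerically. The example assumes that each charge arises from the addition of one proton (positive-ion mode) and uses a hypothetical 20 000 Da protein; the 4000 m/z limit is the quadrupole mass range quoted earlier in this review.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_of_charge_state(neutral_mass, charge):
    """m/z of an ion carrying `charge` added protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def lowest_charge_in_range(neutral_mass, mz_limit=4000.0, max_charge=100):
    """Smallest charge state whose m/z falls within the analyzer limit."""
    for charge in range(1, max_charge + 1):
        if mz_of_charge_state(neutral_mass, charge) <= mz_limit:
            return charge
    return None


protein_mass = 20_000.0  # hypothetical protein, Da
for z in (1, 5, 10, 20):
    print(z, round(mz_of_charge_state(protein_mass, z), 3))
print(lowest_charge_in_range(protein_mass))
```

In this sketch the singly charged ion lies far outside a 4000 m/z window, while charge states of six and above fall inside it.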

Time-of-flight

In time-of-flight mass analyzers, the ions from the ion source are accelerated linearly down a flight tube until they impact a detector at the other end of the tube. Given the fact that the ions contain the same amount of energy but have different masses, the lighter ions travel faster and reach the detector ahead of the heavier (slower) ions. The relationship between the mass and velocity of the ions is given by the equation $KE = \tfrac{1}{2}mv^2$, where $KE$ represents the kinetic energy, $m$ represents the mass and $v$ represents the velocity. Hence, TOF analyzers derive their name from the concept that the ‘time of flight’ ($t$) of ions is related both to their m/z ratio and velocity ($v$) within a fixed distance ($d$) as expressed in the equation $t = d/v$. Although TOF analyzers have been frequently coupled with MALDI ion sources, some recently developed mass spectrometers have combined ESI with TOF. While linear mode TOF analyzers do not provide the best resolution, the ‘reflectron’ analyzers (Figure 4b) overcome this shortcoming and provide excellent resolution. For proteomics applications, the quadrupole time-of-flight (QqTOF) hybrid instruments with their superior mass accuracy, mass range and mass resolution are of much greater utility than simple TOF instruments and are very popular.22, 26
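Combining the two relations gives the flight time as a function of mass and charge, which the following Python sketch evaluates. It assumes that an ion of charge z accelerated through a potential V acquires a kinetic energy of zeV; the 20 kV accelerating voltage and 1 m flight tube are hypothetical values chosen for illustration, and the sketch ignores the initial energy spread that reflectron designs correct for.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per atomic mass unit


def flight_time(mass_da, charge, voltage=20_000.0, tube_length=1.0):
    """Time (s) for an ion to traverse a field-free flight tube.

    KE = z*e*V = (1/2)*m*v**2  ->  v = sqrt(2*z*e*V / m),  t = d / v
    """
    kinetic_energy = charge * ELEMENTARY_CHARGE * voltage
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return tube_length / velocity


for mass in (1000.0, 4000.0, 16000.0):
    print(mass, round(flight_time(mass, 1) * 1e6, 2), "microseconds")
```

Because the flight time scales with the square root of m/z, a fourfold increase in mass only doubles the arrival time, so heavier ions are progressively more closely spaced in time.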

Ion-trap

The distinctive feature of ion trap mass analyzers is their ability to trap ions in a three-dimensional electrical field. The ion trap consists of a ring electrode and end cap electrodes (Figure 4c). The ion trap differs significantly in design and in operation from the quadrupoles: while triple quadrupoles perform tandem mass analysis on peptide ions as they pass through the analyzer, ion traps are capable of isolating and retaining specific ions for fragmentation upon collision with an inert gas in the same cell. The discovery of the mass-selective instability mode transformed the ion trap from a storage device into a powerful instrument capable of sequential ejection of selected or all ions out of the trapping cell to a detector.27 The ion trap instruments are quite versatile, interface readily with liquid chromatography, ESI4 and more recently, MALDI28 and are very popular for many applications in proteomics.

Figure 5 illustrates the differential sensitivities characteristically observed for ESI, NanoESI and MALDI mass spectrometry. ESI and nanoESI have similarly high mass ranges, while ESI exhibits lower sensitivity. MALDI allows a high sensitivity as well as high practical mass range.

Figure 5

Typical sensitivity and mass ranges characteristic of different ionization techniques. Electrospray ionization (ESI) and nanoelectrospray ionization (nanoESI), have similarly high mass ranges, while ESI has lower sensitivity. Matrix-assisted laser desorption ionization (MALDI) allows a high sensitivity as well as high practical mass range.

Fourier transform ion cyclotron resonance mass spectrometry

The Fourier transform ion cyclotron resonance mass spectrometer (FTICR) can be considered to be a special type of ion-trap mass analyzer.29 However, the distinctive property of an FTICR is that the ion-trapping cell is surrounded by a magnetic field within which entrapped ions can resonate at their cyclotron frequencies. Additionally, an electrical field may be applied to excite the ions into a larger radius that can be measured as they travel past the detector plates situated on opposite sides of the trap. The detector measures the cyclotron frequencies of all the entrapped ions, and the mass-to-charge values of all the ions are calculated using a fast Fourier transform operation. The magnetic field strength (eg 7, 9.4 or 12 T) correlates with the performance properties of the FTICR instruments. FTICR instruments are very powerful and provide exquisite sensitivity, high mass accuracy and high resolution, which are especially desirable properties in the analysis of complex protein mixtures. The FTICR instruments are especially compatible with ESI,30 but may also be used with MALDI as an ionization source.31 Despite these advantages, FTICR instruments require considerable expertise for successful operation, which somewhat limits their widespread utilization for routine proteomics studies.
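The link between cyclotron frequency, magnetic field strength and m/z can be sketched with the standard expression for the unperturbed cyclotron frequency, f = zeB/(2πm). The review does not state this formula explicitly, so it is introduced here as an assumption, and the sketch ignores the trapping-field corrections applied in real instruments. The field strengths are those quoted in the text.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per atomic mass unit


def cyclotron_frequency(mz, field_tesla):
    """Unperturbed cyclotron frequency (Hz) of an ion with the given m/z."""
    return ELEMENTARY_CHARGE * field_tesla / (2.0 * math.pi * mz * DALTON)


def mz_from_frequency(frequency_hz, field_tesla):
    """Invert the relation: recover m/z from a measured frequency."""
    return ELEMENTARY_CHARGE * field_tesla / (2.0 * math.pi * frequency_hz * DALTON)


for field in (7.0, 9.4, 12.0):
    f = cyclotron_frequency(1000.0, field)
    print(field, "T:", round(f / 1e3, 1), "kHz")
```

For a fixed m/z the frequency rises in proportion to the field strength, which is one way to see why stronger magnets are associated with better performance.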

Tandem Mass Analysis

Tandem analysis refers to the capacity of the analyzer to isolate an ion, subject it to further fragmentation and perform analysis of the fragment ions. Fragmentation of the parent ion yields structural information that can be used to determine the amino-acid sequence of peptides, and to characterize post-translational modifications of proteins.

In tandem mass spectrometers, peptide ions analyzed in an initial mass analyzer are directed into a collision cell, where individual peptide ions can be isolated and subjected to further fragmentation upon collision with an inert gas (eg helium) in a process known as collision-induced dissociation (CID). Fragment ions exit the collision cell and are separated in a second mass analyzer before scanning out to the detector. The predictable nature of peptide fragmentation in CID makes it possible to predict an expected pattern of fragmentation for any peptide.

Protein Identification Strategies

Peptide mass fingerprinting

Peptide mass fingerprinting (PMF) is the simplest method for protein identification. Peptide mass fingerprinting combines enzymatic digestion, mass spectrometry and computer algorithm-aided data analyses. Sequence-specific enzymes (eg trypsin) or chemicals (eg cyanogen bromide) with specific proteolytic cleavage activity are used to generate a series of peptides from a protein of interest. The peptides generated are analyzed by MS and the masses obtained are compared with theoretical mass spectra of the proteins in a sequence database.

The experimentally determined peptide masses are matched with the theoretical peptide masses of proteins within a database. The statistical significance of each of the matches is calculated and the candidate proteins are ranked in descending order of probability, with the protein showing the highest number of matches between experimental and theoretical peptide masses being considered the most likely identification. Software algorithms that facilitate peptide mass mapping include PeptIdent/MultiIdent and ProFound.32, 33 Peptide mass fingerprinting algorithms may assign higher scores to larger proteins, which contain more peptides, thereby increasing the propensity for misassignment of peptide matches and hence for erroneous identifications.
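The matching-and-ranking step can be sketched as follows. This is a deliberately simplified illustration: it scores each protein by the raw number of matching peptide masses rather than by the statistical models used in PeptIdent or ProFound, the two-protein database and the 50 ppm tolerance are hypothetical, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def rank_proteins(observed_masses, database, tolerance_ppm=50.0):
    """Rank proteins by the number of observed masses they explain."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
        scores[name] = sum(
            any(abs(obs - theo) / theo * 1e6 <= tolerance_ppm
                for theo in theoretical)
            for obs in observed_masses
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy database and simulated "observed" masses.
TOY_DATABASE = {
    "protein_A": "MKTAYIAKQRQISFVK",
    "protein_B": "GSHMLEDPKAAWGR",
}
observed = [peptide_mass("TAYIAK"), peptide_mass("QISFVK")]
ranking = rank_proteins(observed, TOY_DATABASE)
print(ranking)
```

Because the score is a simple count, a longer protein with more theoretical peptides has more opportunities to match by chance, which is the bias toward larger proteins noted above.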

Peptide mass fingerprinting is most suitable for the identification of proteins from species in which complete genome sequences are available. Accordingly, PMF is not well-suited for searching expressed sequence tag (EST) databases, in which any particular EST contains incomplete gene coding information. Identifications by PMF are most reliable when high mass accuracy mass spectrometers are utilized. In addition, PMF is optimally utilized for the identification of proteins that have been previously resolved by 2D-GE, such that additional information including the molecular weight and isoelectric point can be used to supplement PMF identification. Further, PMF is not ideal for the analysis of complex protein mixtures, since it would be difficult to determine which proteins the peptides originated from.

Peptide sequencing by tandem mass spectrometry

Peptide sequencing by tandem mass spectrometry is based upon the random cleavage of the peptide bonds between adjacent amino-acid residues in a peptide sequence achieved by collision-induced dissociation (CID). CID of the peptides yields ion series that are important for the identification of the amino-acid residues in a peptide. While several of the bonds along the peptide molecule are susceptible to cleavage, the most critical cleavages relevant to peptide sequencing by tandem mass spectrometry are those occurring along the peptide backbone. The most frequently observed cleavage occurs at the amide bond between the carbonyl carbon and the amide nitrogen, which fragments to form a ‘b-ion’ and a ‘y-ion’ (Figure 6). In the ‘b-ion’, the positive charge is retained on the amino-terminus, and in the ‘y-ion’ the positive charge is retained on the carboxy-terminus of the parent peptide ion. Ions containing positive charge at both the N- and C-terminal ends are doubly charged. Fragmentation of doubly charged peptide ions results in ‘b- and y-ions’ that are complementary in peptide sequence information. Fortunately, these are also the most informative ions for sequence analysis.
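The b- and y-ion series for a given peptide can be calculated directly from residue masses, as in the sketch below. It assumes singly charged fragments and unmodified residues, uses standard monoisotopic residue masses, and takes the peptide "PEPTIDE" purely as a hypothetical example.

```python
# Monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i - 1] is b_i (first i residues, charge on the N-terminal piece);
    y_ions[i - 1] is y_i (last i residues, charge on the C-terminal piece).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_series, y_series = fragment_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_series, y_series), start=1):
    print(f"b{i} = {b:.4f}    y{i} = {y:.4f}")
```

The complementarity of the two series can be verified numerically: for a peptide of n residues, b_i plus y_(n-i) always equals the neutral peptide mass plus two protons, and the mass difference between consecutive ions in either series is the mass of one residue.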

Figure 6

Peptide ion fragmentation by collision-induced dissociation (CID). Fragmentation of peptide ions occurs predominantly along the peptide backbone. In the most commonly observed cleavage, the amide bond between the carbonyl carbon and the amide nitrogen is cleaved to form a ‘y-ion’ and a ‘b-ion’. The peptide fragments retaining the positive charge on the N-terminal end are designated a, b, c and the fragments retaining the positive charge on the carboxy-terminal end are designated x, y, z.

The extent to which the predicted fragment ions for each of the amino-acid sequences match the peptide masses generated from the experimental tandem mass spectra is used to calculate a cross-correlation (XCorr) score,15 such that the higher the cross-correlation score, the higher the quality of the match. The difference between the normalized cross-correlation score of the top match and that of the next best match is reported as ΔCn and indicates the quality of the top match in comparison to the next ranked sequences in the database.15
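The ΔCn calculation can be illustrated with a short sketch. It assumes the common convention of normalizing each XCorr value to the top score, so that ΔCn is the fractional drop from the best to the second-best candidate; the XCorr values are hypothetical and the function is not taken from any search engine's code.

```python
def delta_cn(xcorr_scores):
    """Fractional difference between the best and second-best XCorr.

    Scores are normalized to the top score, so the result is 0 when the
    two best candidates are indistinguishable and approaches 1 when the
    top match stands far above the rest.
    """
    ranked = sorted(xcorr_scores, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        raise ValueError("need at least two scores and a positive top score")
    return (ranked[0] - ranked[1]) / ranked[0]


# Hypothetical XCorr values for candidate peptides matched to one spectrum.
candidates = [2.4, 3.2, 1.1]
print(round(delta_cn(candidates), 3))  # 0.25
```

A top match that scores only marginally above the runner-up yields a ΔCn close to zero, signaling that the spectrum does not discriminate well between candidate sequences.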

Recent developments in software algorithms facilitating improved identification include scoring routines that normalize XCorr values to be independent of peptide and database size (SEQUEST-NORM)34 and PeptideProphet, INTERACT and ProteinProphet,35, 36, 37, 38 which utilize statistical modeling algorithms that permit assignment of confidence values for each identification. In particular, a very useful aspect of the latter routine is that it allows the determination of false-positive rates of identification in specific datasets.

It should be noted that because of sequence conservation among similar or related proteins, a single tandem mass spectrum should not be construed as providing a unique identification of a particular protein. Rather, multiple peptide ‘hits’ corresponding to a specific protein sequence provide unequivocal evidence of identification of that protein.

As compared to PMF, MS/MS is advantageous in that it can provide unambiguous identification of proteins in complex mixtures; indeed, matching of multiple MS/MS spectra to peptide sequences within the same protein increases the confidence of protein identification. MS/MS data can also be used to search EST databases with reliable matches. A flow chart comparison of PMF and tandem MS for protein identification is summarized in Figure 7.

Figure 7

Comparison of peptide mass fingerprinting and tandem mass spectrometry for protein identification. In PMF, the measured masses of intact peptide ions are compared with the calculated masses of peptides derived from in silico proteolysis of proteins from a database. PMF is useful for analyzing relatively pure samples containing few proteins. In tandem MS/MS, CID-induced fragmentation yields peptide ions with m/z values corresponding to specific amino-acid residues in the parent peptide. CID spectra are matched to peptide sequences with an intact peptide ion mass similar to that of the observed precursor ion. The calculated peptide fragment ion spectrum is compared to the observed CID spectrum. Unambiguous identification of a protein can be made with fewer matching peptides than required by PMF. Thus, tandem MS/MS is highly suitable for confident protein identification of samples composed of complex protein mixtures.

Applications of mass spectrometry-based proteomics

There are two major groups of applications of proteomics to the study of human disease. One is ‘expression proteomics’, which deals with the identification of the proteins expressed in a given sample, such as body fluids or normal or diseased tissues, and the quantification of their levels. This protein expression ‘signature’ is conceptually similar to that obtained by genomic microarray analyses and would allow investigators to identify biomarkers or disease-specific proteins that may represent therapeutic targets. The second application is ‘functional proteomics’, which encompasses the study of proteins in their functional environment. This includes the analysis of protein interactions with other proteins, interactions with DNA or RNA, and post-translational modifications such as phosphorylation and glycosylation. The latter approach allows investigators to obtain information regarding the function of a protein, for example, by identifying networks of signaling pathways that are characteristic of physiologic and pathologic states.

Protein Expression Profiling

The application of proteomics to the identification of disease markers from body fluids and tissue has received significant attention. The ability to obtain mass spectral profiles of peptides and proteins without the need to carry out protein separation is highly attractive for biomarker discovery, as it reduces sample requirements and offers a high-throughput approach. A popular method is SELDI-TOF.39, 40, 41, 42, 43, 44 The advantage of this method is that only small quantities of biological fluid or tissue material are needed. The mass spectral patterns reflect the protein and peptide content of the samples. In this regard, artificial-intelligence-based bioinformatics analyses of mass spectral patterns have been proposed to distinguish serum of normal individuals from that of patients with a number of neoplastic conditions including ovarian,42 breast,45 prostate46 and liver cancer.47 Such analyses require extensive pre- and postacquisition procedures including mass calibration, baseline correction and noise subtraction to facilitate identification of bona fide features that are robust and biologically relevant discriminators of the normal, benign and malignant states.48, 49 Proteomic patterns of nipple aspirate fluids,50 cytologic specimens51 and tissue biopsies52 obtained using SELDI-TOF have also shown potential utility in the discovery of novel biomarkers that aid in diagnosis.

Our laboratory sought to identify proteomic changes that accompany histologic transformation in follicular lymphoma. To this end, we used SELDI-TOF mass spectrometry (ProteinChip™, Ciphergen Biosystems) to perform pairwise comparisons of low-grade and transformed follicular lymphomas in order to identify proteins that may be involved in the transformation process. In preliminary experiments, we demonstrated that the MS data obtained from SELDI-TOF MS were reproducible, and that reduction in sample complexity improved the ability to detect lower abundance proteins. We subjected crude cell lysates obtained from snap-frozen archived follicular lymphoma and patient-matched diffuse large B-cell lymphoma to analysis by SELDI-TOF mass spectrometry. Reproducible mass spectral profiles generated using a strong anionic exchange (SAX2)/immobilized metallic affinity-capture surface chemistry revealed upregulation of a protein with a molecular weight of 32 kDa (Figure 8a). Protein database searches revealed several candidates, among them cyclin D3 (32.5 kDa), whose differential expression was confirmed by immunohistochemical analysis on the primary tissue specimens (Figure 8b).52 These studies demonstrate the utility of SELDI-TOF mass spectrometry for the rapid discovery of differentially expressed proteins using femtomolar quantities of crude protein derived from biopsy material. The versatility of this methodology supports its application to the rapid discovery of potential biomarkers in a variety of cellular systems.39, 40, 41, 42, 43, 44

Figure 8

SELDI-TOF mass spectrometry for the identification of differentially expressed proteins in follicular lymphoma transformation. A matched pair of follicular lymphoma and its transformed counterpart from the same patient was assessed using strong anionic exchange surface chemistry (SAX2). A protein with a molecular weight of 32 kDa was seen to be upregulated (arrow) in the large B-cell lymphoma compared to the follicular lymphoma (a). The molecular weight, the pI and chip binding characteristics suggested the identity of the protein to be cyclin D3, which was validated by its differential expression as evaluated by tissue immunohistochemistry of a follicular lymphoma and large B-cell lymphoma (b).

Imaging Mass Spectrometry

Mass spectrometry has been used for the in situ analyses of proteins in tissue sections and those obtained by laser capture microdissection,53 thereby allowing imaging and comparison of protein expression between normal and disease tissues.54, 55, 56, 57, 58, 59, 60, 61 In this strategy, frozen tissue sections are applied to a MALDI plate and analyzed at regular spatial intervals. The mass spectral data obtained at different intervals are compared to yield a spatial distribution of masses (proteins) across the tissue section. Analyses using this approach have revealed up to 1600 protein peaks from histologically selected 1 mm diameter regions of single frozen sections.57 Using this approach, investigators have been able to distinguish glial neoplasms from benign brain tissues, and differentiate tumors of different histological grades.61
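The way spectra acquired at regular spatial intervals are turned into a spatial distribution of a given mass can be sketched in a few lines of Python. The data layout (a dictionary keyed by grid coordinate), the mass tolerance and the toy values are assumptions for illustration only and do not correspond to any instrument file format or to the cited studies.

```python
def ion_image(spectra, target_mz, tolerance=0.5):
    """Build a 2D intensity map for one mass across a tissue section.

    `spectra` maps a (row, col) grid position to a list of (m/z, intensity)
    pairs; each cell of the returned image is the summed intensity of the
    peaks within target_mz +/- tolerance at that position.
    """
    rows = 1 + max(r for r, _ in spectra)
    cols = 1 + max(c for _, c in spectra)
    image = [[0.0] * cols for _ in range(rows)]
    for (r, c), peaks in spectra.items():
        image[r][c] = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
    return image

# Toy 2 x 2 acquisition grid (invented values).
spectra = {
    (0, 0): [(5000.0, 10.0), (8000.2, 3.0)],
    (0, 1): [(5000.1, 40.0)],
    (1, 0): [(8000.0, 7.0)],
    (1, 1): [(4999.8, 25.0), (5000.3, 5.0)],
}

image_5000 = ion_image(spectra, target_mz=5000.0)
for row in image_5000:
    print(row)
```

Repeating the call for each mass of interest yields one image per mass, which can then be compared between normal and diseased regions of the same section.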

Differential Protein Profiling (Quantitative Proteomics)

Most quantitative proteomic studies are designed to determine the proteomic differences between one cellular state and another, and the measurements are therefore ‘relative’. In this regard, two-dimensional gel electrophoresis (2D-GE) has been extensively utilized with great success.62 Although 2D-GE is a traditionally popular method for determining relative protein expression between two cellular states, it is a relatively low-throughput approach that requires a large amount of starting material (in the order of 50 μg) and has low sensitivity for detection of low-abundance proteins such as cytokines and signaling molecules. Furthermore, proteins at both extremes of pI and molecular weight and those associated with the membrane fractions are not well represented by 2D-GE.11 Despite these limitations, and aided by highly automated robotic spot-picking technologies, numerous reports have successfully used 2D-GE followed by MALDI-TOF analysis to determine differential expression of protein profiles in normal vs tumor tissues including breast, colon,59 lung and esophageal tissues,63 as well as cellular responses to stimulating and differentiating agents such as LPS,64 to Fas65 and to cytotoxic agents such as butyrate.66 The average number of differentially expressed proteins identified by 2D-GE followed by MS analysis is in the range of 30–70 proteins.

Stable isotope labeling with amino acids in cell culture (SILAC)

Stable isotope labeling with amino acids in cell culture (SILAC) is another useful global quantification strategy for the evaluation of differential expression of proteins from two distinct cellular populations.67, 68 In essence, the SILAC procedure entails culturing cells from two different biologic conditions in parallel culture media that are deficient in a natural amino acid, but supplemented with a monoisotopically labeled version of that amino acid (eg 12C vs 13C; 14N vs 15N). The two cell populations metabolically incorporate the corresponding ‘light’ or ‘heavy’ isotopes in the synthesis of their respective cellular proteins. The proteins from each sample can thus be isolated, mixed at a 1:1 ratio and subjected to proteolytic digestion and mass spectrometric analysis. Corresponding peptides from each sample coelute during liquid chromatography, and relative quantification of a particular peptide represented in both samples can be performed by measuring the ratios of the peptide mass peak intensities from matching isotopic peak pairs. The sequence of the peptide is subsequently obtained from tandem mass spectrometric analysis, greatly facilitating identification of proteins with differential expression in the two conditions. Of necessity, isolated cells must be capable of protein synthesis in vitro.
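The quantification step can be made concrete with a short Python sketch. It assumes, purely for illustration, a peptide that carries a single residue labeled with six 13C atoms; the choice of label, the peptide m/z, the charge state and the intensities are invented and are not taken from the studies cited above.

```python
C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C, in Da

def heavy_partner_mz(light_mz, charge, n_labeled_carbons):
    """m/z at which the 'heavy' partner of a 'light' peptide ion is expected."""
    return light_mz + n_labeled_carbons * C13_MINUS_C12 / charge

def heavy_to_light_ratio(light_intensities, heavy_intensities):
    """Relative abundance from summed intensities of matching isotopic peaks."""
    light = sum(light_intensities)
    if light == 0:
        raise ValueError("no signal for the light peptide")
    return sum(heavy_intensities) / light

# Doubly charged peptide with one assumed 13C6-labeled residue.
expected_heavy_mz = heavy_partner_mz(light_mz=600.0, charge=2, n_labeled_carbons=6)

# Intensities of the first three isotopic peaks of each partner (invented).
ratio = heavy_to_light_ratio([100.0, 60.0, 20.0], [50.0, 30.0, 10.0])

print(round(expected_heavy_mz, 4), ratio)
```

Because the two partners coelute, the ratio is read from a single spectrum, which is why the samples can be mixed before digestion without losing track of their origin.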

Isotope-coded affinity tag method

The isotope-coded affinity tags (ICAT) strategy is a relatively new technology for relative protein quantification, relying on postharvest stable isotope labeling. The ICAT method uses the ICAT™ reagent to differentially label protein samples on their cysteine residues. The ICAT™ method is advantageous in that it permits the evaluation of low-abundance proteins and proteins at both extremes of molecular weight and isoelectric point.69 The ICAT™ reagent is composed of (i) a thiol-reactive group that reacts with the cysteine residues; (ii) a linker in which stable isotopes have been incorporated; and (iii) a biotin tag that enables affinity isolation and detection of peptides labeled with either the heavy or light versions of the ICAT™ reagent. In this system, one sample is labeled with a tag containing a light isotope, and the other sample to which it is being compared is labeled with a tag containing a heavy isotope. The two samples are combined, proteolytically digested and analyzed by mass spectrometry (Figure 9). Because ICAT-labeled peptides elute as pairs from a reverse-phase column, calculating the ratio of the areas under the curve for identical peptide peaks labeled with the light and heavy ICAT™ reagent will allow determination of the relative abundance of that peptide in each sample. Advantages of the ICAT™ strategy include internal quantitation, automation, and reduced complexity of the peptide mixture. The original ICAT™ reagents featured either eight deuterium atoms (heavy) or eight hydrogen atoms (light) at particular positions in the linker. The recently improved cleavable (c)ICAT™ reagent contains nine 13C in the heavy version of the linker and nine 12C in the light version of the linker. The resultant database search is constrained by the requirement for a cysteinyl group; the method is compatible with analysis of low-abundance proteins and can be performed on proteins obtained from snap-frozen archival tissues.
See Table 1 for comparison of 2D-SDS-PAGE, SILAC and ICAT™ methods for differential protein expression analysis.
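The area-under-the-curve calculation for an ICAT peptide pair can be sketched as follows. This is an illustration under stated assumptions rather than the software used in the studies discussed here: areas are approximated with the trapezoidal rule, the elution profiles are invented, and the expected mass difference per labeled cysteine is derived only from the nine 13C atoms of the cleavable reagent.

```python
C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C, in Da

def cicat_mass_shift(n_cysteines, carbons_per_tag=9):
    """Heavy-minus-light mass difference for a peptide with n labeled cysteines."""
    return n_cysteines * carbons_per_tag * C13_MINUS_C12

def peak_area(times, intensities):
    """Area under an elution profile by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area

def heavy_to_light_ratio(times, light, heavy):
    """Relative abundance of a peptide from the areas of its two labeled forms."""
    light_area = peak_area(times, light)
    if light_area == 0:
        raise ValueError("no signal for the light-labeled peptide")
    return peak_area(times, heavy) / light_area

# Invented elution profiles sampled at the same retention times (minutes).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
light = [0.0, 10.0, 20.0, 10.0, 0.0]
heavy = [0.0, 20.0, 40.0, 20.0, 0.0]

ratio = heavy_to_light_ratio(times, light, heavy)
shift_one_cys = cicat_mass_shift(1)
print(ratio, round(shift_one_cys, 4))
```

A ratio of 2 in this toy case would indicate that the peptide is twice as abundant in the heavy-labeled sample; the mass shift scales with the number of cysteines, which is how paired peaks are recognized in the first place.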

Figure 9

Outline of experimental protocol used for differential protein expression profiling by ICAT™. Protein mixtures obtained from two cell populations are labeled with either the light or the heavy isotopic version of the cleavable ICAT™ reagent. Labeled proteins are combined, subjected to multidimensional separation by SCX, RP and avidin affinity chromatography and analyzed by tandem mass spectrometry for peptide and protein identification. Based on the relative ratio of the two isotopically labeled peptides, a relative abundance of protein expression can be determined.

Table 1 Comparison of methods for quantitative protein expression profiling

The ICAT™ methodology has been applied to the quantitative protein expression analysis of myc overexpression, which revealed novel effects of myc on cytoskeletal function70 as well as providing a global proteomic view of the intestinal epithelial cell response to enteropathogenic Escherichia coli.71 Quantitative analysis of proteome changes that occur during neoplastic transformation from normal hepatocytes to hepatocellular carcinoma has also revealed hundreds of differentially expressed proteins9 even from tissues obtained by laser capture microdissection.

We have utilized the ICAT™ method to determine the quantitative proteomic changes associated with inhibition of p38 mitogen-activated protein kinase (MAPK) in transformed follicular lymphomas.72 Our previous work showed evidence for p38 MAPK activation in a subset of transformed follicular lymphomas.73 The p38 MAPK is a key mediator of stress-, extracellular-, growth factor- and cytokine-induced signaling and has been implicated in the development of some cancers.74, 75 Using a selective inhibitor of p38 MAPK (SB203580), we demonstrated that p38 MAPK inhibition resulted in dose- and time-dependent caspase-3-mediated apoptosis in transformed follicular lymphoma-derived cell lines. In order to further elucidate the basis of the cellular effects of SB203580, we employed a systems biologic approach involving quantitative proteomic analysis of transformed follicular lymphoma-derived cells treated with SB203580. Quantitative proteomic analysis using ICAT™-LC/MS/MS identified 277 differentially expressed proteins at 3 h and 350 proteins at 21 h of treatment with SB203580, the majority of which were downregulated. Analysis of functional groups of the differentially expressed proteins implicated components of diverse overlapping pathways including the IL-6/PI-3K, IGF-2/Ras/Raf and WNT8d/WNT5a pathways, affecting diverse cellular processes such as cell proliferation, survival, invasion, angiogenesis and lymphocyte differentiation (see Figure 10). Our proteomic approach revealed the global cellular consequences of SB203580 treatment and provides insights into its growth inhibitory effect on transformed follicular lymphoma cells.

Figure 10

Quantitative proteomic analysis of the response to the p38 MAPK inhibitor (SB203580) in transformed follicular lymphoma cells using the ICAT™ strategy. Proteins belonging to cell surface growth factor receptors, signaling molecules, and transcription factors were shown to be downregulated following exposure of transformed follicular lymphoma cells to the p38 MAPK inhibitor SB203580. The downregulation of these proteins would lead to alterations of multiple cellular processes including proliferation, survival, invasion, angiogenesis and lymphocyte development.

The ability to perform global quantitative proteomics has been significantly enhanced by the advent of the isotope-coded affinity tag-based technology (ICAT™), which is efficient in simplifying the proteome and, in combination with LC-MS/MS, permits detection and quantification of proteins and peptides from very complex samples.

Identification of Protein–Protein Interactions

Until recently, assessment of interacting proteins has only been feasible via laborious and time-consuming molecular biologic techniques such as the yeast two-hybrid system. Using a variety of approaches including protein complex purification, immunoprecipitation and affinity chromatography76 followed by high-performance liquid chromatography (HPLC) and electrospray ionization tandem mass spectrometry (ESI-MS/MS), interacting proteins of the CD4 receptor complex, PCNA,77 nonmuscle myosin heavy chain II78 and the protein kinase Cε signaling complex79 have been identified. The advantage of such copurification protocols is that the fully processed protein that serves as the bait can interact in its native environment and cellular location, which allows multicomponent complexes to be isolated. A limitation is the need for an antibody that allows specific immunoprecipitation. Proteins can also be tagged with a specific motif such as 6-Histidine or GST, which allows purification based on affinity for nickel or GSH beads, respectively. See Figure 11 for immunoprecipitation and affinity tag-based approaches to identify interacting protein partners.

Figure 11

Methods used to determine proteins within multi-protein complexes. Antibody specific for a protein can be used for isolation of immunocomplexes. An affinity tag such as 6-Histidine (His6), glutathione S-transferase (GST) or FLAG can be attached to a target protein of interest and used as a ‘bait’. Affinity chromatography is used to precipitate proteins associated with the affinity tagged proteins. Purified protein complexes are subsequently resolved by 1D SDS-PAGE, proteins excised, digested with enzyme and analyzed by mass spectrometry.

Using a functional proteomic approach, our laboratory has determined the identity of proteins that interact with the NPM-ALK oncogenic tyrosine kinase by immunoprecipitation with anti-ALK antibody followed by electrospray ionization (ESI) and tandem mass spectrometry (MS/MS). The NPM-ALK fusion protein results from the t(2;5)(p23;q35) chromosomal aberration, which is characteristic of a subtype of T-cell lymphoma known as anaplastic large cell lymphoma.80, 81, 82 Proteins that interact with ALK tyrosine kinase play important roles in mediating downstream cellular signals, and are potential targets for novel therapies. A total of 46 proteins were identified as unique to the ALK immunocomplex using monoclonal and polyclonal antibodies, while 11 proteins were identified in the NPM immunocomplex. Previously reported proteins in the ALK signal pathway were identified including PI3-K, Jak2, Jak3, Stat3, Grb2, IRS and PLCγ1. More importantly, many proteins previously not recognized to be associated with NPM-ALK, but with potential NPM-ALK interacting protein domains were identified. Proteins identified by MS were confirmed by Western blotting and reciprocal immunoprecipitation.83

Proteomic-based methods for the identification and quantitation of proteins associated with the chromatin84 or specific transcription factor complexes have been developed. Using the ICAT technique of quantitative mass spectrometry, proteins associated with the transcription factor NF-E2p18/Mafk, which is essential for β-globin expression in murine erythroleukemia cells, were determined85 during two differentiation states.

Post-Translational Modifications

Proteins are modified to their mature functional form through a highly regulated and complex sequence of post-translational processing. Most modifications are reversible and play important roles in regulating the biologic function of the protein. There are over 200 reported modes of post-translational modification of proteins, including phosphorylation, glycosylation and ubiquitination, to mention a few. Of these, methods to determine the type and sites of protein phosphorylation have received significant attention. Reversible phosphorylation of proteins is a key event in signal transduction from an extracellular stimulus via a transmembrane receptor to the nucleus.86 Phosphorylation occurs mainly on serine, threonine and tyrosine residues, with the ratio being 1800:200:1 in vertebrates.87 Although the phosphorylation of tyrosine residues is much less frequent in the proteome, it has been most extensively studied.
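To put the quoted serine:threonine:tyrosine ratio of 1800:200:1 into proportion, the short Python calculation below converts it into fractional shares. Only the ratio itself comes from the text; the function and variable names are illustrative.

```python
def shares(ratio):
    """Convert a ratio of counts into fractions that sum to 1."""
    total = sum(ratio.values())
    return {residue: count / total for residue, count in ratio.items()}

# Ratio quoted in the text for vertebrates.
phospho_ratio = {"serine": 1800, "threonine": 200, "tyrosine": 1}
phospho_shares = shares(phospho_ratio)

for residue, share in phospho_shares.items():
    print(f"{residue}: {share:.4%}")
```

On these figures tyrosine accounts for 1 in 2001 phosphorylation events, roughly 0.05%, which is why enrichment is usually needed before phosphotyrosine-containing proteins can be analyzed.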

More recently, attempts have been made to define the phosphorylation status of proteins at a global scale.88 Most approaches involve the use of phospho-specific antibodies to enrich for proteins with phosphorylated residues. Owing to the availability of excellent antibodies that react with phosphotyrosines, analyses of tyrosine phosphoproteins far outnumber those of serine and threonine phosphoproteins. Using phospho-specific antibodies, serine/threonine-phosphorylated proteins were enriched by immunoprecipitation and identified by mass spectrometry. Functional studies led to the identification of a novel protein that was demonstrated to be a substrate of protein kinase A.89 Similarly, a novel phosphotyrosine protein was identified and characterized as an ITAM-containing signal transduction protein induced by epidermal growth factor stimulation.90 Using 2-D gel electrophoresis and mass spectrometry, Lim et al,91 identified over 50 distinct tyrosine phosphoproteins induced by epidermal growth factor in a human epidermoid carcinoma cell line, which may represent novel targets for therapy. Similarly, phosphotyrosine proteomes have been characterized from thrombin-activated platelets,92 B-lymphoblasts93 and in response to heat shock.94 Larger-scale studies of phosphoproteins have taken advantage of commercially available immobilized metal-ion affinity chromatography (IMAC), which allows enrichment of phosphopeptides95 and also allows identification of several phosphorylation sites on single proteins.96

Peng et al,10 have used affinity purification to enrich for ubiquitinated proteins of yeast and have identified over 1000 proteins that were ubiquitinated. Their ability to identify the site of ubiquitination in over 100 cases provides validation of their high-throughput proteomic-based methodology.

Cellular Subproteomics

One of the major initiatives of the Human Proteome Organization (HUPO) is the comprehensive characterization of the complete subproteome of each cell type. Defining the global fingerprint of proteins expressed in a given cell type will aid in the identification of deregulated proteins that are characteristic of certain disease states and aid in diagnosis and prognostication.

Secretome

Protein secretion by diseased cell types may provide a means for earlier detection of cancer. Systematic approaches to purify and identify secreted proteins from a variety of cell types have been reported. Martin et al,97 have used a combination of ICAT and tandem mass spectrometry to identify and quantitate a comprehensive list of over 500 proteins that were secreted from a neoplastic prostate cancer cell line (LNCaP) in the presence or absence of androgen receptor stimulation. Similar studies have identified numerous secreted proteins during differentiation of 3T3-L1 preadipocytes to adipocytes,98 during calcium-dependent secretion in human neutrophils,99 during osteoclast differentiation100 and from astrocytes.101

Cell surface proteome

Proteomic approaches for the comprehensive profiling and identification of proteins on the cell surface or membranes have been reported. Cell surface membrane proteins may be biotinylated, affinity-captured and purified on avidin columns. The eluted biotinylated intact proteins can be separated by 2-D gel electrophoresis and protein spots from matching gels analyzed by mass spectrometry. Alternatively, plasma-membrane fractions can be purified by sucrose-gradient centrifugation and analyzed by mass spectrometry. Using these approaches, cell surface membrane proteins of human chronic lymphocytic leukemia102 and acute leukemia103 have been characterized. Analysis of the cell surface proteome of cancer cells of a variety of histologies has revealed an abundance of proteins with chaperone function such as GRP78, HSP70 and protein disulfide isomerase.104

Organellar proteomics

Systematic analyses of the proteins expressed by specific intracellular organelles such as the cancer-cell mitochondria,105 lymphocyte-derived exosomes,7 the lipofuscin in human retinal pigment epithelial cells,106 the phagosome,107 microsomes from cancer cells,108 lipid rafts109 and the human nucleolus110 have provided extensive insights into the organellar proteomes that were previously impossible to obtain. These studies also demonstrate that even intracellular organelles as small as nucleoli are highly complex in their protein expression profiles and contain hundreds of proteins, some of which were not previously associated with their function. They also show that the protein composition of organelles is not static and can respond significantly to changes in the states of the cells.

Conclusions

The recent advances in protein separation techniques, mass spectrometry and completion of the genome sequences of several organisms are all critical developments that facilitate proteomic studies. At the present time, substantial progress has been made in the development of high-throughput technologies for analysis of proteins including protein and antibody arrays. The challenges for the future lie in the archiving and integration of the vast amounts of data derived from mass spectrometry experiments into cohesive information relevant to physiologic and disease processes. To this end, it will be necessary to establish universally accepted criteria for positive identification of proteins by mass spectrometry. Further, it will be of considerable benefit to establish guidelines and standards for the reporting of proteomic data with adequate provisions for open access to proteomic data sets by investigators worldwide. It will also be critical to integrate the various aspects of proteomics such as diagnostic proteomics, high throughput analyses of post-translational modifications, imaging mass spectrometry, protein–protein interaction mapping and quantitative protein expression profiling to formulate coherent hypotheses regarding the pathogenesis of disease entities. These studies will have to occur in concert with large-scale validation studies on clinical samples obtained from well-controlled patient populations before they can be considered for potential diagnostic, prognostic or therapeutic biomarkers (Figure 12). In the near future, proteomic studies will impact the discovery of novel biomarkers and targets for the practical diagnosis and treatment of various diseases including malignancies.

Figure 12

Overview of goals and challenges of disease proteomics. A complementary approach in which a variety of high throughput methodologies are utilized will be critical in elucidating new insights into the pathogenesis of diseases. Ongoing discovery and validation phases of clinical samples are necessary steps required to identify proteins that may function as protein biomarkers with diagnostic, prognostic and therapeutic relevance.