Norman J. Dovichi is at the Department of Chemistry, University of Washington, Seattle, WA 98195, USA. dovichi@chem.washington.edu
A simple hydrolysis-based approach to peptide analysis may facilitate high-throughput identification of post-translational modifications.
Gene sequences are transcribed from DNA to RNA, which is then translated into the corresponding proteins. The proteins themselves can then undergo a set of enzymatic reactions to generate post-translational modifications of specific amino acids. These modifications, such as phosphorylation and glycosylation, have a profound influence on the function of proteins. Although the determination of the primary amino acid sequence of a protein is routine, scanning an entire protein for post-translational modifications has been quite difficult until now. In this issue, Li and colleagues1 report a simple but powerful approach for detecting almost all possible post-translational modifications in a protein and apply the method to characterize a set of proteins from Escherichia coli.
Li's method has its genesis in the earliest protein analysis technologies. It was discovered in the late 1800s that acid treatment hydrolyzes a protein into its constituent amino acids. These amino acids could then be characterized through complex and sophisticated chemical methods, as codified by van Slyke2. Such analyses identified the composition of the amino acids in the protein, but not their sequence. Hundreds of grams of protein were required for these procedures, restricting their use to those few proteins that were available in large quantities.
Biology changed 50 years ago with the development of instrumental methods of analysis. In particular, partition chromatography provided a much faster and more sensitive method for the identification of amino acids. Martin and Synge, the developers of partition chromatography, applied it to analyze the acid hydrolysis products of gramicidin S, thereby determining the first sequence of a peptide; their work was recognized with the Nobel Prize in chemistry in 1952 (ref. 3). Frederick Sanger extended their method to determine the primary amino acid sequence of insulin, for which he received his first Nobel Prize in chemistry in 1958 (ref. 4). The acid hydrolysis method was quite tedious and was soon supplanted by Pehr Edman's phenylisothiocyanate method, which sequentially cleaves the N-terminal amino acid from a protein, followed by chromatographic identification5. Edman's method was used by two generations of scientists as the workhorse tool for protein sequencing.
Edman chemistry has fallen by the wayside over the past decade because of two developments. First, large-scale genomic sequencing efforts have generated databases with the sequence of all genes for an organism; the genetic code is then used to create a database of proteins for that organism. Second, matrix-assisted laser desorption ionization (MALDI) and electrospray mass spectroscopic methods were developed for the analysis of large biomolecules; Tanaka and Fenn shared the 2002 Nobel Prize in chemistry for this work6,7. When combined with tandem mass spectrometry (MS/MS), these ionization methods allow protein identification by comparing partial amino acid sequence obtained from a small number of tryptic digest fragments with the protein database for that organism.
Although extremely powerful, mass analysis of tryptic peptides is not without limitations. Genomic databases are usually silent on alternative splicing, wherein exons can be shuffled to create different proteins from a single gene. Analysis of tryptic peptides usually does not allow identification of the particular splice form of the gene. More importantly, mass analysis of tryptic peptides very rarely covers the full length of the protein; it is not possible to identify post-translational modifications that occur on the lost peptides.
Martin and Synge predicted that "the characterization of the lower peptides resulting from the partial hydrolysis of proteins should be the most valuable method of determining the order of amino acids in proteins and protein structure"8. Li and co-workers have revisited this prediction and indeed have developed a powerful method for the determination of protein sequence and post-translational modification. Acid hydrolysis is the breakage of the peptide bond between two amino acids in a protein, and is usually carried out with a high concentration of acid at high temperature. Li reports that carrying out acid hydrolysis under microwave irradiation leads to the controlled hydrolysis of the protein, breaking a single peptide bond and thereby creating two peptides, one containing the N terminus and the other containing the C terminus. This observation is surprising because acid hydrolysis usually results in multiple breaks, yielding a complex mixture of peptides that is not useful for protein mass spectrometry.
Microwave-induced acid hydrolysis seems to occur randomly along the protein's length, creating a set of all possible N- and C-terminal peptides, even for proteins containing acid-labile asparagine-proline bonds. These peptides form a ladder, by analogy to the sequencing ladder created in Sanger's DNA sequencing reaction9. Mass analysis of this peptide ladder (MAP analysis) is used to characterize each amino acid in the protein. Protein ladders have been generated in the past, usually by enzymatic digestion. However, the generation of full-length ladders that span the entire sequence of a protein is an exciting development.
As an illustration of the approach, consider the A chain of insulin, which consists of the 21-residue sequence GIVEQCCTSICSLYQLENYCN. Controlled acid hydrolysis cleaves each molecule at one of its 20 peptide bonds, creating an N-terminal ladder and a C-terminal ladder. Each ladder generates its own set of peaks in the mass spectrum, and software is used to separate the mass data from the two ladders. Figure 1 presents an idealized spectrum for the N-terminal ladder generated from insulin. The amplitude of each peak will vary depending on its generation rate and its ionization efficiency.
Figure 1. Analysis of the N-terminal ladder of insulin by mass analysis of peptides (MAP).
A protein undergoes controlled acid digestion to create random breaks leading to the creation of two ladders, one consisting of peptides containing the N terminus of the protein and the other containing the C terminus. The figure shows a highly schematic example of mass analysis of the N-terminal ladder of the A chain of insulin. The sequence of the protein is determined by the mass differences between peaks, and post-translational modifications are identified by anomalous values for the differences. A real mass spectrum would also show peaks corresponding to the C-terminal ladder, and software is used to distinguish the two. The peak heights in the spectrum depend on the rate of bond cleavage and the ionization efficiency of the peptide.
The difference in mass between adjacent peaks is given by the mass of the corresponding amino acid. For an unmodified protein, the protein sequence can be read directly from those mass differences. As an example, Li and colleagues show a set of ladders that span the length of lysozyme (129 amino acids), allowing determination of its sequence based on a single mass spectrum. This de novo sequencing will be important for proteins obtained from organisms whose genomes have not been sequenced. It will be even more important for sequencing of proteins produced from alternative splice forms of a gene and for detecting the loss of signaling sequences from the N terminus of a protein.
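The ladder readout can be illustrated with a minimal sketch, which is not Li and colleagues' software. It assumes neutral monoisotopic masses, a noise-free N-terminal ladder that includes the intact chain, and a hypothetical matching tolerance of 0.01 Da; the function names are invented for illustration. The residue masses are standard values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # each hydrolysis fragment carries one H2O


def n_terminal_ladder(sequence):
    """Neutral masses of the N-terminal fragments of length 1..len(sequence).

    The last entry is the intact chain (an assumption of this sketch).
    """
    masses, running = [], WATER
    for residue in sequence:
        running += RESIDUE_MASS[residue]
        masses.append(running)
    return masses


def read_ladder(masses, tol=0.01):
    """Assign residues to successive mass differences.

    Residues of identical mass are reported together (for example "I/L");
    "?" marks a difference that matches no unmodified residue.
    """
    calls, previous = [], WATER
    for mass in sorted(masses):
        delta = mass - previous
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        calls.append("/".join(sorted(matches)) if matches else "?")
        previous = mass
    return calls


INSULIN_A = "GIVEQCCTSICSLYQLENYCN"
print(read_ladder(n_terminal_ladder(INSULIN_A)))
```

A real spectrum would also require charge-state handling, peak picking and separation of the N- and C-terminal ladders, none of which is attempted here.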
Proteins that contain post-translational modifications will generate mass differences that do not correspond to the mass of amino acids. Perhaps most importantly, phosphorylation can be identified by the 80-Dalton mass shift associated with the phosphate group. As an example, Li and colleagues determine six phosphorylation sites of -casein in a single mass spectrum. The authors also demonstrate a technique for the identification of acetylation and heme attachment sites. Of course, acid-labile modifications are lost during this procedure, but such modifications are rare. Li and colleagues present a tour de force application of this technology to scan a set of 28 proteins from an E. coli homogenate, where loss of the N-terminal methionine residue and other signal peptides, the oxidation of methionine, and formation of disulfide bonds were observed.
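As a hedged sketch of how an anomalous mass difference might be flagged, the following code treats a step that exceeds a serine, threonine or tyrosine residue mass by one phosphate group (HPO3, monoisotopic mass 79.96633 Da) as a phosphorylated residue. The peptide GASEK, the function names and the 0.01 Da tolerance are hypothetical choices for illustration; this is not the authors' procedure.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PHOSPHO = 79.96633  # HPO3 added by phosphorylation


def ladder(sequence, phospho_sites=()):
    """N-terminal ladder masses; phospho_sites holds 0-based residue indices."""
    masses, running = [], WATER
    for index, residue in enumerate(sequence):
        running += RESIDUE_MASS[residue]
        if index in phospho_sites:
            running += PHOSPHO
        masses.append(running)
    return masses


def call_steps(ladder_masses, tol=0.01):
    """Label each ladder step as a residue, a phospho-residue ("pS"), or "?".

    Leucine and isoleucine share a mass; this sketch reports whichever
    comes first in the table.
    """
    calls, previous = [], WATER
    for mass in sorted(ladder_masses):
        delta = mass - previous
        call = "?"
        for aa, residue_mass in RESIDUE_MASS.items():
            if abs(delta - residue_mass) <= tol:
                call = aa
                break
            if aa in "STY" and abs(delta - residue_mass - PHOSPHO) <= tol:
                call = "p" + aa
                break
        calls.append(call)
        previous = mass
    return calls


# Hypothetical peptide with a phosphate on the serine at index 2.
print(call_steps(ladder("GASEK", phospho_sites={2})))
```

The same pattern could be extended to other modifications by adding their mass shifts, provided the shifted masses do not coincide with an unmodified residue within the chosen tolerance.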
The technology has several obvious applications in high-throughput scanning for post-translational modifications across a proteome. One can imagine the use of multidimensional chromatography or capillary electrophoresis for the separation of proteins from a cellular homogenate and fraction collection in high-density microtiter plates. Microwave irradiation can then be used to generate peptide ladders, with subsequent spotting on high-density MALDI plates for MAP analysis.
The method has some limitations. Like all mass spectrometry-based protein analysis methods, ladder-based sequencing cannot distinguish between leucine and isoleucine, which have identical mass, and a high-resolution instrument is required to distinguish between glutamine and lysine, which have similar mass. Li reports that current time-of-flight instruments provide low sensitivity for proteins of higher molecular mass and are not useful in the analysis of ladders prepared from proteins with more than 150 residues1. The use of Fourier transform ion cyclotron resonance instruments promises to extend the mass range of this approach to larger proteins. Finally, the method does not provide details on glycosylation patterns, whose determination remains a formidable hurdle.
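The glutamine/lysine and leucine/isoleucine limitations can be put in rough numerical terms. The sketch below uses standard monoisotopic residue masses and the conventional definition of resolving power, m/Δm. The 5,000 Da ladder fragment is an arbitrary illustrative value, and the estimate ignores isotope envelopes and the details of how peak width is defined.

```python
# Standard monoisotopic residue masses in daltons.
GLN = 128.05858
LYS = 128.09496
LEU = 113.08406
ILE = 113.08406


def required_resolving_power(peptide_mass, mass_gap):
    """Rough m/delta-m needed to separate two fragments that differ by mass_gap."""
    if mass_gap == 0:
        return float("inf")  # identical masses cannot be resolved by mass alone
    return peptide_mass / abs(mass_gap)


fragment_mass = 5000.0  # hypothetical ladder fragment, in daltons

print("Gln/Lys gap (Da):", round(LYS - GLN, 5))
print("Resolving power for Gln vs Lys:",
      round(required_resolving_power(fragment_mass, LYS - GLN)))
print("Resolving power for Leu vs Ile:",
      required_resolving_power(fragment_mass, LEU - ILE))
```

Because the required resolving power grows in proportion to fragment mass, the glutamine/lysine distinction becomes harder toward the high-mass end of a ladder.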