Technologies
Nature Biotechnology 18, IT45–IT46 (2000)
doi:10.1038/80085

Proteomics

By globally cataloging cellular protein content and state, proteomics promises to complement genomics in drug discovery and basic research.
If there is one criticism that can be lodged against genomics as a tool for drug discovery, it is that DNA sequence information provides only a static snapshot of all the possible ways a cell might use its genes. In fact, the life of a cell is a dynamic process in which the cell constantly reacts to its environment. If, for example, a disease-causing agent is introduced, it may change how much gene product is made, when genes are turned on, the type and extent of post-translational modifications that occur, and how these events affect other genes. These effects will determine whether the organism successfully defends itself or succumbs to the disease.

Because the study of this dynamic has the potential to reveal new targets for drug intervention in disease processes, emphasis is now being placed on understanding how and when genome-encoded events (e.g., protein translation) occur and what relationship non-genome-encoded events (e.g., posttranslational modifications of proteins and interactions between proteins, nucleic acids, lipids, carbohydrates, and combinations thereof) have to particular physiological states. This endeavor is becoming known as proteomics because it focuses on the protein products of the genome and their interactions rather than on simple DNA sequence. It is being undertaken using powerful analytical tools, such as two-dimensional electrophoresis (2-DE) and ultrasensitive mass spectrometry (MS), coupled with high-throughput functional screening assays.

Historical perspective
From a technological standpoint, the essence of proteomics is protein characterization, which has been a mainstay of traditional biochemistry since the beginning of the twentieth century. In 1950, protein characterization acquired a very powerful tool, the Edman degradation method for protein sequencing, which enabled laboratories around the world to systematically sequence any protein they could isolate in sufficiently pure form1. Many proteins were sequenced by the Edman method, including hemoglobin, insulin, and myosin, and descendants of the method are still very much in use today. Besides protein sequencing tools, the other central component of proteomics research is 2-DE, which can resolve total protein extracts from cells into about 10,000 individual protein spots. Since 2-DE was first described 25 years ago2, 3, progress in the field has been dramatic, driven by the coupling of 2-DE with MS and by the development of software for analyzing resolved proteins and protein fragments in high-throughput modes.

The word proteome was first introduced in July 1995 and was defined as the "total protein complement of a genome."4 This original definition reflected what was known about protein expression in relation to the genome: because not all encoded proteins are expressed at any given time, the pattern of protein expression changes with factors such as the organism's stage of development and its physiological state. The same report went on to hypothesize that the more complex the genome, the smaller the fraction of the total possible proteome that is expressed at any particular moment. Linking expression of the proteome to the physiological changes associated with healthy or diseased conditions would then be a new way to identify clinically relevant molecular disease targets and to develop novel drugs against them.

The use of proteomics to identify drug leads need not be based only on human protein expression patterns. Large-scale sequencing of bacterial, viral, yeast, and higher-organism genomes provides important information about the life cycles of invading organisms and where their weaknesses lie. For example, the 1995 sequencing of the entire genome of Haemophilus influenzae is likely to provide a foundation for proteomic studies of bronchial infections5.

Since the sequencing of H. influenzae, 31 nonhuman genomes have been deciphered and 59 more are under way. For example, the genomes of pathogens such as Staphylococcus aureus and of extremophiles such as Aquifex (a single-celled organism that lives at 90°C and may harbor novel enzymes for industrial applications) are being sequenced with the intention of mining the genomic information, with the help of proteomic data, to develop novel antibiotics, anticancer agents, industrial catalysts, and other desired molecules6. The success of these ongoing efforts is a key catalyst for proteomics, because it enables the seamless linkage of gene sequence and expressed phenotype in these organisms under varying physiological states.

Current state
The workhorse for obtaining protein expression patterns from cells and tissues is 2-DE. In large-format mode, this powerful technique produces gels containing up to 10,000 distinct protein and peptide spots7. The major problem with this technique is that over 95% of these spots cannot be sequenced because they are beyond the limits of current high-sensitivity Edman sequencers. By comparison, standard-format 2-D gels yield up to about 2,000 spots, all of which can be sequenced by Edman methods4 or by attomole-range MS8. These MS methods have also been used successfully to develop an alternative to Edman sequencing: the so-called peptide mass fingerprinting approach. In this technique, proteins are cleaved chemically or enzymatically into peptides, and the set of peptide masses measured by MS forms a characteristic fingerprint of the parent protein9.
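To make the fingerprinting step concrete, here is a minimal Python sketch of how a theoretical peptide mass fingerprint might be predicted from a protein sequence. It assumes trypsin as the cleaving enzyme with a simplified rule (cut after K or R, except before P), complete digestion, no post-translational modifications, and singly protonated [M+H]+ ions; the function names are hypothetical and not taken from any particular software package.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free N- and C-termini
PROTON = 1.007276   # charge carrier for a singly protonated [M+H]+ ion


def tryptic_digest(sequence: str) -> list[str]:
    """Hypothetical helper: cut after K or R unless the next residue is P.

    Assumes complete digestion (no missed cleavages).
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def mass_fingerprint(sequence: str) -> list[float]:
    """Sorted predicted peptide masses: the protein's theoretical fingerprint."""
    return sorted(round(peptide_mass(p), 4) for p in tryptic_digest(sequence))


if __name__ == "__main__":
    protein = "MKWVTFISLLLLFSSAYSRGVFRR"  # arbitrary example sequence
    print(tryptic_digest(protein))  # ['MK', 'WVTFISLLLLFSSAYSR', 'GVFR', 'R']
    print(mass_fingerprint(protein))
```

Practical fingerprinting tools also model missed cleavages, modifications, and multiple charge states; this sketch leaves them out to keep the core idea visible.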

Analysis of the data, whether generated by 2-DE or MS, is also being streamlined. At present, 2-DE gel patterns are scanned into a computer and analyzed by algorithms that quantify the differences between gel patterns obtained from cells in normal versus physiologically altered states10. Databases of 2-D gels obtained from cells or body fluids under varying conditions are now readily available over the Internet. Examples include the SwissProt Swiss-2DPAGE and Swiss-2DISEASE databases11, which contain gel patterns obtained from renal cells in renal failure, myeloma cells, liver cells, and many others. Computer algorithms are also used to analyze peptide mass fingerprint data. One method compares the MS spectrum measured from a test sample with a database of predicted spectra and can perform this comparison in high-throughput mode12.
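The database comparison described above can be sketched as a simple shared-peak count: each candidate protein's predicted peptide masses are checked against the observed peaks within a mass tolerance, and candidates are ranked by the number of matches. This is an illustrative scoring scheme, not the specific algorithm of reference 12; the 50 ppm tolerance, the function names, and the toy masses below are all hypothetical.

```python
from bisect import bisect_left


def count_matches(observed: list[float], predicted: list[float],
                  tol_ppm: float = 50.0) -> int:
    """Count observed peaks lying within tol_ppm of at least one predicted mass."""
    predicted = sorted(predicted)
    hits = 0
    for mass in observed:
        tol = mass * tol_ppm * 1e-6              # ppm tolerance converted to daltons
        i = bisect_left(predicted, mass - tol)   # first predicted mass >= lower bound
        if i < len(predicted) and predicted[i] <= mass + tol:
            hits += 1
    return hits


def rank_candidates(observed: list[float], database: dict[str, list[float]],
                    tol_ppm: float = 50.0) -> list[tuple[str, int]]:
    """Rank database entries by shared-peak count, best match first."""
    scores = [(name, count_matches(observed, masses, tol_ppm))
              for name, masses in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical predicted fingerprints ([M+H]+ masses, Da) for two proteins.
    database = {
        "protein_A": [204.134, 842.510, 1479.795],
        "protein_B": [515.325, 927.493, 1479.800],
    }
    observed = [204.135, 842.505, 1300.000]  # hypothetical measured peaks
    print(rank_candidates(observed, database))
    # [('protein_A', 2), ('protein_B', 0)]
```

A raw shared-peak count tends to favor large proteins that contribute many predicted masses; practical scoring schemes correct for this, which this sketch does not attempt.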

Industry challenges
The biggest challenges facing the proteomics industry lie in technology and validation. The technical challenges center on the ability to resolve, reproducibly and accurately, the 10,000 proteins and peptides obtained from whole-cell extracts by large-format 2-DE. Various electrophoresis methods are constantly being refined to accomplish this goal, including isoelectric focusing followed by separation by molecular mass (ISO-DALT), nonequilibrium pH gradient electrophoresis (NEPHGE), and immobilized first-dimension pH gradients (IPG-DALT)13.

Resolution of very complex protein mixtures is crucial to the success of proteomics, and it must be coupled with accurate sequencing of the proteins and peptides in those mixtures. This is the domain of high-throughput MS and interpretation software. Methods such as tandem MS, electrospray ionization MS, and matrix-assisted laser desorption/ionization MS are increasing the sensitivity and versatility of MS and establishing it as the method of choice for identifying proteins and peptides in mixtures. Parallel developments in MS interpretation software also help; SEQUEST, for example, enables high-throughput analysis of the MS spectra of unknown samples against known and predicted standards11, 14.

Because proteomics is considered a powerful platform for novel drug development, a number of companies are working in this area to meet these challenges (see Table 1). The January 1998 strategic alliance between Incyte (Palo Alto, CA) and Oxford GlycoSciences shows how established biotechnology companies can combine their distinct capabilities into a system capable of handling the complexity of interpreting these data. In this case, Incyte's genomics databases were linked to Oxford GlycoSciences' proteomics databases.

Table 1. Selected companies with programs in proteomics

The future
According to databases based on expressed sequence tags, the human genome consists of about 60,000–100,000 genes, scattered among 3 billion nucleotides of chromosomal DNA, the sequencing of which has been essentially completed. This represents an enormous amount of static information that needs to be correlated with dynamic information about gene products and their interactions. Proteomics will provide methods to correlate the vast amount of genomics information that is becoming available with the equally vast protein information that is being produced through analysis of cells under normal versus altered states.

The key here is high throughput, and perhaps the most promising advances in proteomics today are being made not only at the level of MS but also at the level of algorithms. It is now possible to automate the acquisition of hundreds to thousands of mass spectra from peptides resolved by 2-DE and by additional capillary electrophoresis methods. These spectra can be analyzed automatically for the presence of peptide fragments, which are then used to reconstruct the parent proteins in a manner akin to the large-scale sequencing of genomes13. Rapid progress will require continued improvement of this methodology and its seamless linkage to genomic information.
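The analogy with genome sequencing can be illustrated by a greedy overlap assembler, shown here as a minimal Python sketch: it repeatedly merges the pair of peptide fragments with the longest suffix-prefix overlap until no overlap of at least min_len residues remains. It assumes error-free, overlapping fragments (for example, from digests with enzymes of different specificity); the fragment list and the min_len threshold are hypothetical, and this is not presented as the method of reference 13.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b; 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge fragments by largest overlap; returns the resulting contigs."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in a longer fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no remaining overlap long enough to trust
            return frags
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


if __name__ == "__main__":
    fragments = ["MKWVTFISL", "FISLLLLFSS", "LFSSAYSRGV", "SRGVFRR"]  # hypothetical
    print(greedy_assemble(fragments))  # ['MKWVTFISLLLLFSSAYSRGVFRR']
```

Greedy merging is fast but can mis-assemble repetitive sequences, a limitation it shares with greedy genome assemblers.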

As proteomics matures, it will inevitably form connections with other emerging fields. One of the most closely related is structural genomics, which links gene sequences to specific proteins and their structures15. For example, a recent structural genomics report compares protein folds across model organisms and finds that the protein fold patterns of the worm more closely resemble those of yeast than those of Escherichia coli16. Such approaches may also help identify open reading frames of unknown fold, thereby enriching the proteome databases of these model organisms, and are likely to be extended to the human genome as well.

The future of proteomics will also see the development of specialist disciplines within the larger field, one being subproteomics. This focuses on the proteomes of specific cellular locations, on the proteomes obtained by fractionation based on solubility, and in general on specific proteomes obtained by any method that simplifies the complex protein load of a cell17. This powerful approach promises to refine our understanding of protein expression and dynamics relative to the exact state of a cell, including its sub-cellular compartments.

Finally, proteomics will benefit from continuing improvements in the techniques used to obtain and analyze proteomes18. Promising new methods that complement traditional 2-D gels include the isotope-coded affinity tag approach19, two-dimensional liquid chromatography-tandem mass spectrometry20, and head-column stacking capillary zone electrophoresis21.

Conclusions
Proteomics aims to supplement gene sequence data with information on which proteins are being made, where, in what amounts, and under what conditions. It aims to show how protein cascades inside cells change as a result of specific diseases, thereby identifying potential new drug targets. It then aims to validate particular drug leads against those targets by showing how those leads affect the protein cascades (see Lead validation, pp. 47–49). Therefore, in addition to answering fundamental questions about the molecular basis of a cell's state at any point in time, proteomics promises to accelerate drug discovery through automated analysis of clinically relevant molecular phenomena.

Reprinted from Nature Biotechnology 16, 393–394 (1998).

REFERENCES
  1. Edman, P. Acta Chem. Scand. 4, 282-283 (1950).
  2. Barrett, T. & Gould, H.J. Biochim. Biophys. Acta 294, 165-170 (1973).
  3. O'Farrell, P.H. J. Biol. Chem. 250, 4007-4021 (1975).
  4. Wasinger, V.C. et al. Electrophoresis 16, 1090-1094 (1995).
  5. Fleischmann, R.D. et al. Science 269, 496-512 (1995).
  6. James, P. Biochem. Biophys. Res. Commun. 231, 1-6 (1997).
  7. Klose, J. & Kobalz, U. Electrophoresis 16, 1034-1059 (1995).
  8. Vorm, O. & Mann, M. J. Am. Soc. Mass Spectrom. 5, 955-958 (1994).
  9. Henzel, W.J. et al. Proc. Natl. Acad. Sci. USA 90, 5011-5015 (1993).
  10. Taylor, J. et al. Clin. Chem. 28, 861-866 (1982).
  11. SwissProt databases (http://expasy.hcuge.ch).
  12. Yates, J.R. et al. Anal. Biochem. 214, 397-408 (1993).
  13. Humphery-Smith, I. et al. Electrophoresis 18, 1217-1242 (1997).
  14. Yates, J.R. J. Mass Spectrom. 33, 1-19 (1998).
  15. Moult, J. & Melamud, E. Curr. Opin. Struct. Biol. 10, 384-389 (2000).
  16. Gerstein, M. et al. Pac. Symp. Biocomput. 12, 30-41 (2000).
  17. Cordwell, S.J. et al. Electrophoresis 21, 1094-1103 (2000).
  18. Haynes, P.A. & Yates, J.R. Yeast 17, 81-87 (2000).
  19. Gygi, S.P. et al. Nat. Biotechnol. 17, 994-999 (1999).
  20. Link, A.J. et al. Nat. Biotechnol. 17, 676-682 (1999).
  21. Locke, S. & Figeys, D. Anal. Chem. 72, 2684-2689 (2000).
 
Nature Biotechnology
ISSN: 1087-0156
EISSN: 1546-1696
©2000 Nature Publishing Group