Main

Living systems have evolved over several billion years to carry out carefully controlled chemistry in an aqueous environment at temperatures almost exclusively between zero and 100 °C. Under these conditions and unaided, many of the chemical reactions that are essential to life would not occur at perceptible rates, and most would not result in specific and reproducible products. Enzymes, along with other proteins and some nucleic acids, are used by natural biological systems to achieve this control; these macromolecules are responsible for the synthesis, transport and degradation of virtually every chemical compound in the biological environment1.

However, the chemical compounds used by biological systems represent a staggeringly small fraction of all possible small carbon-based compounds with molecular masses in the same range as those found in living systems (that is, less than about 500 daltons). Some estimates put this number in excess of 10⁶⁰ (ref. 2). The simplest living organisms can function with just a few hundred different types of such molecule, and fewer than 100 account for nearly the entire molecular pool3,4. Moreover, the total number of different small molecules within our own bodies could be just a few thousand4. At least in terms of numbers of compounds, therefore, ‘biologically relevant chemical space’ is only a minute fraction of complete ‘chemical space’ (see Box 1 for a definition of the terms used in this Insight). It is remarkable that so many complex processes can be carried out with such a limited number of molecules, and that biological chemistry can be so rich and diverse despite the relatively limited range of reactions that seem to have been exploited during the evolution of living systems (see Box 2 for a discussion of why particular types of chemistry might have emerged as the basis of life).

Similarly, as revealed by the recent triumphs of a variety of international sequencing projects, the genomes of the simplest living systems encode fewer than 1,000 different proteins and the human genome about 100 times more5, numbers that are minute compared with the total number of proteins that could theoretically exist. With 20 different types of amino acid and an average natural protein length of about 300 residues, this number is a staggering 20³⁰⁰, or more than 10³⁹⁰; if just a single molecule of each of these polypeptides were produced, their combined mass would vastly exceed that of the known universe. Natural proteins are therefore also a very select group of molecules.
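The arithmetic behind these figures can be checked in a few lines of Python. The sequence count follows directly from the text; the average residue mass (about 110 daltons) and the mass of ordinary matter in the observable universe (taken here as roughly 10⁵³ kg) are order-of-magnitude assumptions added for illustration, so the mass comparison is a rough sketch rather than a precise estimate.

```python
import math

# Number of possible sequences for a 300-residue chain built from 20 amino acids.
n_sequences = 20 ** 300                      # exact Python integer
log10_sequences = math.log10(n_sequences)    # about 390.3, i.e. more than 10^390

# Assumptions (not from the text): ~110 Da average residue mass and
# ~1e53 kg of ordinary matter in the observable universe.
DALTON_KG = 1.66053906660e-27
AVG_RESIDUE_DA = 110
UNIVERSE_KG = 1e53

protein_kg = 300 * AVG_RESIDUE_DA * DALTON_KG           # mass of one 300-residue chain
log10_total_kg = log10_sequences + math.log10(protein_kg)  # one copy of every sequence

print(f"sequences: 10^{log10_sequences:.1f}")  # sequences: 10^390.3
print(f"combined mass: 10^{log10_total_kg:.0f} kg vs universe ~10^{math.log10(UNIVERSE_KG):.0f} kg")
# combined mass: 10^368 kg vs universe ~10^53 kg
```

Working in logarithms keeps every quantity within floating-point range, even though 20³⁰⁰ itself is stored exactly as a Python integer.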

The characteristics of this select group of natural proteins are linked to those of the small molecules that are used in living systems, and to those of the relatively small number of synthetic small molecules that we have developed into drugs. Understanding this link will help us answer the question of how we can best use the powerful new methods that are emerging to probe biological systems, both to understand the fundamental processes of life and to develop new strategies to treat disease.

Chemistry in a biological environment

A crucial factor in understanding the nature of living systems is that biological molecules do not act in isolation in the dilute solutions familiar to most chemists. Instead, they are packed together to an extraordinary degree within cells6,7. Indeed, the concentration of macromolecules inside cells can amount to several hundred grams per litre. Many of us may have been astonished during our school days to learn that our bodies are more than 70% water, but how many of us wondered at the difficulty of making a 30% solution of molecules that are rich in hydrocarbon derivatives and other hydrophobic groups? A space-filling representation of a typical cell (Fig. 1) illustrates how molecular species are crowded together in its complex organizational structure8,9. Such ‘molecular crowding’ is likely to be important in many facets of biological chemistry. For example, binding affinities and the rates of self-assembly can change by orders of magnitude as a result of this phenomenon. Crowding is therefore an important factor to consider when using data derived from in vitro studies in dilute solution to understand processes taking place in vivo6,7. Moreover, biological systems are increasingly being considered as highly interconnected sets of interactions (as shown, for example, by the emergence of ‘systems biology’) in contrast to the reductionist view of much of traditional biochemistry10. In addition, considerable efforts are being made to understand the astonishing ability of biological molecules to self-assemble and generate functional entities ranging from folded proteins to whole organisms11.

Figure 1: Schematic representation of a crowded cell.

An array of different molecules can function independently under extremely crowded conditions, partly because of judicious distributions of oppositely charged polar groups on the molecular surfaces38. However, such systems are in some ways extremely fragile. For example, a mutation that alters just one amino acid in the haemoglobin molecule (replacing the charged glutamic acid at position 6 of the β-chain with a hydrophobic valine) can stimulate massive aggregation and give rise to a fatal genetic disease, sickle-cell anaemia8,39. More generally, many disorders of old age, most famously Alzheimer's disease, result from the conversion of normally soluble proteins into intractable deposits, a process that becomes increasingly facile as we get older (see http://www.horizonsymposia.com/ for the Horizon Symposium ‘Protein Folding and Disease’, and ref. 40). Many of these aggregation processes involve the reversion of the unique, biologically active forms of polypeptide chains to a generic and non-functional ‘chemical’ form41. Adapted with permission from D. Goodsell.

Techniques such as X-ray crystallography, nuclear magnetic resonance (NMR) and mass spectrometry have already revolutionized our understanding of the structure and function of biological molecules. It is now becoming possible to examine the ultrastructure of cells in remarkable detail, primarily through the development of modern imaging techniques12. Of particular importance are methods based on fluorescence emission. These can be used together with confocal microscopy to identify and track an increasingly wide range of molecules (both large and small) within their biological environments. Perhaps the most dramatic technique, however, is that based on electron microscopy: ‘cryoelectron tomography’ is now beginning to allow us to visualize, within a cell, molecular assemblies such as actin, which provides cells with their internal structures, and ribosomes, the complexes of proteins and nucleic acids that are responsible for all protein synthesis13. Along with these experimental approaches, computational procedures are being developed to simulate the behaviour of molecules within whole cells or indeed whole organisms14. Further developments of this type will undoubtedly lead to a deeper understanding of how cellular components of all types interact with each other. Even without such information, however, the high density of molecules in cells is a remarkable phenomenon that must be borne in mind when we attempt to perturb their behaviour for therapeutic purposes.

The challenges of drug discovery

Although some therapeutic agents (for example, insulin) are designed to increase the concentrations of key biological molecules that are depleted in particular disease states, the primary objective of most pharmaceutical chemistry is to generate new compounds that can modulate disease processes. Most prized are relatively small molecules (only a small percentage of orally administered drugs have molecular masses above 500 daltons15) whose properties enable them to interact with and perturb the function of specific biological molecules. It is equally important, however, that these compounds do not interact with most other molecules and so generate potentially adverse side effects. The immensity of this task is evident from the crowded cellular environment depicted in Fig. 1.

The natural products of different organisms (largely plants and bacteria), or their derivatives, were the staple tools of healers from the dawn of history until the birth of modern synthetic chemistry in the nineteenth century. Now, with the immense developments in combinatorial methods over the past decade or so, huge arrays of new molecules can be produced in relatively short periods of time16,17. Together with rapid screening methods, these developments have moved the drug-discovery process into uncharted territory, with seemingly endless numbers of potentially active compounds becoming available. As our knowledge of even the most complex aspects of biology at a molecular level expands, we can increasingly use rational arguments to design potential therapies and to select promising new molecules for testing or screening18. Even with such expert knowledge, the scale of the search needed to find appropriate compounds is remarkable; some individual drug companies screen millions of compounds each year against a range of targets, and even then success is not guaranteed. As we have seen, however, such numbers are insignificant compared with the total number of possible small organic molecules. Moreover, even the biggest libraries of compounds used in screening may not reflect the rich chemical diversity of the much smaller number of natural products19 (Fig. 2). Reliable computational approaches to sift through much larger numbers of more varied compounds would therefore be of tremendous value in drug discovery: once likely candidates for a given purpose have been identified, experimental screening could be focused on a much smaller set of selected compounds.

As Shoichet discusses in a commentary in this issue (page 862), the examination of molecules in silico for their ability to bind to specific targets already plays an important part in screening strategies, although such ‘virtual screening’ approaches have yet to achieve their full potential in the drug-discovery process.
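The strategy of computational sifting followed by focused experimental screening can be sketched as a two-stage funnel: a cheap property filter, then ranking by a predicted binding score, with only the top few compounds passed on for assay. This is a minimal illustration under stated assumptions, not any particular company's pipeline. The property filter uses Lipinski's rule of five (molecular mass at most 500 daltons, logP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors), here applied in the common form that tolerates one violation. The `Compound` record, the `dock_score` values and the `virtual_screen` function are hypothetical stand-ins for the output of a real docking or scoring program.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    """Hypothetical record for one library member."""
    name: str
    mol_weight: float   # daltons
    logp: float         # octanol-water partition coefficient (log10)
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors
    dock_score: float   # hypothetical predicted binding score; lower is better

def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def virtual_screen(library: list[Compound], top_k: int) -> list[Compound]:
    """Keep drug-like compounds (at most one violation), then return the top_k best scorers."""
    drug_like = [c for c in library if rule_of_five_violations(c) <= 1]
    return sorted(drug_like, key=lambda c: c.dock_score)[:top_k]

# Hypothetical four-compound library; all property values are invented.
library = [
    Compound("A", 350.4, 2.1, 2, 5, -9.2),
    Compound("B", 720.9, 6.3, 7, 14, -11.5),  # best score, but breaks all four criteria
    Compound("C", 410.0, 3.8, 1, 6, -7.4),
    Compound("D", 298.3, 1.2, 3, 4, -8.1),
]
hits = virtual_screen(library, top_k=2)
print([c.name for c in hits])  # ['A', 'D']
```

Compound B has the most favourable predicted score yet never reaches the assay stage, which illustrates the point made in the text that potency against an isolated target is not the only requirement. In a real pipeline the scoring stage would be far more expensive than the property filter, which is why the cheap filter is applied first.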

Figure 2: Comparison of the properties of different classes of molecule.

A large database that contained compounds from combinatorial chemistry (a), natural products (b) and drugs (c) was analysed on the basis of a variety of molecular properties19. To visualize the diversity of these compounds on the basis of these properties, a statistical approach known as principal component analysis was used. Plots of the first two principal components — which explain about 54% of the variance in the properties analysed — are shown. Combinatorial compounds cover a well-defined region in diversity space given by these principal components. Both drugs and natural products cover all this space, as well as a much larger additional region of space. It is of particular interest to note the similarity of the plots of natural products and successful drug molecules. Adapted with permission from ref. 19.
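The principal component analysis used for Fig. 2 can be sketched with NumPy as follows. This is a generic PCA by singular value decomposition, not the exact procedure of ref. 19; the six random "descriptors" stand in for real molecular properties (such as molecular mass or ring counts), and the function name `pca_2d` is hypothetical. Each property is standardized first so that quantities measured in different units contribute on a comparable scale; the sketch assumes no property is constant across the data set.

```python
import numpy as np

def pca_2d(descriptors: np.ndarray):
    """Project molecules onto their first two principal components.

    descriptors: array of shape (n_molecules, n_properties).
    Returns (scores, ratio): PC1/PC2 coordinates of each molecule and the
    fraction of total variance explained by each of the two components.
    """
    # Standardize each property to zero mean and unit variance.
    X = (descriptors - descriptors.mean(axis=0)) / descriptors.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    variance = s**2 / (X.shape[0] - 1)   # variance along each principal axis
    ratio = variance / variance.sum()
    scores = X @ vt[:2].T                # coordinates in the PC1-PC2 plane
    return scores, ratio[:2]

# Synthetic stand-in data: 200 "molecules" described by 6 properties.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))
data[:, 1] += 0.8 * data[:, 0]   # correlated properties give PC1 extra weight
scores, ratio = pca_2d(data)
print(f"PC1 + PC2 explain {100 * ratio.sum():.0f}% of the variance")
```

Plotting `scores[:, 0]` against `scores[:, 1]` separately for each class of compound gives panels analogous to those in Fig. 2, and the printed percentage plays the role of the 54% quoted in the caption.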

Despite the many advances in technology, the cost of generating new drugs is inexorably rising, leading to ever greater pressure on pharmaceutical companies to focus on developing therapies primarily for the common diseases of wealthy countries20,21. Those suffering from rare diseases, and indeed the vast number of people in poorer countries, particularly in the tropics, are all too often neglected in the continuing fight against infection and disease. But despite the evidence that the new techniques entering the pharmaceutical industry have not yet been a panacea for the drug-discovery process22, it is still early days. We have yet, for example, to reap the real benefits of the recent revolutions in genomics and proteomics, which promise to identify a much greater number of well-characterized molecular targets for therapeutic intervention23. Indeed, the number of new targets that have emerged in recent years within the pharmaceutical industry as a whole is remarkably small. For example, between 1994 and 2001, just 22 drugs that modulate new targets were approved24. So far, analyses have revealed that drugs have been targeted against fewer than 500 human proteins (ref. 25), a small percentage of the estimated total number of proteins in the human body. Although expert opinions differ as to the total number of possible ‘druggable’ targets, it is certainly larger than the number currently known25,26.

Chemical ‘tools’ for biological systems

One of the potential problems with the new types of organic compound that are now being explored as drugs is that they may be extremely potent when tested against isolated targets in the laboratory environment, but within the complex cellular milieu (Fig. 1), they might interact with cellular components other than the desired target. The small molecules found naturally in biological systems, often called ‘natural products’, have at least been through the evolutionary mill and are perhaps less likely to interact in a damaging manner with common components of living systems, such as membranes or DNA. Indeed, of all drugs licensed over the past 20 years, around 30% are natural products or natural-product derivatives. If we include compounds ‘inspired by’ natural products, the fraction rises to almost twice this number27 (see also the review in this issue by Clardy and Walsh, page 829). Interestingly, a comparison of the properties of drugs, natural products and combinatorial chemistry libraries shows that combinatorial compounds typically cover a significantly smaller area of chemical space than either drugs or natural products19 (Fig. 2). This suggests that by aiming to mimic some properties of natural compounds, new combinatorial compounds could be made that are substantially more diverse and that have greater biological relevance19 than those currently known.

Remarkably, however, it has been estimated that only 0.1% of all bacterial strains (the richest source of new biological molecules) have been cultured and analysed28. Thus, as Clardy and Walsh discuss in this issue (page 829), there is a vast harvest of new natural products, perhaps running to millions of new compounds, waiting to be gathered from previously unexplored strains of living organisms (mainly bacteria, plants and fungi). Moreover, there are now opportunities to manipulate nature's ‘production lines’, for example, by using mutagenesis and gene shuffling to induce microorganisms to create new biologically active molecules, and hence to generate large libraries of new ‘natural products’.

One of the most important aspects of the development of new techniques and technologies is that they can be used for two distinct but highly complementary purposes. The focus of most activity in academic environments is to use these new approaches to understand the fundamental basis of cellular and organismal biology. The primary objective of most industrial research, however, is to use such strategies to discover new drugs, or at least new lead compounds for drug discovery. These activities are not of course mutually exclusive, and indeed closer interactions between members of these two communities could bring substantial benefits to both parties.

The use of the vast libraries of new small molecules as ‘chemical tools’ to probe biological function and discover potential therapeutics is discussed in the reviews in this issue by Stockwell (page 846), and Lipinski and Hopkins (page 855). Using small molecules to probe biological systems is now often described as ‘chemical genetics’ or ‘chemical genomics’29. The enormous complexity of the biological milieu, again evident in Fig. 1, makes one of the ultimate goals of this approach — to discover a small molecule to modulate the function of every protein — an extremely challenging task, even in the light of the large arrays of chemical compounds that can be generated by combinatorial methods of ever-increasing sophistication. As well as the issues of diversity and specificity, cells may have evolved mechanisms to protect some of their most vital proteins from interference by small, extraneous molecules. Another major issue in chemical genetics concerns the quality of the data that are generated using various assay technologies; screening the same biological target with three different types of assay was recently found to give a set of hits that is consistent from assay to assay in only about 30% of cases30. Although such a low level of consistency may not be very important for drug discovery, where the main objective is often simply to identify a number of active compounds, it can be debilitating if the objective is to chart the network of interactions within a biological organism. The quality of the chemical libraries and the reliability of screening techniques are still limiting factors in our knowledge of biological systems and their molecular diversity.
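One simple way to quantify how consistent hit lists are across assays is to ask what fraction of all compounds flagged by any assay are flagged by every assay. The sketch below assumes this intersection-over-union definition, which may differ from the measure used in ref. 30; the function `hit_consistency` and the compound IDs are hypothetical, and the IDs were chosen so that the example happens to give a consistency of 30%.

```python
def hit_consistency(*hit_sets: set[str]) -> float:
    """Fraction of compounds flagged by any assay that every assay flags.

    Assumed definition (intersection over union); ref. 30 may use another.
    """
    union = set().union(*hit_sets)
    if not union:
        return 0.0  # no hits in any assay: nothing to compare
    common = set.intersection(*map(set, hit_sets))
    return len(common) / len(union)

# Hypothetical hit lists (compound IDs) from three assay formats on one target.
assay_a = {"c01", "c02", "c03", "c04", "c05"}
assay_b = {"c01", "c02", "c03", "c06", "c07"}
assay_c = {"c01", "c02", "c03", "c08", "c09", "c10"}
print(f"{hit_consistency(assay_a, assay_b, assay_c):.0%}")  # 30%
```

Under this measure, three hits shared by all assays out of ten distinct hits overall yield 30%, even though each assay on its own reports a plausible-looking list; for mapping an interaction network, the seven assay-specific hits are exactly the ones whose status is uncertain.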

In addition to using the products of synthetic organic chemistry as tools to probe biological systems, new molecular tools based on other cellular components, such as DNA and RNA, are increasingly being developed. As Breaker discusses in a review in this issue (page 838), various RNA technologies are currently generating a great deal of interest. That RNA molecules play an important part in biological chemistry is well established, notably as the catalytic ribozymes that are involved in many important biological reactions, not least protein synthesis31. Moreover, RNA interference (RNAi), in which synthetic RNA fragments are designed to interfere with the normal expression of specific genes, is becoming an important tool for exploring gene function, as discussed at a recent Horizon Symposium, ‘Understanding the RNAissance’ (http://www.horizonsymposia.com), and reported in ref. 32. In addition, aptamers — RNA molecules that form binding pockets for ligands with specificities and affinities similar to those of antibodies — are emerging as new probes of the functions of both large and small molecules. Aptamers that bind to particular targets can be engineered using in vitro evolution and amplification techniques. They can then be used as reagents to probe the roles of specific molecules in a given biological system. Furthermore, members of a previously neglected class of molecules, the oligosaccharides, are emerging as biological tools, now that efficient methods for sequencing and synthesizing these complex molecules are being developed33. In addition to acting as probes of biological function and regulation, all these types of molecule are themselves becoming the focus of drug discovery efforts.

Future prospects

A rich array of data on the effects of small molecules on biological systems is accumulating, mainly from large-scale screening exercises (although the quality of this information is often less than optimal; see the review in this issue by Lipinski and Hopkins, page 855). Analysis of such databases, using the types of computational method pioneered by the flourishing bioinformatics community34, should lead to major advances, both in our understanding of biological chemistry and in our ability to identify promising therapeutic compounds and therapeutic targets35. Although progress is now being made in developing tools for mining chemical information, such progress is often limited by the difficulty of accessing much of the data of interest36. Some estimates suggest that only about 1% of some types of chemical information are in the public domain. In contrast, the majority of many forms of biological data, from gene sequences to protein structures, are freely accessible to scientists in both academia and industry. Besides the technical challenges of cataloguing and checking vast amounts of data, one of the reasons for the inaccessibility of so much chemical information lies in issues of intellectual property. However, one can be optimistic that ways will be found to overcome these hurdles so that these resources can be used as effectively as possible.

With increasingly diverse, reliable and accessible databases of information about the effects of new chemical compounds on specific biochemical processes, we shall be able to understand much more about the nature of biologically relevant chemical space. In addition, we shall learn more about the types of compound that might make good drugs by analysing the behaviour of a much wider range of small molecules than the miserly number used by our bodies for so many purposes — from generating energy to building arsenals of macromolecules. In this regard, among the most exciting recent developments are efforts to generate public databases of chemical information37, and the establishment by the US Government of Molecular Libraries Screening Centers. The latter initiative is designed to give public-sector researchers access to an initial library of around 500,000 small molecules for use in probing a diverse range of biological systems. These compounds may lead to new research tools and could aid the development of new drugs or the discovery of new applications for existing ones (see NIH Molecular Libraries Initiative, http://nihroadmap.nih.gov).

To exploit fully the emerging chemical tools and new methodologies in molecular and structural biology (for example, http://www.nigms.nih.gov/psi/centers.html), and so make the quantum leap in the efficiency of drug discovery that these developments promise, chemists must increasingly develop strong interactions with scientists from different disciplines. With such interdisciplinary collaborations it will be possible to embrace some of the grand challenges that exist in our quest to understand and manipulate the chemistry of life for the benefit of mankind. One of the greatest challenges must be to discover and understand what fraction of the universe of chemical space is used by living systems, and how much more could in principle be used to influence these systems. Progress in this area of science will lead to more efficient strategies for drug discovery. And as such challenges are embraced, we shall very likely learn many of the secrets of how life began and evolved.