“This short history of research in one area, lymphocyte receptors, is yet another witness to the power of DNA technology, and to the ability of this approach not only to explain known biological phenomena, but also to contribute to the discovery of new biological systems.” Susumu Tonegawa, Nobel lecture, 8 December 1987.

The double helix is all about biological information: how it is encoded, stored, replicated and used when required. Immunology, too, is about information. What genetic processes control the vast array of synthetic potential within an immune system capable of reacting specifically to virtually any microbe or foreign molecule? The secret lies in unique DNA processing that occurs during the development of lymphocyte cells, which are responsible for the specific immune response to a foreign agent (antigen). B lymphocytes produce antibodies (on their surface as well as secreted) and T lymphocytes mount cellular attacks on pathogenic infiltrators.

As lymphocytes develop, an array of short genes is rearranged and assembled at the DNA level to form genes whose products recognize distinct antigens. Because the process is largely random, each lymphocyte makes different choices, and the result is a vast repertoire of lymphocytes reactive to different antigens. This process has implications for antibody formation, cell-mediated immunity and malignancies of the immune system.

One B cell produces one antibody

At the beginning of the last century, Paul Ehrlich1 recognized that the specificity of antibodies lay in the complementarity of their shapes to the antigen(s) on the microbe being recognized. He saw antibodies as cellular 'side chains', which budded out from the cell surface as what today we would term receptors. Karl Landsteiner2 then demonstrated the exquisite specificity of antibodies, showing that animals could make antibodies to almost anything, including small synthetic organic molecules that had never previously existed in nature. Moreover, tiny structural changes in the antigen could lead to the production of a different antibody. It beggared belief that there could be so many different side chains. When antibodies were shown to be proteins, it seemed natural to conclude that a specific antibody molecule was shaped in close proximity to an antigen molecule much as plastic or sheet metal is moulded against a template. This 'direct template' hypothesis3 held sway for several decades.

In 1955, Niels Jerne4 published his natural selection theory of antibody formation, which postulated the random synthesis of a million or more different sorts of antibodies. When an antigen enters the body, it unites with an antibody that just happens to fit it, the antigen–antibody complex is taken up by a cell and the antibody somehow acts as the template for the formation of more of itself. David Talmage5 and Macfarlane Burnet6 recognized that this theory would make more sense if the postulated natural antibodies were located on the surface of what we now call B lymphocyte cells. If each cell were endowed with only one sort of antibody specificity, then the antigen could select one lymphocyte out of a repertoire, cause its clonal division and stimulate antibody production and secretion (Fig. 1). In 1958, Joshua Lederberg7 and I provided the first evidence for the clonal selection theory, namely that one B cell always produces only one antibody.

Figure 1: The clonal selection theory of antibody formation.

Each B lymphocyte produces only one type of antibody receptor on its surface. An antigen binds to, and thereby selects, one B lymphocyte out of a large repertoire. This triggers the rapid division and differentiation of the B cell to become a 'plasma cell', producing and secreting antibodies specific to the original antigen.

DNA shuffling in antibody formation

Antibodies are multichain proteins that come in different forms. The most abundant, immunoglobulin-γ (IgG), consists of two identical light (L) chains and two identical heavy (H) chains8 (Fig. 2a). When light chains of the same type from different antibodies are compared, their carboxy-terminal halves are identical, but their amino-terminal halves differ in more than 50 residues; this part is called the 'variable' region9. The heavy chains, too, consist of a variable (V) part and a constant (C) part.

Figure 2: Antibody formation.

a, Structure of an antibody (immunoglobulin). Two identical heavy chains are connected by disulphide linkages. The antigen-recognizing site is composed of the variable regions (yellow) of the heavy and light chains, whereas the effector site, which determines the antibody's function, is specified by the amino-acid sequence of the heavy-chain constant region (red). b, Assembly of the light (κ)- and heavy (H)-chain genes of antibodies by somatic recombination during B-lymphocyte development. The L chain is encoded by variable (V), joining (J) and constant (C) genes. While the developing B cell is still maturing in the bone marrow, one of the 30–40 V genes combines with one of the five J genes and is juxtaposed to a C gene. The recombination process involves deletion of the intervening DNA between the selected genes. c, The H chain is encoded by V, D, J and C genes. The assembly of the H-chain gene occurs in two stages: one of the D genes joins with a J gene, then one of the V genes joins with that DJ assembly.

In 1965, William J. Dreyer and J. Claude Bennett10 bucked the dogma at the time that 'one gene makes one protein', and put forward the revolutionary concept that the carboxy-terminal C region of the L chain was always encoded by a single gene, but that the amino-terminal V half could be encoded by multiple separate genes, perhaps as many as 100,000 in number. It followed that a chosen V gene must then somehow become associated with a C gene by a DNA rearrangement event in each lymphocyte cell, because only when the V–C regions were spliced together could a functional protein be expressed.

At that time, there was no way to interrogate the genome directly to test this concept, and for a decade debate and controversy raged. One school, led by Leroy Hood, favoured the idea that a large array of germline-encoded V genes for the L and H chains underwent rearrangement. At the other extreme were proponents of a single, very highly mutable V gene that was extensively mutated in emerging B cells. In the middle were those who favoured the idea of a handful of V genes that were subject to extensive recombination in somatic cells (the cells of the body, excluding the sex cells). This compromise was supported by luminaries such as Oliver Smithies and Gerald Edelman. Francis Crick was quite taken with the idea that just two V genes undergo rearrangement in the germ line, with further mutation in somatic cells.

Before arriving at the solution brought by advances in molecular biology, one more fact is worthy of note. Elvin Kabat astutely pointed out that in the V regions of both heavy and light chains there were also three short stretches of amino acids where variation was considerably greater than elsewhere in the molecules, and these so-called hypervariable regions were deemed likely to be sites of union with the antigen11. Could it be that there was actually an assembly of several, rather than just two, genes encoding each chain?

The new tools for manipulating and sequencing DNA came to the rescue. In 1976, Nobumichi Hozumi and Susumu Tonegawa12 conducted a landmark experiment. They used a DNA cutting enzyme known as a restriction endonuclease to digest the DNA extracted from a mouse embryo and from an antibody-secreting tumour. The resulting DNA fragments were then separated on the basis of size, and were reacted with radioactive probes, one corresponding to the whole L chain, the other to just the C portion. In the tumour, both probes lit up the identical fragment, whereas in the embryonic extract, two different fragments hybridized to the full-length probe, but only one of them to the C-region probe. Both fragments from the embryo sample were different in size from the single hybridizing fragment in the tumour sample. The experiment strongly argued for the V and C genes being some distance from each other in the embryo, but having been rearranged and assembled during development of antibody-forming cells in adults to form a continuous DNA sequence constituting the full L-chain gene.
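The logic of the Hozumi–Tonegawa experiment can be replayed in a toy model. In this sketch every sequence is hypothetical: "[V]", "[J]" and "[C]" are placeholder markers for the stretches a probe would bind, EcoRI (site GAATTC) is assumed as the restriction enzyme, and all lengths are invented. Deleting the DNA between V and J brings the V and C stretches onto a single fragment whose size matches neither embryonic fragment.

```python
# Toy model of the restriction-digest logic; all sequences are hypothetical.
# "[V]", "[J]" and "[C]" are placeholder markers for probe-binding stretches.
SITE = "GAATTC"  # EcoRI recognition site; EcoRI cuts between G and AATTC

GERMLINE = (
    "A" * 5 + SITE + "T" * 10 + "[V]" + "T" * 20 + SITE + "G" * 40 + SITE
    + "G" * 8 + "[J]" + "G" * 15 + "[C]" + "A" * 10 + SITE + "A" * 5
)
FULL_PROBE = ("[V]", "[J]", "[C]")  # probe covering the whole L chain
C_PROBE = ("[C]",)                  # probe covering only the C portion


def digest(dna: str, site: str = SITE, cut_offset: int = 1) -> list[str]:
    """Cut `dna` at every occurrence of `site`, cut_offset bases into the site."""
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def rearrange(dna: str, left: str = "[V]", right: str = "[J]") -> str:
    """Join `left` to `right`, deleting all the DNA that lay between them."""
    end_of_left = dna.index(left) + len(left)
    return dna[:end_of_left] + dna[dna.index(right):]


def hybridizing(fragments: list[str], probe: tuple[str, ...]) -> list[int]:
    """Sizes of the fragments that contain any stretch covered by `probe`."""
    return sorted(len(f) for f in fragments if any(m in f for m in probe))


embryo = digest(GERMLINE)
tumour = digest(rearrange(GERMLINE))
print("embryo, full probe: ", hybridizing(embryo, FULL_PROBE))  # two bands
print("embryo, C probe:    ", hybridizing(embryo, C_PROBE))     # one band
print("tumour, both probes:", hybridizing(tumour, FULL_PROBE))  # one new band
```

The toy keeps the J segment next to C, as in a real L-chain locus, so only the V-to-J stretch is deleted; the band sizes are artefacts of the invented sequence and carry no biological meaning.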

Generating more diversity

Definitive elucidation of immunoglobulin gene structure depended on molecular cloning and subsequent sequencing of the genes themselves13. Here came another surprise, and one that could not really have been anticipated. V-region genes in the germline were found to be significantly shorter than is required to code for the V region of the L chain. It turned out that there is a series of 'minigenes' known as 'joining' (J) genes, which code for about 13 amino acids of the L chain. Thus, the full L chain is actually encoded by V, J and C genes (Fig. 2b). For the H chain, it is still more complicated, as there exists a series of 'diversity' (D) genes that encode up to eight amino acids lying between the V and J regions. Thus, the H chain is encoded by V, D, J and C genes (Fig. 2c). The assembly of a complete H-chain V region occurs in two separate steps: first, one of the D genes joins with one of the J genes, then one of many V genes joins with that DJ assembly (Fig. 2c). Each joining event is accompanied by deletion of the intervening DNA between the chosen minigenes. This was the first example of a somatic cell possessing a different genome from its fellow cells.
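The two-step heavy-chain assembly, with deletion of the DNA between the chosen minigenes, can be sketched as list surgery. This is a minimal illustration under stated assumptions: the locus is a hypothetical, scaled-down list of segment names ordered 5' to 3', and each join simply drops the segments lying between the two partners.

```python
import random

# Hypothetical, scaled-down heavy-chain locus, listed 5' to 3'.
GERMLINE_H = ["V1", "V2", "V3", "D1", "D2", "D3", "J1", "J2", "J3", "C"]


def join(locus: list[str], left: str, right: str) -> list[str]:
    """Join `left` directly to `right`, deleting every segment between them."""
    i, j = locus.index(left), locus.index(right)
    if i >= j:
        raise ValueError(f"{left} must lie upstream of {right}")
    return locus[: i + 1] + locus[j:]


def assemble_heavy_chain(locus: list[str], rng: random.Random):
    """Two-step H-chain assembly: D joins J first, then V joins the DJ unit."""
    d = rng.choice([s for s in locus if s.startswith("D")])
    j = rng.choice([s for s in locus if s.startswith("J")])
    locus = join(locus, d, j)                      # step 1: D-J joining
    v = rng.choice([s for s in locus if s.startswith("V")])
    return join(locus, v, d), (v, d, j)            # step 2: V-DJ joining


rng = random.Random(1)
rearranged, (v, d, j) = assemble_heavy_chain(GERMLINE_H, rng)
print(f"chose {v}-{d}-{j}; rearranged locus: {rearranged}")
```

In the rearranged list, V segments upstream of the chosen V survive, and unused J segments downstream of the chosen J remain in the DNA; in real cells the latter are removed from the transcript by RNA splicing, which this sketch does not model.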

This minigene assembly process has important implications for antibody diversity. In humans, there are two types of L chain, κ and λ, each with its own set of V and J genes. For the κ light chain, there are 40 functional V genes and 5 functional J genes; for the λ chain, there are 31 and 4, respectively. There is only one kind of variable region for the H chain, encoded by 51 V genes, 25 D genes and 6 J genes. To a first approximation, therefore, there are (40 × 5) + (31 × 4) = 324 different possible assemblies of L chains, and 51 × 25 × 6 = 7,650 combinations for H chains. Because any L chain can pair with any H chain, there are potentially 324 × 7,650 = 2,478,600 different types of germline-encoded antibodies.
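The combinatorial arithmetic behind the figure of 2,478,600 can be checked in a few lines. The segment counts are those given in the text; the sketch assumes, as the text does, that any light chain can pair with any heavy chain, and it ignores junctional and N-region diversity.

```python
from math import prod

# Functional gene-segment counts for humans, as given in the text.
KAPPA = {"V": 40, "J": 5}
LAMBDA = {"V": 31, "J": 4}
HEAVY = {"V": 51, "D": 25, "J": 6}


def assemblies(segments: dict[str, int]) -> int:
    """Number of ways to pick one segment of each kind."""
    return prod(segments.values())


light = assemblies(KAPPA) + assemblies(LAMBDA)  # kappa OR lambda: add
heavy = assemblies(HEAVY)                       # V AND D AND J: multiply
total = light * heavy                           # any L pairs with any H
print(f"{light} L chains x {heavy:,} H chains = {total:,} antibodies")
```

The design choice mirrors the biology: within a chain the choices are independent, so counts multiply, whereas κ and λ are alternative chain types, so their counts add.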

But this is a considerable underestimate, for two reasons. First, recombination junctions can fall at different positions, and this junctional diversity increases variability. Second, a few extra nucleotides, called N regions, can be inserted at D–J and V–D junctions in many H chains, and at a smaller percentage of L-chain V–J junctions. These nucleotides are not present in the germline and add further to antibody diversity.
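How imprecise joining and N-nucleotide addition multiply the repertoire can be illustrated with a toy join. This is a minimal sketch under stated assumptions: the V and J end sequences are arbitrary placeholders, and the trimming and insertion limits (`max_trim`, `max_n`) are illustrative values, not measured ones.

```python
import random

BASES = "ACGT"


def imprecise_join(left: str, right: str, rng: random.Random,
                   max_trim: int = 3, max_n: int = 4) -> str:
    """Join two coding ends with random trimming and random N-nucleotide addition.

    max_trim and max_n are illustrative limits, not measured values.
    """
    trim_left = rng.randint(0, max_trim)
    trim_right = rng.randint(0, max_trim)
    n_region = "".join(rng.choice(BASES) for _ in range(rng.randint(0, max_n)))
    # Slice with len(left) - trim_left so that trim_left == 0 keeps the whole end.
    return left[: len(left) - trim_left] + n_region + right[trim_right:]


# Arbitrary placeholder ends of one V segment and one J segment.
V_END, J_START = "TGTGCAAGA", "TACTTTGAC"

rng = random.Random(0)
junctions = {imprecise_join(V_END, J_START, rng) for _ in range(1000)}
print(len(junctions), "distinct junctions from a single V-J pair")
```

Even with these small limits, one V–J pair yields hundreds of distinct junction sequences, which is why the germline count is only a lower bound.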

Yet further diversity can be generated by DNA mutations in dividing B cells. B cells expressing newly assembled immunoglobulin genes, each with its own unique specificity, constitute the 'primary repertoire'. When an antigen stimulates a chosen B cell to divide (Fig. 1), a proportion of the progeny migrate into the vicinity of antigen-capturing follicular dendritic cells (FDCs) and gradually form a 'germinal centre'. FDCs retain antigen on their surface for long periods and stimulate further rounds of division. Within the germinal centre the B cells display an extraordinarily high rate of somatic mutation in V genes, estimated at 10⁻³ per nucleotide per division14. As antibody production accumulates, only those B cells with heightened affinity for the antigen gain access to FDC-bound antigen and thus are further stimulated to divide. As a result, B-cell clones secreting higher affinity antibody are selected in an iterative manner.
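The iterative cycle of mutation and selection can be sketched as a toy simulation. At the text's rate of 10⁻³ per nucleotide per division, a V region of roughly 330 nucleotides (about 110 codons, an approximate figure assumed here) would pick up about 0.3 mutations per division. The sketch below is a simplification: "affinity" is a hypothetical score (matches to an ideal target sequence), and the mutation rate of 0.01 is deliberately raised so that the short toy sequence changes within a few dozen rounds.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each nucleotide independently with probability `rate`."""
    return "".join(
        rng.choice([b for b in BASES if b != base]) if rng.random() < rate else base
        for base in seq
    )


def affinity(seq: str, target: str) -> int:
    """Hypothetical affinity score: positions matching an ideal 'target' sequence."""
    return sum(a == b for a, b in zip(seq, target))


def germinal_centre(start: str, target: str, rounds: int, rate: float,
                    rng: random.Random, progeny: int = 50,
                    survivors: int = 10) -> list[str]:
    """Iterate division with mutation, then keep only the best binders."""
    cells = [start] * survivors
    for _ in range(rounds):
        daughters = [mutate(rng.choice(cells), rate, rng) for _ in range(progeny)]
        # Only the highest-affinity cells win access to antigen and divide again.
        pool = cells + daughters
        cells = sorted(pool, key=lambda s: affinity(s, target), reverse=True)[:survivors]
    return cells


rng = random.Random(42)
start = "".join(rng.choice(BASES) for _ in range(60))
target = "".join(rng.choice(BASES) for _ in range(60))
final = germinal_centre(start, target, rounds=40, rate=0.01, rng=rng)
print("affinity before:", affinity(start, target), "after:", affinity(final[0], target))
```

Because the surviving cells compete alongside their mutant daughters, the best affinity in the population can never fall, mirroring the stepwise affinity gains described above.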

The 'memory' B cells that emerge from the germinal centre constitute the 'secondary repertoire', which is even more diverse than the primary one. Twenty mutations per chain are not uncommon; nor are thousand-fold increases in affinity. Thus, as it turned out, two early theories of antibody diversification proved to be correct: rearrangement of germline genes gives the naive B-cell repertoire, and somatic mutation ensures further diversification during memory B-cell development.

Switching function

There are several different classes of antibody, each with a distinct role, and these too are produced by rearrangements at the DNA level. There are eight different genes for the C region of the H chain, which specify different antibody functions. Each B cell first links the chosen VDJ assembly to a C gene known as μ, creating an antibody class called IgM. If that cell is propelled into a pathway favouring the production of an antibody prominent in mucus secretions (such as in the gastrointestinal tract), the VDJ section is switched over to a C region encoded by the C gene α, and the cell produces IgA. If, on the other hand, the antigen is of parasite origin, or an allergen such as a pollen grain, the cell may be stimulated to produce IgE, in which case the VDJ region is joined to the C gene ε. All of this occurs without any change in the specificity of the antibody being secreted. Although the detailed molecular mechanisms are still being investigated, class switching again involves sequential excision of portions of the genetic material. An enzyme called activation-induced cytidine deaminase (AID), which is expressed specifically in activated B cells, may be significant in both class switching and somatic hypermutation.
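Class switching by sequential excision can be modelled by deleting every C gene that lies upstream of the new target. This sketch assumes the mouse order of the eight heavy-chain C genes (chosen because the text counts eight); it ignores the switch regions where the DNA actually breaks, and it ignores IgD, which is produced by RNA splicing rather than switching.

```python
# Mouse heavy-chain constant-region genes, listed 5' to 3' downstream of VDJ.
# Assumption: the mouse order is used because it has eight genes, matching the
# text; switch regions and delta's expression by RNA splicing are not modelled.
MOUSE_CH = ["mu", "delta", "gamma3", "gamma1", "gamma2b", "gamma2a", "epsilon", "alpha"]
ISOTYPE = {"mu": "IgM", "delta": "IgD", "gamma3": "IgG3", "gamma1": "IgG1",
           "gamma2b": "IgG2b", "gamma2a": "IgG2a", "epsilon": "IgE", "alpha": "IgA"}


def class_switch(c_genes: list[str], target: str) -> list[str]:
    """Excise every C gene upstream of `target`, placing it next to the VDJ unit."""
    if target not in c_genes:
        raise ValueError(f"C-{target} has already been deleted; switching cannot reverse")
    return c_genes[c_genes.index(target):]


def expressed(c_genes: list[str]) -> str:
    """The C gene closest to VDJ determines the antibody class."""
    return ISOTYPE[c_genes[0]]


locus = MOUSE_CH
print(expressed(locus))                   # IgM
locus = class_switch(locus, "gamma1")
print(expressed(locus))                   # IgG1
locus = class_switch(locus, "epsilon")    # a second, sequential switch
print(expressed(locus))                   # IgE
```

Because the excised DNA is lost, the model only allows switches further downstream: once the μ gene has been deleted, the cell cannot return to IgM, while the VDJ unit, and hence the antibody's specificity, is untouched throughout.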

Assembly of T-cell receptors

Whereas B cells make antibodies against antigens, the thymus-derived or T cells also respond to foreign agents, specializing in a more localized form of combat. Cytotoxic T cells are capable of killing virus-infected cells or cells displaying cancer-specific antigens. Other T cells secrete powerful stimulatory and inflammatory molecules, most of which act in a strictly localized context. T cells can also help guide B cells down appropriate pathways of differentiation.

In common with B cells, T cells also have one-receptor specificity and, for simplicity, I shall mention only the αβ T-cell receptor (TCR), a heterodimer consisting of two subunits, the α- and β-chains, joined by disulphide bonds15,16. The strategy for generating T cells with different receptors is strikingly similar to that used by B cells to produce different antibodies and, in fact, the TCR binding surface looks much like that of an antibody. The β-chain of the TCR is assembled in somatic cells from V, D, J and C genes; the α-chain from V, J and C genes. There are additions of N-region nucleotides between V and D, as well as between D and J on the β-chain; and between V and J on the α-chain. A similar rearrangement also takes place for the γδ TCR.

But something peculiar about T-cell recognition was noted by Rolf Zinkernagel and Peter Doherty17, who demonstrated that cytotoxic T cells could recognize viral antigens only if a specific 'self' molecule were also present on the target cell (see Box 1). The key part of the T-cell recognition puzzle fell into place when it was discovered that the TCR recognized short antigenic peptides bound to the groove of a self molecule known as the major histocompatibility complex (MHC), as well as surrounding portions of the MHC molecule itself. Cells have special mechanisms for fragmenting proteins into peptides of 8–24 amino acids in length, attaching these to MHC molecules and transporting the entire complex to the cell surface. TCRs then 'see' these short linear portions of antigens, be these of viral, bacterial or parasitic origin, or even portions of normal intracellular components. Such a system can help to control infections where the pathogen goes 'underground' inside a cell, and can also eliminate cells with mutated self antigens, such as cancer cells.
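To give a feel for how many short linear stretches a single protein offers, the sketch below enumerates every contiguous window of 8–24 residues, the length range given in the text. It is a naive upper bound, not a model of antigen processing: real proteolysis and MHC binding keep only a small fraction of these. For a protein of L residues there are L − k + 1 windows of length k. The protein sequence is arbitrary.

```python
def candidate_peptides(protein: str, lengths: range = range(8, 25)) -> list[str]:
    """Every contiguous window of the given lengths (8-24 residues by default).

    This is a naive upper bound: real processing and MHC binding keep only a few.
    """
    return [protein[i:i + k] for k in lengths for i in range(len(protein) - k + 1)]


protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"  # arbitrary 30-residue sequence
peptides = candidate_peptides(protein)
print(len(peptides), "candidate windows; e.g.", peptides[:3])
```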

Lymphocytes and cancer

Lymphocytes have been a favourite tool in cancer research. A notable example of DNA science applied in this way relates to the B-cell tumour of humans known as Burkitt's lymphoma. Occasionally DNA strands break and are incorrectly repaired. Thus, a piece of a chromosome becomes attached to the broken end of another one, and vice versa, in a process known as reciprocal translocation (Fig. 3). In the case of Burkitt's lymphoma, a tumour-promoting gene or oncogene called myc is translocated from its normal position on chromosome 8 right into the middle of the IgH chain locus on chromosome 14 (ref. 18). In this highly active transcriptional environment, myc expression is switched on, and eventually cancer develops.
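A reciprocal translocation exchanges the chromosome segments lying beyond each break. In this sketch, chromosomes are hypothetical, highly simplified lists of markers running from centromere to telomere, and the break on chromosome 14 is placed inside the IgH locus purely for illustration; the point is only that the distal part of chromosome 8 carrying myc ends up next to immunoglobulin genes, with nothing gained or lost overall.

```python
def reciprocal_translocation(chrom_a: list[str], break_a: int,
                             chrom_b: list[str], break_b: int
                             ) -> tuple[list[str], list[str]]:
    """Exchange the segments distal to each breakpoint between two chromosomes.

    Chromosomes are lists of markers from centromere to telomere; each break
    falls just before the given index.
    """
    derivative_a = chrom_a[:break_a] + chrom_b[break_b:]
    derivative_b = chrom_b[:break_b] + chrom_a[break_a:]
    return derivative_a, derivative_b


# Highly simplified, hypothetical marker maps of the long arms of 8 and 14.
chr8 = ["cen8", "8q-proximal", "MYC", "8q-telomere"]
chr14 = ["cen14", "14q-proximal", "IGH-constant", "IGH-variable", "14q-telomere"]

der14, der8 = reciprocal_translocation(chr14, 3, chr8, 2)
print("der(14):", der14)   # MYC now sits next to the IGH constant genes
print("der(8): ", der8)
```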

Figure 3: Reciprocal chromosomal translocations in Burkitt's lymphoma, a solid tumour of B lymphocytes.

The genes for making the heavy chains of antibodies (CH) are located on chromosome 14, whereas those for making the light chains are on chromosomes 2 and 22. These genes are expressed exclusively in B lymphocytes, because only these cells have the necessary transcription factors to switch on their expression. In most (over 90%) of Burkitt's lymphoma cases, a reciprocal translocation moves the proto-oncogene c-myc from its normal position on chromosome 8 to a location close to the antibody heavy-chain genes on chromosome 14 (ref. 18). In other cases, c-myc is translocated close to the antibody genes on chromosome 2 or 22. In every case, c-myc now finds itself in a region of active gene transcription, and it may simply be the overproduction of the c-myc product (a transcription factor essential for cell division) that propels the lymphocyte down the pathway towards cancer.

It has been possible to create lymphoma-prone transgenic mice, which express myc in aberrantly high amounts. Because cancer is typically a multistage process, if further oncogenes are expressed simultaneously in transgenic mice, the onset of cancer can be dramatically accelerated. One such example involves the gene bcl-2. When this gene is expressed, it stops cells from undergoing natural programmed death (apoptosis)19. Mice expressing both myc and bcl-2 showed very rapid development of tumours. An enormous literature has since accumulated on the expanding family of bcl-2-related genes and their roles in the regulation of programmed cell death. Models derived from lymphocytes and their malignancies have led to insights with implications well beyond immunology.

DNA vaccines

DNA research has been of immense value to vaccine research. Through gene cloning and expression, candidate antigens can be identified and tested. In an era of rapid nucleotide sequencing, the whole genome of a pathogen can be determined, and computer programs can search for sequences likely to encode outer membrane proteins, which can be assessed as candidate vaccine molecules (see, for example, the series of papers published recently in Nature (419, 489–542, 2002) on the genomics of the malaria parasite).
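A genome scan of this kind typically begins by locating open reading frames (ORFs), stretches that run from a start codon to a stop codon. The sketch below covers only that first step, on the forward strand, assuming the standard genetic code; deciding which ORFs encode outer membrane proteins needs further prediction tools (for signal peptides and subcellular localization) that are not shown. The 100-codon cutoff is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(dna: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Forward-strand ORFs (ATG ... stop) in all three frames, as (start, end) slices.

    min_codons counts the start codon but not the stop; 100 is an arbitrary cutoff.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # hypothetical test sequence
print(open_reading_frames(toy, min_codons=3))
```

A real scan would also search the reverse complement, since genes lie on both strands.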

Amazingly, DNA itself can serve as a vaccine. DNA vaccines work on the principle that the gene sequence for one or more candidate antigens is introduced into an animal or person via a delivery vehicle known as a vector, together with a strong promoter that can switch on its expression in mammalian cells. Cells that take up the injected DNA transcribe and translate the gene and release the antigen protein, against which the body can then make antibodies. Thus, the body itself becomes a vaccine factory20. Unfortunately, so far this approach has worked better in mice than in humans, but many avenues are being pursued to improve this situation.

To strengthen the immune response to a vaccine, it may be necessary to use an adjuvant substance. Here, too, DNA may be of use. Scavenger cells, which capture antigens, have evolutionarily conserved receptors, known as Toll-like receptors (TLRs), which recognize molecular patterns common to many pathogens. One such receptor is TLR-9, which recognizes unmethylated CpG motifs commonly found in bacterial but not mammalian DNA. Accordingly, unmethylated CpG-rich DNA sequences represent a promising new category of adjuvant21.
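One measurable difference between bacterial and mammalian DNA is CpG depletion: vertebrate genomes carry CpG dinucleotides at roughly a quarter of the frequency expected from their C and G content, and most of those are methylated, whereas bacterial DNA sits much closer to the expected frequency. The standard observed/expected ratio used in CpG analysis is sketched below. Note the assumption built into its use here: the ratio captures depletion only and says nothing about methylation, which is what TLR-9 actually distinguishes.

```python
def cpg_observed_expected(dna: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G)).

    Measures CpG depletion only; it says nothing about methylation.
    """
    dna = dna.upper()
    c, g = dna.count("C"), dna.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return dna.count("CG") * len(dna) / (c * g)


print(cpg_observed_expected("ATCGGACGTTCGAACG"))  # arbitrary CpG-rich example
```

A ratio near 1 means CpG occurs about as often as the base composition predicts; values well below 1 indicate the depletion typical of mammalian DNA.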

Future directions

The solutions to the puzzle of antibody diversity and the mystery of T-cell recognition of antigenic peptides are among the brightest chapters of biology in the last quarter of the twentieth century. The future of immunology will be all about how the system is regulated and how it makes decisions: whether to respond or not; whether to direct efforts towards antibody formation or cell-mediated immunity; and, if the latter, whether more towards cytokine-secreting T cells or cytotoxic T cells.

As in the past, the future will be about information and thus about DNA science. All the complex signalling pathways, the feedback loops, and the intricate rules governing cell division on the one hand or programmed cell death on the other, will be progressively revealed. As this happens, the possibilities for applied research and development will be immense. In particular, new therapeutic targets will be identified. The 'miracle' drug for chronic myelogenous leukaemia, Glivec, was made possible after the characterization of the extraordinary cancerous potential of the chimaeric oncogene bcr–abl. This will surely be only the first of a plethora of more intelligently designed anti-cancer drugs. Potent cytokines and monoclonal antibodies directed against cell surface-associated structures are already prominent within a radically revised pharmaceutical armamentarium in areas including cancer, autoimmunity, allergy and transplantation. DNA research is therefore crucial to a new generation of immunologists, from those striving towards the development of novel vaccines to those seeking to understand and control autoimmune diseases, allergy and transplant tolerance.