Genes were first shown to be made of DNA only nine years before the structure of DNA was discovered (ref. 1; and see article in this issue by McCarty, page 406). Although revolutionary, the idea that genetic information was protein-free ultimately proved too simple. DNA in organisms with nuclei is in fact coated with at least an equal mass of protein, forming a complex called chromatin, which controls gene activity and the inheritance of traits.

'Higher' organisms, such as yeast and humans, are eukaryotes; that is, they package their DNA inside cells in a separate compartment called the nucleus. In dividing cells, the chromatin complex of DNA and protein can be seen as individual compact chromosomes; in non-dividing cells, chromatin appears to be distributed throughout the nucleus and organized into 'condensed' regions (heterochromatin) and more open 'euchromatin' (see article in this issue by Ball, page 421). In contrast, prokaryotes, such as bacteria, lack nuclei.

The evolution of chromatin

The principal protein components of chromatin are the histones (Fig. 1). Core histones are among the most highly conserved eukaryotic proteins known, suggesting that the fundamental structure of chromatin evolved in a common ancestor of eukaryotes. Histone homologues and a simplified chromatin structure have also been found in single-celled organisms of the Archaea (archaebacteria)2,3.

Figure 1: Packaging DNA.

a, The organization of DNA within the chromatin structure. The lowest level of organization is the nucleosome, in which two superhelical turns of DNA (a total of 165 base pairs) are wound around the outside of a histone octamer. Nucleosomes are connected to one another by short stretches of linker DNA. At the next level of organization the string of nucleosomes is folded into a fibre about 30 nm in diameter, and these fibres are then further folded into higher-order structures. At levels of structure beyond the nucleosome the details of folding are still uncertain. (Redrawn from ref. 41, with permission.) b, The structure of the nucleosome core particle, determined by X-ray crystallography at a resolution of 2.8 Å (ref. 42), shows the DNA double helix wound around the central histone octamer. Hydrogen bonds and electrostatic interactions with the histones hold the DNA in place.

Because there is more DNA in a eukaryote than in a prokaryote, it was naturally first assumed that the purpose of histones was to compress the DNA to fit within the nucleus. But subsequent research has dramatically revised the view that histones emerged as an afterthought, forced on eukaryotic DNA as a consequence of large genome size and the constraints of the nucleus.

It was known that different genes are active in different tissues, and the distinction of heterochromatin and euchromatin suggested that differences in chromatin structure were associated with differences in gene expression. This led to the early supposition that the histones were also repressor proteins designed to shut off unwanted expression. The available evidence, although rudimentary, does indeed suggest that archaeal histones are not merely packaging factors, but function to regulate gene expression2,3,4,5. They may facilitate gene activation, by promoting specific structural interactions between distal sequences, or repression, by occluding binding sites for transcriptional activators.

We suggest that the function of archaeal histones reflects their ancestral function, and therefore that chromatin evolved originally as an important mechanism for regulating gene expression. Its use in packaging DNA was an ancillary benefit that was recruited for the more complex nucleosome structure that subsequently evolved in the ancestors of modern eukaryotes, which had expanded genome sizes. Although their compactness might seem to suggest inertness, chromatin structures are in fact a centre for a range of biochemical activities that are vital to the control of gene expression, as well as DNA replication and repair.

Packaging DNA into chromatin

The fundamental subunit of chromatin is the nucleosome, which consists of approximately 165 base pairs (bp) of DNA wrapped in two superhelical turns around an octamer of core histones (two each of histones H2A, H2B, H3 and H4). This results in a five- to tenfold compaction of DNA6. The DNA wound around the surface of the histone octamer (Fig. 1) is partially accessible to regulatory proteins, and could become more accessible if the nucleosome were moved out of the way or if the DNA partly unwound from the octamer. The histone 'tails' (the amino-terminal ends of the histone protein chains) are also accessible, and enzymes can chemically modify these tails to promote nucleosome movement and unwinding, with profound local effects on the chromatin complex.

Each nucleosome is connected to its neighbours by a short segment of linker DNA (10–80 bp in length) and this polynucleosome string is folded into a compact fibre with a diameter of 30 nm, producing a net compaction of roughly 50-fold. The 30-nm fibre is stabilized by the binding of a fifth histone, H1, to each nucleosome and to its adjacent linker. There is still considerable debate about the finer points of nucleosome packing within the chromatin fibre, and even less is known about the way in which these fibres are further packed within the nucleus to form the highest-order structures.
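The packaging figures above can be turned into rough numbers. The sketch below is an illustration under stated assumptions, not a measurement: it assumes a rise of 0.34 nm per base pair for B-form DNA (a standard value not given in the text), uses the 165-bp wrapped length and the 10–80-bp linker range quoted above, and applies the roughly 50-fold compaction of the 30-nm fibre to a hypothetical 100-kb region. The helper names are illustrative only.

```python
RISE_NM_PER_BP = 0.34  # assumed rise per base pair of B-form DNA
WRAPPED_BP = 165       # DNA wrapped per nucleosome, as quoted in the text


def repeat_length(linker_bp):
    """DNA per nucleosome: the wrapped DNA plus one linker."""
    return WRAPPED_BP + linker_bp


def nucleosome_count(region_bp, linker_bp):
    """Approximate number of nucleosomes needed to package a region."""
    return region_bp / repeat_length(linker_bp)


def packed_length_um(region_bp, fold):
    """Length of the region in micrometres after a given fold compaction."""
    return region_bp * RISE_NM_PER_BP / fold / 1000


region_bp = 100_000  # hypothetical 100-kb region
for linker_bp in (10, 80):
    print(f"linker {linker_bp} bp: ~{nucleosome_count(region_bp, linker_bp):.0f} nucleosomes")
print(f"naked DNA: {packed_length_um(region_bp, 1):.1f} um")
print(f"30-nm fibre (~50-fold): {packed_length_um(region_bp, 50):.2f} um")
```

With a 10-bp linker the hypothetical region needs about 571 nucleosomes, and with an 80-bp linker about 408; its 34 µm of naked DNA shrinks to roughly 0.68 µm in the 30-nm fibre.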

Chromatin regulates gene expression

Regulatory signals entering the nucleus encounter chromatin, not DNA, and the rate-limiting biochemical response that leads to activation of gene expression in most cases involves alterations in chromatin structure. How are such alterations achieved?

The most compact form of chromatin is inaccessible and therefore a poor template for biochemical reactions such as transcription, in which RNA polymerase must read the DNA duplex. Nucleosomes associated with active genes were shown to be more accessible to enzymes that attack DNA than those associated with inactive genes7, which is consistent with the idea that activation of gene expression involves selective disruption of the folded structure.

Clues as to how chromatin is unpacked came from the discovery that components of chromatin are subject to a wide range of modifications that are correlated with gene activity. Such modifications probably occur at every level of organization, but most attention has focused on the nucleosome itself. There are three general ways in which chromatin structure can be altered. First, nucleosome remodelling can be induced by complexes designed specifically for the task8; this typically requires that energy be expended by hydrolysis of ATP. Second, covalent modification of histones can occur within the nucleosome9. Third, histone variants may replace one or more of the core histones10,11,12.

Some modifications affect nucleosome structure or lability directly, whereas others introduce chemical groups that are recognized by additional regulatory or structural proteins. Still others may be involved in disruption of higher-order structure. In some cases, the packaging of particular genes in chromatin is required for their expression13. Thus, chromatin can be involved in both activation and repression of gene expression.

Chromatin remodelling

Transcription factors regulate expression by binding to specific DNA control sequences in the neighbourhood of a gene. Although some DNA sequences are accessible, either as an outward-facing segment on the nucleosome surface or in the linkers between nucleosomes, most are occluded by their close contact with the histone octamer. Regulatory factors must therefore seek out their specific DNA-binding sites and gain access to them. They are aided by chromatin-remodelling complexes that continually shuffle the positions of individual nucleosomes, so that sites are exposed at random for a fraction of the time8,14.
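To make "exposed at random for a fraction of the time" concrete, here is a deliberately simple model, an assumption rather than anything measured: a single nucleosome with a 165-bp footprint is shuffled uniformly over every position in a hypothetical 400-bp window, and a 10-bp binding site counts as exposed whenever it does not overlap the nucleosome. If remodelling visits every position equally often, the fraction of positions that leave the site free equals the fraction of time it is accessible. The function name is hypothetical.

```python
def exposed_fraction(window_bp, site_start, site_len, footprint_bp=165):
    """Fraction of nucleosome positions that leave a binding site uncovered.

    Assumes one nucleosome occupying [pos, pos + footprint_bp) with pos drawn
    uniformly from every position that fits in the window, and a site occupying
    [site_start, site_start + site_len).
    """
    positions = range(window_bp - footprint_bp + 1)
    site_end = site_start + site_len
    free = sum(
        1
        for pos in positions
        if pos + footprint_bp <= site_start or pos >= site_end  # no overlap
    )
    return free / len(positions)


# Hypothetical 400-bp window with a 10-bp site starting at position 190.
print(f"{exposed_fraction(400, 190, 10):.2f}")
```

For a site at positions 190–199, 62 of the 236 possible nucleosome positions leave it free, so in this model it is exposed about 26% of the time; sites nearer the edges of the window are exposed more often.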

A number of chromatin-remodelling complexes mobilize nucleosomes, causing the histone octamers to move short distances along the DNA8. Each complex carries a protein with ATPase activity, which provides the necessary energy. Many of these complexes are members of the so-called SWI/SNF family, which includes SWI/SNF in budding yeast and humans, RSC in yeast, and Brahma in Drosophila. These complexes have similar helicase-motif subunits but differ in the co-factors they contain. A second group is based on the helicase-domain protein ISWI, which combines with other proteins to form the complexes NURF, CHRAC and ACF in Drosophila, and RSF in humans. A third group is based on the helicase-motif protein Mi-2.

Remodelling complexes differ in the mechanisms by which they disrupt nucleosome structure, and they are associated with co-factors that allow them to interact selectively with other regulatory proteins that bind to specific DNA sequences. For example, only certain classes of transcription factors interact with the mammalian SWI/SNF remodelling complex. Thus remodelling complexes can be selective in the genes they modify, and transcription factors recruit these complexes as tools to gain access to chromatin.

Histone modification

Nucleosomes are not passive participants in this recognition process. They can carry chemical modifications, either on histone 'tails' that extend from the nucleosome surface or within the body of the octamer, that serve as signals for the binding of specific proteins. A large number of modifications are already known, such as acetylation of lysine residues in the histone tails, and new ones are being identified at a bewildering rate (Box 1). Many modifications are associated with distinct patterns of gene expression, DNA repair or replication, and it is likely that loss of most or all of them will ultimately be found to produce distinct phenotypes.

In addition to being modified, nucleosomes can have core histones substituted by variants, with functional consequences. Histone H2AZ, which is associated with reduced nucleosome stability, replaces H2A non-randomly at specific sites in the genome. Histone H2AX, which is distributed throughout the genome, is a target of phosphorylation accompanying repair of DNA breaks11, and also seems to be involved in the V(D)J recombination events that assemble immunoglobulin and T-cell-receptor genes. A histone H3 variant, H3.3, can be incorporated into chromatin in non-dividing cells, and seems to be associated with transcriptionally active genes10. Each of these substitutions is likely to be targeted by, and associated with the binding of, other proteins involved in gene activation; the variants can thus be considered central to the formation of localized chromatin structures that are specific for gene activation or accessibility.

Interdependence of histone modifications

An interplay exists between histone modification and chromatin remodelling. For example, expression of a gene may require disruption of nucleosomes positioned at the promoter by a chromatin-remodelling complex before an enzyme required for histone acetylation can be recruited15. In contrast, expression of a different gene may require that histone-acetylating enzymes and even RNA polymerase bind to the promoter prior to recruitment of the chromatin-remodelling complex16. There is no common series of steps that underlies all or even most processes of gene activation. For any given gene, however, the order of recruitment of chromatin-modifying factors may be crucial for the appropriate timing of expression.
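The point that each gene has its own recruitment order can be expressed as a set of prerequisites per promoter. The sketch below assumes that framing and encodes two hypothetical genes loosely patterned on the two cases just described (gene A needs remodelling before the acetylase arrives; gene B needs the acetylase and RNA polymerase II before the remodeller). It then derives an order consistent with each using Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). The gene and factor labels are illustrative, not specific complexes.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisites: each factor maps to the set of factors that must
# already be at the promoter before it can be recruited.
GENE_A = {"histone acetylase": {"remodelling complex"}}
GENE_B = {"remodelling complex": {"histone acetylase", "RNA polymerase II"}}


def recruitment_order(prerequisites):
    """One recruitment order consistent with the given prerequisites."""
    return list(TopologicalSorter(prerequisites).static_order())


for gene, prerequisites in (("gene A", GENE_A), ("gene B", GENE_B)):
    print(gene, recruitment_order(prerequisites))
```

The two graphs give opposite orders for the same pair of factors, which is the sense in which there is no common series of steps. A cyclic set of prerequisites would make `static_order()` raise `graphlib.CycleError`, flagging an impossible schedule.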

Aside from activating gene expression, histone modifications and chromatin remodelling can also silence genes. Specific histone modifications and chromatin-remodelling complexes, such as the NuRD complex, have been implicated in silencing at some loci8. Even SWI/SNF complexes, which are strongly correlated with gene activation, seem to silence a number of genes.

Specialized chromatin structures

Some regions of the genome are packaged in chromatin with distinct structural features. Three of the most studied of these regions are centromeres (important for chromosome organization during mitosis), telomeres (at the ends of chromosomes) and the inactive X chromosome in mammals. In each case, the specialized structure is defined both by histones that are modified or substituted in specific patterns and by associated non-histone proteins, or even regulatory RNA molecules, which are increasingly implicated in chromatin organization17,18,19.

Inactive X chromosomes in mammals are enriched for the histone variant macroH2A (ref. 20), which is almost three times as large as H2A itself. At vertebrate centromeres, one of the core histones, H3, is replaced by a variant, CENP-A; a similar replacement occurs at centromeres of the fruitfly Drosophila, indicating that this is an ancient evolutionary adaptation. CENP-A in turn forms a complex with the centromere proteins CENP-B and CENP-C, which mediates the formation of phased arrays of CENP-A-containing nucleosomes. Additional proteins are then recruited during cell division to enable the orderly separation of the two chromatids that make up each chromosome. After DNA replication, the sister chromatids are initially held together by a multisubunit complex called cohesin, while a second complex, condensin, helps to compact the chromosomes21. These complexes recognize distinct centromere structures, and a specialized nucleosome-remodelling complex associates with cohesin to help it gain access to the chromosomes22.

In the budding yeast Saccharomyces cerevisiae, gene silencing at the ends of chromosomes is mediated by a complex that assembles at telomeres. The complex is stabilized by the binding of the protein RAP1 to the telomere repeat sequences. Additional components, including the silent information regulator (SIR) proteins, then bind inward from the telomere ends, partly through interactions with local nucleosomes23. One of the SIR proteins is a histone deacetylase and is thought to repress gene expression at this site. Some components of these unique complexes are evolutionarily conserved, suggesting that these unusual chromatin structures may be found in organisms other than yeast.

The silencing of genes in the vicinity of centromeres in the fission yeast Schizosaccharomyces pombe has been shown recently17,18,19 to depend on a set of RNA-processing enzymes involved in RNA interference, a process by which double-stranded RNA directs sequence-specific degradation of messenger RNA. One of these enzymes, Dicer, generates RNA fragments about 23 nucleotides long from transcripts of centromeric regions, which then seem in some way to be targeted back to the centromere to initiate the histone-dependent silencing mechanism. Moreover, non-coding RNA transcripts have been identified on the inactive X chromosome and elsewhere in the genome, and may have related roles at those loci24.
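The sequence specificity of RNA interference rests on base pairing, which the toy sketch below illustrates. It simplifies on purpose: the double-stranded RNA is represented by its sense strand, "dicing" cuts it into consecutive fragments of exactly 23 nucleotides (real Dicer products vary somewhat in length), and a fragment's antisense strand is taken to recognize only a transcript containing its exact partner sequence. The sequence and function names are hypothetical.

```python
# Maps each RNA base to its Watson–Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Sequence of the strand that base-pairs with `rna`, read 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense, size=23):
    """Cut a duplex (given by its sense strand) into consecutive size-nt pieces.

    Returns the antisense (guide) strand of each complete piece; a leftover
    shorter than `size` is discarded.
    """
    return [
        reverse_complement(sense[i:i + size])
        for i in range(0, len(sense) - size + 1, size)
    ]


def targets(guide, transcript):
    """True if the transcript contains an exact match to the guide's partner."""
    return reverse_complement(guide) in transcript


# Hypothetical 46-nt transcript from a centromeric repeat.
repeat_rna = "GAUCCUAGGCAUUACGGAUCUAGCAUGCCAUUGGACUAACGUUAGC"
guides = dice(repeat_rna)
print(len(guides), all(targets(g, repeat_rna) for g in guides))
```

Each guide finds the transcript it was cut from and ignores unrelated sequence, which is the property that lets small RNAs direct silencing back to the region that produced them.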

Epigenetic inheritance

An epigenetic trait is one that is transmitted independently of the DNA sequence itself. This can occur at the level of cell division — for example, daughter cells may inherit a pattern of gene expression from parental cells (so-called cellular memory) — or at the generational level, when an offspring inherits a trait from its parents.

The classic example of epigenetic inheritance is imprinting, in which the expression status of a gene depends on the parent from which it is inherited. In mammals, for example, the Igf2 gene (encoding insulin-like growth factor-2) is expressed only from the paternal copy, whereas the H19 gene is expressed solely from the maternal allele. The mechanism involves, in part, DNA methylation on the paternal allele. Methylation prevents binding of the chromatin protein CTCF, which, when bound (as on the maternal allele), blocks a downstream enhancer from acting on Igf2; on the methylated paternal allele, the enhancer is therefore free to activate Igf2 expression25,26.
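The allele-specific logic can be written as a small truth table. This is a simplified sketch, not the full mechanism: the only input is whether the allele's control region is methylated, CTCF binding is treated as all-or-none, and it adds an assumption not stated above, that methylation also silences H19 on the paternal allele while the shared enhancer drives H19 on the maternal one. The function name is hypothetical.

```python
def imprinted_locus(methylated):
    """Return which genes one Igf2/H19 allele expresses (toy model).

    methylated: True for the paternal allele, False for the maternal one.
    """
    ctcf_bound = not methylated   # DNA methylation prevents CTCF binding
    igf2 = not ctcf_bound         # bound CTCF blocks the enhancer from Igf2
    h19 = not methylated          # assumption: methylation also silences H19
    return {"Igf2": igf2, "H19": h19}


for parent, methylated in (("paternal", True), ("maternal", False)):
    print(parent, imprinted_locus(methylated))
```

A single methylation difference at one control region is enough, in this model, to give the two parental alleles reciprocal expression patterns.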

The methylation state of an allele is inextricably linked with patterns of histone modification27. Methylated CpG (cytosine–guanine) dinucleotides near a gene recruit specific methyl-CpG-binding proteins, which in turn recruit histone deacetylases, resulting in loss of histone acetylation and silencing of gene expression. Deacetylation also opens the way for a second modification: once histone H3 lysine 9 (Lys 9) has lost its acetyl group, it can instead be methylated, carrying one to three methyl groups. In the fungus Neurospora, the ability to methylate histone H3 Lys 9 has been shown to be essential for DNA methylation28, suggesting that local Lys 9 methylation may provide a signal for methylation of the underlying DNA. Furthermore, in a different reaction pathway, maintenance of histone acetylation at promoters can inhibit DNA methylation29.

Epigenetic inheritance involves maintaining patterns of histone modification and/or of association of chromosomal proteins that correlate with specific expression states. Mechanisms that propagate permissive or repressive chromatin structure along the fibre could also preserve the pattern of histone modification during replication, when old nucleosomes are distributed randomly to both sides of the fork and newly synthesized histones are interspersed among them (Fig. 2).

Figure 2: Propagation of inactive ('condensed') and active chromatin states (adapted from ref. 43).

a, Nucleosomes methylated at H3 Lys 9 are a mark of inactive chromatin and are bound by the heterochromatin protein HP1. HP1 in turn recruits a histone methyltransferase enzyme, Suv39h, that specifically methylates H3 Lys 9, allowing methylation and HP1 binding to extend to successive nucleosomes in a self-propagating fashion43,44,45. Some DNA sequence elements (purple rectangle) and their associated proteins may serve as barriers between different chromatin regions, perhaps by blocking the propagation of histone modifications and/or the binding of heterochromatin proteins, thus helping to establish well-defined domains46. b, A similar propagation mechanism may be constructed for activation by histone acetylation (right). Here, acetylated lysines are recognized by an acetylase enzyme, resulting in acetylation of the adjacent nucleosome. c, A proposed model for epigenetic inheritance of methylation. During replication, parental nucleosomes carrying H3 with Lys 9 methylation (blue) are distributed randomly to both sides of the replication fork. Nucleosomes containing newly synthesized histones (pink) are deposited between the old ones, and are methylated by a mechanism similar to that described above. The daughter-cell chromatin then carries the same modification as the parent.
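The self-propagating scheme in panels a and c can be simulated in a few lines. The sketch below is a one-dimensional cartoon under explicit assumptions: each nucleosome is either marked (H3 Lys 9 methylated) or not, each parental nucleosome goes to a given daughter with probability one-half, every gap is filled by an unmarked new nucleosome, and a mark spreads to any adjacent nucleosome unless a barrier lies between them. Nothing here models HP1 or Suv39h kinetics; the function names are hypothetical.

```python
import random


def replicate(parent, inherited):
    """One daughter strand after replication.

    inherited[i] is True if the parental nucleosome at position i went to this
    daughter; every other position receives a new, unmarked nucleosome.
    """
    return [marked and kept for marked, kept in zip(parent, inherited)]


def propagate(states, barriers=frozenset()):
    """Spread the mark between neighbours until nothing changes.

    A barrier index b blocks spreading between positions b and b + 1.
    """
    states = list(states)
    changed = True
    while changed:
        changed = False
        for i in range(len(states) - 1):
            if i not in barriers and states[i] != states[i + 1]:
                states[i] = states[i + 1] = True  # the unmarked neighbour gets marked
                changed = True
    return states


# Parent: nucleosomes 0-5 form a silent (Lys 9-methylated) domain, 6-11 do not,
# with a barrier element between positions 5 and 6.
parent = [True] * 6 + [False] * 6
barriers = {5}
rng = random.Random(1)
inherited = [rng.random() < 0.5 for _ in parent]
daughter = propagate(replicate(parent, inherited), barriers)
print(daughter == parent)
```

As long as at least one marked parental nucleosome lands in the domain, spreading restores the whole domain and stops at the barrier; removing the barrier lets the mark invade the neighbouring region, and a daughter that inherits no marked nucleosome (a 1-in-64 chance for this six-nucleosome domain) loses the silent state. Read with "acetylated" in place of "methylated", the same code describes the activation scheme in panel b.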

The maintenance of repressed or activated transcription states represents an efficient mechanism for progressive cellular differentiation30. In such a model, fundamental decisions about turning genes or groups of genes on or off need to be made only once. This principle is perhaps most clearly illustrated by Polycomb-group (PcG)-mediated gene repression in Drosophila31. At a specific time during development, a complex of proteins encoded by the PcG genes binds to sequences within some genes, but only in cells where those genes are silent. At subsequent stages of development, the PcG complex maintains the repressed state in the absence of the original negative signals. Activated expression states can be similarly maintained, again in the absence of the original transcriptional activators, by a complex of proteins encoded by genes collectively termed the trithorax group31. In both cases, the maintenance of gene-expression patterns is associated with specific histone modification and chromatin-remodelling activities32,33,34.
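The "decide once, then remember" logic can be captured as a simple latch. In this sketch, which is a hypothetical abstraction rather than a model of PcG or trithorax biochemistry, a transient signal sets a gene's state and, in stages without any signal, the maintenance complexes are assumed to carry that state forward unchanged. The function name and signal labels are illustrative.

```python
def track_state(signals, initial=None):
    """Expression state of one gene across successive developmental stages.

    signals: one entry per stage, "repress", "activate" or None (no signal).
    """
    state = initial
    history = []
    for signal in signals:
        if signal == "repress":
            state = "silent"      # repressed state, maintained by PcG
        elif signal == "activate":
            state = "active"      # active state, maintained by trithorax group
        # With no signal, the maintained state simply carries over.
        history.append(state)
    return history


print(track_state(["repress", None, None, None]))
print(track_state([None, "activate", None, None]))
```

Once set, the state persists through every later stage even though the signal appeared only once.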

Chromatin and nuclear self-organization

Although bacteria lack a true nucleus, a specific region of the cell, called the nucleoid, contains the chromosome, which in turn is organized into supercoiled domains or loops emanating from central nodes. The organization of the Escherichia coli genome into such domains is necessary to allow it to fit within the confines of the cell2. Extensions of the chromosome into the cytoplasm correlate with regions that are transcriptionally active. Upon inhibition of transcription, these extensions recede into the nucleoid, giving it a more even, spherical shape. The localization of genomic sequences within a bacterial cell is thus determined by their association with the transcriptional/translational apparatus.

The organization of the genome in eukaryotic nuclei, although necessarily more complex than in bacteria, seems to follow the same principle as in E. coli. Individual chromosomes largely occupy distinct 'territories' within the nucleus. Within these territories, actively transcribed genes lie on the surfaces of channels within subchromosomal domains35, where soluble transcription factors are presumably more likely to gain access to them.

There is, however, more to the story. The eukaryotic nucleus has distinct subcompartments in which specific nuclear proteins are enriched. For example, the nucleolus, where high-level transcription of ribosomal genes occurs, and the splicing-factor compartments accumulate high local concentrations of certain proteins. In some cases the nucleus provides attachment sites for these proteins. As a simple example, one or more of the proteins associated with yeast telomeres can tether the telomeres in clusters to the nuclear periphery36. This clustering creates a high local concentration of binding sites for the SIR silencing proteins, which in turn produces a high local concentration of these proteins and high occupancy of even relatively weak binding sites. The effect is to increase the extent of telomeric silencing: SIR-dependent gene silencing can be achieved simply by artificially tethering a gene to the nuclear periphery37.
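Why clustering helps weak sites can be seen from the simple equilibrium binding isotherm, occupancy = [P] / ([P] + K_d), where [P] is the free protein concentration and K_d the site's dissociation constant. The sketch assumes 1:1 binding with the protein in excess, and the concentrations and K_d are hypothetical values chosen only to show the trend.

```python
def occupancy(free_conc_um, kd_um):
    """Fraction of time a site is bound under simple 1:1 equilibrium binding."""
    return free_conc_um / (free_conc_um + kd_um)


weak_kd_um = 1.0  # hypothetical weak site, K_d = 1 uM
for label, conc_um in (("dispersed", 0.1), ("clustered", 10.0)):
    print(f"{label}: {occupancy(conc_um, weak_kd_um):.2f}")
```

In this example a hundredfold rise in local concentration takes the weak site from about 9% to about 91% occupancy.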

What organizes the formation of nuclear subdomains? Although there is evidence for a proteinaceous nuclear matrix38, the example provided by yeast telomeres suggests that the chromatin fibre itself may be the organizer. Many, and probably most, chromatin-binding proteins are in continuous flux between association with chromatin and the nucleoplasm39,40. Even such fundamental chromatin proteins as histone H1 have been found to bind for periods of only a few seconds, interspersed with periods of free diffusion. The notable exceptions to this rule are the core histones, the binding of which is much more stable — on the order of minutes for H2A/H2B, and hours for H3/H4. The on–off rates of proteins binding different regions of the genome may depend on the pattern of histone modifications, which in turn determines their relative enrichment in different regions of the nucleus. Thus, the genome as packaged with histones could determine the nature of nuclear subcompartments.
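The link between exchange rates and enrichment can be made explicit with a two-state model: a protein binds a site with pseudo-first-order rate k_on and leaves with rate k_off, so its mean residence time is 1/k_off and the site is occupied a fraction k_on / (k_on + k_off) of the time. The rates below are hypothetical, chosen to sit on the seconds scale described for H1, and the "modified" region simply assumes a histone modification that halves k_off.

```python
def residence_time_s(k_off_per_s):
    """Mean time a protein stays bound before dissociating."""
    return 1.0 / k_off_per_s


def bound_fraction(k_on_per_s, k_off_per_s):
    """Steady-state fraction of time a site is occupied (two-state exchange)."""
    return k_on_per_s / (k_on_per_s + k_off_per_s)


k_on = 0.5  # hypothetical pseudo-first-order association rate, per second
for region, k_off in (("unmodified", 0.5), ("modified", 0.25)):
    print(region, residence_time_s(k_off), round(bound_fraction(k_on, k_off), 2))
```

Doubling the residence time from 2 s to 4 s raises occupancy from 0.50 to about 0.67, so the protein becomes relatively enriched in the modified region even though it still exchanges within seconds.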

Future challenges

Chromatin proteins and DNA are partners in the control of the activities of the genetic material within cells. The rate-limiting step in activating gene expression typically involves alterations of chromatin structure. The chromosome is an intricately folded nucleoprotein complex with many domains, in which local chromatin structure is devoted to maintaining genes in an active or silenced configuration, to accommodating DNA replication, chromosome pairing and segregation, and to maintaining telomeric integrity. Recent results suggest strongly that in all of these cases the primary indicators of such specialization are carried on the histones. Thus, the regulatory signals that determine local properties, as well as epigenetic transmission of those properties, are likely to be on histones.

The already large catalogue of histone modifications continues to grow rapidly. Although in most cases loss of a modification (for example, by mutating the responsible enzyme) has a detectable effect on phenotype, the function of many modifications has not yet been determined. Determining those functions will be a focus of future research, but it presents real difficulties, because a given modification occurs at many sites in the genome and mutations could have widespread effects, both direct and indirect. A second challenge arises from the potential redundancy of the 'histone code': either of two distinct modifications might specify a single structural and functional state, or the two modifications might always be linked to one another. Considerable effort will be needed to determine the complexity of this code, that is, the number of distinct states that can be specified.
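The question of how many distinct states a set of modifications can specify can be phrased as counting the distinct outputs of a readout over all combinations of marks. The sketch assumes each modification is simply present or absent and uses two hypothetical readouts: one that distinguishes every combination, and one in which either of two modifications specifies the same state. The modification labels and function name are placeholders.

```python
from itertools import product


def distinct_states(modifications, readout):
    """Number of distinct functional states a readout assigns across all
    present/absent combinations of the given modifications."""
    states = set()
    for flags in product((False, True), repeat=len(modifications)):
        present = frozenset(m for m, on in zip(modifications, flags) if on)
        states.add(readout(present))
    return len(states)


mods = ["mod1", "mod2"]  # hypothetical modifications
print(distinct_states(mods, lambda present: present))        # every combination distinct
print(distinct_states(mods, lambda present: bool(present)))  # either mark gives one state
```

With two modifications, a fully combinatorial readout specifies four states, whereas the redundant readout specifies only two (unmodified versus modified), even though the same four combinations occur. If the two marks were always linked, only the "neither" and "both" combinations would occur, which likewise leaves two states.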

The most important immediate problem is to identify the initiating step in establishing a local chromatin state, which may also correspond to an epigenetic state. Silencing at centromeres and perhaps elsewhere seems to be initiated by small RNA transcripts from within the region to be silenced, but formation of other kinds of structures might be triggered directly by a specific histone modification. In the longer term it will be necessary to relate the reactions at individual nucleosomes to higher-order chromatin structures; this will depend in part on the development of higher-resolution methods for determining those structures, and their organization within the nucleus.

At its simplest level, chromatin should be viewed as a single entity, carrying within it the combined genetic and epigenetic codes. Ultimately our understanding of the dynamic states of chromatin throughout the genome will be integrated with a detailed knowledge of patterns of regulation of all genes.