EPIGENETICS

Epigenetics refers to a system of inheritance that does not involve changes in the primary DNA sequence but that can nevertheless be transmitted through cell and organismal lineages and can result in changes in gene expression. Epigenetic alterations have been implicated in the initiation and progression of malignancy, serving as primary disruptors of cancer stem cells as well as Knudson “second hits” that silence the expression of tumor suppressor genes (1). The traditional study of epigenetic changes encompasses the examination of chromatin structure; of covalent histone modifications such as methylation, acetylation, phosphorylation, ubiquitination, sumoylation, and ADP-ribosylation; and of alterations in DNA methylation. For example, in a transcriptionally active gene, histone H3 is acetylated at K9 and K14, dimethylated or trimethylated at K4, and phosphorylated at S10; histone H4 is acetylated at K5 and methylated at R3. By contrast, histone H3 is dimethylated or trimethylated at K9 in inactive chromatin, and H4 sumoylation is found in inactive chromatin (2,3). In general, DNA methylation of a promoter region leads to silencing of the associated gene. There are important syncretic interactions between this putative histone code and DNA methylation, and it is now possible to describe a more comprehensive epigenetic code regulating the transcription of each gene (4). Genome-wide DNA hypomethylation and histone hypoacetylation are now recognized as signature findings in cancer, and efforts to describe the human cancer epigenome are underway in several labs throughout the world (5–7).

NUCLEAR GEOGRAPHY AND CHROMOSOME TERRITORIES

The architecture and geography of the nucleus and its constituents represent a new dimension of regulatory control that is related to epigenetic marks and organization. During mitosis, chromosomes assume a very compact configuration, allowing them to segregate in anticipation of cell division. In intermitotic cells, however, much of the chromatin is decondensed and spreads out into a three-dimensional chromosomal territory within a highly organized nuclear architecture. Controversy exists as to whether each chromosome has a fixed nuclear “address” throughout development, during oncogenesis (8,9), and across cell types. In general, chromosomes with greater gene density occupy more central parts of the nucleus, but loops and extensions may protrude from the central territory. It is not known whether homologous chromosome territories are randomly situated, or if maternal and paternal chromosomes are somehow marked for specific nuclear locales (10).

The chromosome territories comprise heterochromatin, chromatin loops, and other higher-order chromatin fibers, interspersed with DNA-free channels and interchromosomal domains, which may contain the transcription apparatus to serve as transcription factories or active chromatin hubs. One can envisage the nucleus as a well-ordered sponge composed of individual chromosome territories with interchromosomal interactions at the borders between adjacent chromosomes (11). Moreover, recent studies have shown that these interchromosomal interactions may also take place well within the central portions of the chromosome territories, as the euchromatin loops migrate by Brownian motion and make numerous interactions at the surface and inside of nearby territories (12). The nuclear matrix and its attachments also give form to this conglomeration of chromosome territories (13). It is highly likely that many of these interactions between distant DNA segments are mediated or strengthened by DNA-binding proteins, including the insulating protein CTCF (14), SATB1 (15), MeCP2 (16), and the DNA architectural protein HMG I/Y (17). In addition to providing protein-protein bridges between DNA loops, these DNA-binding proteins might also tether DNA segments to the nucleolus and perhaps other nuclear structures (18).

Recently, it has become clear that nuclear architecture and chromatin geography are important factors in the regulation of gene expression (19), and that these components may play a vital epigenetic role both in normal physiology and in the initiation and progression of malignancies (8,20). Loops of DNA bulge out from euchromatic portions of chromosomes, and the genes on these loops may localize to active chromatin hubs where gene transcription takes place (21). Chromosome looping allows distant segments of DNA from the same chromosome or from different chromosomes to interact and, potentially, to modify the expression of genes that are very distant from one another. The propinquity of these loops may also facilitate recombination and chromatin rearrangements (22–24).

DNA LOOPING AND LONG-RANGE INTRACHROMOSOMAL INTERACTIONS

The existence of long-range interactions between regions of a chromosome separated by more than 100,000 bases led to the discovery of intrachromosomal loops that juxtapose downstream enhancers close to promoter regions to increase gene transcription (25–27). Two techniques have been essential for characterizing these long-range interactions. In the chromosome conformation capture (3C) technique (28,29), formaldehyde is used to cross-link nearby DNA sequences. The DNA is digested with a restriction enzyme and then allowed to undergo intramolecular ligation. The cross-links are then reversed and the resulting DNA sequence, which is a chimera of two distant sequences, is determined. RNA-TRAP (tagging and recovery of associated proteins) combines RNA FISH with a procedure that targets horseradish peroxidase to the vicinity of an actively transcribed gene. That enzyme catalyzes the deposition of biotin onto nearby chromatin, making it possible to determine whether distant portions of the chromosome become linked by looping with actively transcribed genes (30). Using these techniques, Fraser's lab demonstrated that the β-globin locus control region (LCR) and the actively transcribed β-globin genes were in close proximity with one another in erythroid cells. DNA looping of the intervening 50 kb between the LCR and the β-globin genes put these DNA segments in close physical proximity and facilitated high-level gene expression (30,31). This intrachromosomal looping has now been extensively studied for several genes. Spilianakis and Flavell (32) studied the T helper type 2 (TH2) locus control region, and showed that promoters from three cytokines (IL-4, -5, and -13), which are coordinately expressed and are located over a span of >120 kb on mouse chromosome 11, are in close proximity to one another. The interaction is dependent upon the presence of two transcription factors, STAT6 and GATA3 (32), as well as SATB1, an anchoring protein that organizes nuclear architecture and recruits chromatin remodeling factors (15).
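
The digestion and ligation steps of 3C can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions: the sequences, the function names `digest` and `ligate`, and the choice of BglII as the enzyme are hypothetical, and cross-linking is modeled simply by choosing which two fragments to join.

```python
# Toy model of the 3C digestion and ligation steps (illustrative only).
# All sequences are invented; BglII is used as an example enzyme.

BGLII_SITE = "AGATCT"   # BglII recognition site
BGLII_CUT = 1           # top-strand cut after the first base: A^GATCT


def digest(seq, site, cut_offset):
    """Split seq at every occurrence of site, cutting cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def ligate(frag_a, frag_b):
    """Join two digested fragments end to end, as ligation of cross-linked ends would."""
    return frag_a + frag_b


# Hypothetical stand-ins for two distant elements (e.g. an enhancer and a promoter).
LOCUS_A = "CCCAAACCC"
LOCUS_B = "GGGTTTGGG"
SPACER = "ATATATATATAT"  # stands in for the long intervening DNA

chromosome = ("TTTT" + BGLII_SITE + LOCUS_A + BGLII_SITE + SPACER
              + BGLII_SITE + LOCUS_B + BGLII_SITE + "TTTT")

fragments = digest(chromosome, BGLII_SITE, BGLII_CUT)

# Cross-linking is modeled by picking the two fragments assumed to be in contact.
frag_a = next(f for f in fragments if LOCUS_A in f)
frag_b = next(f for f in fragments if LOCUS_B in f)
chimera = ligate(frag_a, frag_b)

print(fragments)
print(chimera)
```

In this toy, the ligation junction regenerates a BglII site flanked by the two loci, an arrangement that is absent from the linear sequence. In the laboratory, such junctions are typically detected with primers designed against both fragments, which is why both partners must be known in advance.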

Using 3C methodology, Wolf Reik and his colleagues elegantly demonstrated that on the paternal copy of mouse chromosome 7, the Igf2 differentially methylated region-2 (DMR2) loops out to interact with the distant methylated Igf2/H19 imprinting control region (ICR), thereby pushing the Igf2 promoter into close contact with the H19 enhancer, which lies 100 kb downstream. This interaction results in Igf2 transcription from this allele. In contrast, on the maternal chromosome, DMR1 interacts with the unmethylated ICR via CTCF binding, thereby partitioning the Igf2 promoter into a silent loop, inhibiting Igf2 and promoting H19 transcription (33). Chromosome looping has now been shown to be an important mechanism whereby distant DNA elements can interact in other gene systems as well (34,35). Intrachromosomal looping has now been described for the Dlx5/Dlx6 imprinted dyad (16), for immune genes (32), for the immunoglobulin heavy-chain locus (36–38), and for the D4Z4 polymorphic repeat at human chromosome 4q35, which is involved in facioscapulohumeral muscular dystrophy (FSHD) (39). Thus, these intrachromosomal chromatin loops constitute an important element in the architecture of the nucleus and in the regulatory control of a number of genes (40).

INTERCHROMOSOMAL INTERACTIONS

Physiologically important, nonrandom interactions between chromosomes have now been observed in a number of normal and malignant cells. Genomic imprinting is a form of non-Mendelian inheritance in which only one of the parental alleles is expressed, while the other parental allele is epigenetically silenced. A number of genes on human chromosome 15q11-q13 are imprinted; loss of the paternal genome in this region results in Prader-Willi syndrome, whereas loss of the maternal UBE3A gene in this region leads to Angelman syndrome. In normal T lymphocytes, there is homologous association between the two chromosomes during late S phase that is restricted to this imprinted region and that is absent in cells from patients with the aforementioned imprinting disorders. The authors suggest that this association is intimately related to the imprinting process, and they also noted a similar homologous association between mouse chromosome 11 at the H19 imprinting region (41).

X-inactivation is the process whereby one X chromosome is stochastically silenced. The mechanism for the inactivation of X chromosome genes bears many similarities to the epigenetic silencing of imprinted genes. Moreover, X inactivation is dependent upon the two imprinted genes, Xist and Tsix. Two groups have recently shown that a transient co-localization of the X-inactivation centers of the two X chromosomes is required for proper initiation of X-inactivation (42,43). Taken together, these studies suggest that monoallelic gene silencing may depend upon transient homologous interchromosomal associations [“kissing chromosomes” (44,45)], which may keep count of the number of alleles that are expressed.

The balanced translocations that are frequently found in a variety of cancers require that the two chromosomes come into physical contact with each other. The fact that certain cancers are characterized by very specific translocations suggests that very precise and limited regions of the two heterologous chromosomes normally are in close propinquity in some, if not all, cells (46), indicating the vital importance of three-dimensional genome structure (23). In addition, the specificity of intrachromosomal rearrangements is also dependent upon proximity of DNA loops. For example, in radiation-induced papillary thyroid cancer, there is a rearrangement between RET and H4 on chromosome 10, which are 30 megabases apart; these two genes are frequently juxtaposed in normal thyroid cells (22).

It now appears likely that gene regulation through long-range intra- and interchromosomal interactions may be a relatively common event (47). There is a well-characterized interchromosomal interaction between the promoter region of the IFN-γ gene on chromosome 10 and the TH2 cytokine locus on chromosome 11 that regulates gene transcription (48). The co-regulated human globin genes are frequently found to be in spatial proximity when they are transcriptionally active (49). There are more than 1300 odorant receptor (OR) genes spread throughout the genome, yet each olfactory neuron expresses only one allele of one of these receptors; all of the other OR genes, and the second allele of the expressed receptor gene, are silenced. Axel's laboratory has recently shown that the H enhancer element on mouse chromosome 14 associates with a single OR gene, which may be located on chromosome 14 or on a different chromosome. One of the two H alleles in these cells is methylated and is therefore thought to be inactive. The other, nonmethylated H allele then interacts with a single allele of one OR gene (50), targeting that particular OR allele to be the one (out of >2600 alleles) that is expressed in the neuron. This stochastic silencing of all but one OR allele is achieved through a classic epigenetic mechanism (methylation of the presumably inactive H enhancer) as well as via a geographical shift, with changing interchromosomal associations.

While studying the regulation of Igf2, an important growth factor that has been implicated in overgrowth and neoplastic syndromes, we devised a novel method for studying long-range interactions among remote genes (14). Igf2 and H19 are coordinately regulated adjacent imprinted genes located approximately 80 kb apart on mouse chromosome 7 (33). An imprinting control region (ICR) that is situated between the two genes contains four CTCF binding regions (51–53). CTCF is a zinc-finger protein that binds to a variety of DNA sequences and can serve as an insulator or as a regulator of gene transcription (54). CTCF can also bind to itself to form dimers or higher-order multimers. Multiple CTCF binding sites have been identified throughout the genome. To determine whether the Igf2/H19 imprinting center has long-range associations with heretofore unidentified DNA sequences other than the known Igf2 DMR and other cis-elements on the same chromosome, we developed an adaptation of the 3C analytic technique that we have named the associated chromosome trap (ACT) assay. The advantage of this assay is that it can be used to identify previously unknown and unsuspected remote interacting sequences (Fig. 1). We begin with a standard 3C analysis using the BglII (AGATCT) restriction enzyme at the site of interest. The DNA and protein are subjected to formalin fixation, freezing the chromatin in its three-dimensional configuration so that DNA regions that are in close juxtaposition inside nuclear compartments can be ligated by the 3C protocol. The ligated 3C products are purified and then further digested with the 4-cutter restriction enzyme MspI (CCGG). This allows us to ligate a linker oligonucleotide of known sequence and then to amplify the associated DNA using a primer pair consisting of an ICR-specific primer near a BglII site and the linker primer. The reaction yields 3C-linked products at the ICR BglII site. The PCR products are analyzed on a short sequencing gel, and the DNA bands of interest are cut out, re-amplified with a nested primer set, and subjected to DNA sequencing. By using different restriction enzymes, this method can, in theory, capture all long-range interactions.
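
The logic of the ACT capture step can be sketched in silico. The following is a minimal illustration, not the published protocol: the function name `act_amplicon`, the linker, the primer, and all sequences are hypothetical, and the model ignores strand orientation and the nested re-amplification step.

```python
# Toy model of the ACT capture step (illustrative only; all sequences are invented).

BGLII_SITE = "AGATCT"  # 6-cutter used for the initial 3C digest
MSPI_SITE = "CCGG"     # 4-cutter used for the second digest; cuts C^CGG
LINKER = "TTAATTAA"    # hypothetical linker sequence


def act_amplicon(product, bait_primer):
    """Return (amplicon, partner) for a ligated 3C product, or None if a site is missing.

    amplicon: from the bait primer, across the BglII junction, to the first
              downstream MspI cut, followed by the linker.
    partner:  the part of the amplicon beyond the junction, i.e. the
              previously unknown interacting sequence.
    """
    p = product.find(bait_primer)
    if p == -1:
        return None
    j = product.find(BGLII_SITE, p)   # regenerated BglII site marks the junction
    if j == -1:
        return None
    m = product.find(MSPI_SITE, j)    # first MspI site beyond the junction
    if m == -1:
        return None
    cut = m + 1                       # top-strand cut after the first C
    partner = product[j + 1:cut]      # junction is A|GATCT; partner starts at G
    amplicon = product[p:cut] + LINKER
    return amplicon, partner


# Hypothetical bait (ICR-like) fragment ending at a BglII cut, with a primer near that end.
BAIT_PRIMER = "ACGTTGCAC"
bait_fragment = "GATCTTTGGCCAA" + BAIT_PRIMER + "TGA"

# Hypothetical partner fragment; its identity is treated as unknown to the experimenter.
partner_fragment = "GATCT" + "GGGTTTGGGTTT" + MSPI_SITE + "AAAATTTTA"

product = bait_fragment + partner_fragment  # ligated 3C product
result = act_amplicon(product, BAIT_PRIMER)
amplicon, partner = result

print(amplicon)
print(partner)
```

Only the bait primer and the linker are needed to recover the partner sequence, which is the feature that distinguishes this approach from standard 3C in the toy model.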

Figure 1

Flow chart of ACT assay. The 3C technique requires prior identification of both elements of the putative interacting DNA segments. Using the ACT assay, it is possible to search for putative interacting DNA segments without knowing the identity of one of the segments a priori. Reprinted from Ling JQ et al. 2006 Science 312:269–272, ©2006 by the American Association for the Advancement of Science, with permission.

We used an interspecific C57BL/6 × Mus spretus BMM3-4 cell line developed in our laboratory so that we could distinguish the maternal from the paternal allele. We identified three distinct bands that correspond to unique DNA sequences that appear to interact with the ICR. One of the sequences was identified as the Igf2 DMR1 on mouse chromosome 7, which we had expected to find as it had previously been identified by other investigators (25–27). Two other DNA fragments, termed IAS1 (ICR-associated site) and IAS2, were identified and sequenced. IAS1 corresponded to an intergenic sequence on mouse chromosome 11, located between the Wsb1 and Nf1 genes, and IAS2 was localized to a sequence on mouse chromosome 6. While the ICR interaction with CTCF occurred exclusively on the maternal allele, the interaction of CTCF with IAS1 was restricted to the paternal chromosome. To demonstrate this physical co-localization, we used fluorescence in situ hybridization (FISH) with BAC probes for each locus. One and only one pair of alleles was co-localized in each of the three cell lines in 30–42% of all cells examined, further demonstrating the close association between these two chromosomes.

To determine the mechanism that controls this interaction, we used cell lines in which the ICR region was deleted. When the maternal ICR was deleted, no interchromosomal interaction was detected by FISH, but when the paternal ICR was deleted, the co-localization of these interchromosomal regions was present, demonstrating the necessity for the ICR region on chromosome 7 for the interaction, and confirming the allele-specific requirement for the CTCF-binding maternal allele. The loss of the paternal allele, which does not bind CTCF, does not affect the interchromosomal relationship, further suggesting that CTCF was also a crucial factor in the interaction.

Therefore, we investigated the role of CTCF in maintaining and/or initiating this chromosome 7/chromosome 11 interaction by constructing a cell line in which endogenous CTCF levels were diminished. After transfecting a CTCF-derived shRNA expression vector into the BMM3-4 cell lines, we determined that CTCF mRNA levels were greatly reduced. Moreover, we could detect no CTCF protein using Western blot analysis; the CTCF levels were unchanged in a cell line transfected with an empty vector. In the cells that expressed very low levels of CTCF, FISH analysis revealed the absence of co-localization between IAS1 on chromosome 11 and the ICR on chromosome 7, demonstrating that the interaction of the ICR with Nf1-Wsb1 is dependent on the presence of CTCF. We hypothesize that CTCF proteins bind to both gene segments and then dimerize or form multimers, anchoring and securing the interchromosomal interaction (Fig. 2). Preliminary data indicate that this interchromosomal interaction also occurs in human cells, where the IGF2/H19 ICR on human chromosome 11 interacts with NF1 on chromosome 17 (Ling et al., unpublished data).

Figure 2

In this model, CTCF can bind to the maternal ICR between Igf2 and H19 on mouse chromosome 7 and to paternal chromosome 11 between Wsb1 and Nf1. When CTCF is bound to both alleles, the loops upon which they reside may enter a transcription factory. The CTCF molecules can then dimerize (orange bridge), stabilizing the loops in close juxtaposition and allowing for regulatory interchromosomal interactions. In the absence of CTCF, neither loop enters the transcription factory, Nf1 and Wsb1 transcription declines, and Igf2 imprinting is lost. Finally, when the ICR is deleted from the maternal chromosome, chromosome 7 does not enter the factory and Igf2 imprinting is lost. Chromosome 11 does enter the transcription factory, however, where it has access to a limiting number of transcription factors and less competition for CTCF; therefore, more Wsb1 and Nf1 are transcribed.

Dekker's laboratory has extended the 3C technique by combining it with microarrays and quantitative DNA sequencing to map long-range interactions in detail; they call their method chromosome conformation capture carbon copy, or 5C (55). Recently, a number of other groups have also published modifications of the 3C technique to examine the range and extent of interchromosomal interactions. Using an open-ended genome-wide scanning technique based on 3C, HoxB1 was shown to be associated with numerous intra- and interchromosomal loci. When the gene was induced, interchromosomal associations were favored (56). Simonis and colleagues (57) have married the 3C technique to microarray technology, and have developed chromosome conformation capture-on-a-chip, or 4C. Using this methodology, they have shown that the actively transcribed β-globin locus in fetal liver interacts primarily with other local and transcriptionally active loci, while the inactive β-globin locus in brain interacts with a different set of loci. The housekeeping gene Rad23a makes numerous intra- as well as interchromosomal interactions, primarily with other active genes. Of great interest, the identity of these partners is conserved across tissues, providing evidence for a general chromosomal folding pattern that is preserved between different cell types. The authors comment that there is a very extensive and essentially unpredictable network of long-range interactions, indicating extensive mingling of chromosomes.

Ohlsson's laboratory has devised another 4C assay for screening for long-range interactions that they term circular chromosome conformation capture (58). They report that 114 unique sequences, originating from all of the autosomes, interacted with the Igf2-H19 imprinting control region. As we showed with the Nf1-Wsb1 interaction, these long-range associations primarily involved the maternal allele of Igf2-H19. They also reported that a disproportionate number of these interactions involved other imprinted genes, and that the patterns of interaction changed during differentiation of ES cells.

CONCLUSION

Increasing evidence of the importance of long-range interactions between DNA segments has provided us with an increased appreciation of the complexity of gene regulation. Our simple notion of gene expression governed by either a cis apparatus (nearby DNA sequences) or by trans mechanisms (via proteins or RNA derived from distant genes on the same or different chromosomes) has become blurred by the knowledge that long-range interactions between DNA segments may directly modify transcription (59). Our concepts of gene regulation and epigenetics must now incorporate a three-dimensional nuclear geography of dynamic interactions among distant genetic loci. These long-range intra- and interchromosomal associations, which may be regulated by DNA binding proteins, can serve several important, possibly overlapping functions: 1) promoters or enhancers may be brought in close propinquity to the protein-coding portion of a gene to stimulate transcription (as is the case with the OR genes); 2) DNA regions may interact via a regulatory DNA binding protein to alter gene transcription (as seen with the interaction of Igf2/H19 and Wsb1/Nf1); or 3) DNA binding proteins can tether physiologically and biochemically related genes into specific transcription factories, where they can facilitate coordinated regulation of groups of genes that can then share a limited set of transcription factors and/or machinery in an active chromatin hub (ACH) (Fig. 3). A more complete understanding of nuclear geography and architecture on a genome-wide scale will allow us to develop a new and more complete spatial annotation of genomic and epigenomic information.

Figure 3

Schematic of an active chromatin hub (ACH). In the mammalian cell nucleus, each chromosome occupies its own chromosome territory (CT; red broken lines outline the borders). Different CTs intermingle (overlapping regions), and interchromosomal channels exist among them. Actively transcribed genes with related physiologic or biochemical functions, located on loops (arched lines), form an active transcriptional hub among different CTs, sharing transcription factors and transcriptional machinery. Intra- and interchromosomal interactions are mediated by nuclear architectural proteins (orange triangles), and the transcription of each gene (green arrows) depends on the establishment of dynamic long-range interactions of chromosome segments at different stages of the cell cycle.