Heterochromatin: general definition

The word ‘heterochromatin’ is a cytological term that originally referred to chromosome portions that were stained deeply at prophase and retained a compact structure throughout the mitotic cell cycle (Heitz, 1928). Heterochromatin was classified further into facultative and constitutive types (Brown, 1966). Facultative heterochromatin corresponds to silenced euchromatin (chromosome regions, entire chromosomes or even whole genomes), whereas constitutive heterochromatin is found commonly in large blocks near centromeres and telomeres. The constitutive component consists mostly of repetitive DNA sequences and maintains its characteristics on homologous chromosomes.

Facultative heterochromatin

A well-known example of facultative heterochromatin is the inactive X chromosome in the somatic cells of female mammals (reviewed by Plath et al, 2002, 2003). The inactive X chromosome becomes heteropycnotic, suggesting that the chromatin in the silent regions is relatively condensed. The essential steps leading to inactivation can be summarized as follows: initiation of heterochromatinization in early developmental stages, starting from a specific locus called the X inactivation centre; spreading of heterochromatinization along the entire chromosome; and, once the heterochromatic state is established, its maintenance through subsequent somatic cell divisions.

Heterochromatinization is achieved by changing the chromatin of the X chromosomes from a transcriptionally active state to an inactive state. This involves a cascade of chromatin modifications that inhibit the establishment of transcription complexes. These modifications include methylation of histone H3 lysine 9 and histone H3 lysine 27; hypoacetylation of histones H2A, H3 and H4; decrease of histone H3 lysine 4 methylation; and changes in the time of DNA replication. Such features of the inactive chromatin of X chromosomes seem to be shared by inactive chromatin elsewhere in mammalian genomes.

Constitutive heterochromatin is not a genomic wasteland

Constitutive heterochromatin is a ubiquitous and common component of eukaryotic genomes. It forms about 5% of the genome in Arabidopsis thaliana, 30% in humans, 30% in Drosophila melanogaster and up to 90% in certain nematodes (Moritz and Roth, 1976; Gatti and Pimpinelli, 1992; Arabidopsis Genome Initiative, 2000). Although constitutive heterochromatin is one of the basic components of eukaryotic chromosomes, the reasons for its widespread occurrence are still unclear.

Several unusual properties characterize constitutive heterochromatin in virtually all animal and plant species, and together they have led to the traditional view of this material as a ‘desert’ of genetic functions (reviewed by John, 1988): a strongly reduced level of meiotic recombination; low gene density; repression of the activity of nearby euchromatic genes, a phenomenon termed position effect variegation; late replication during S phase; enrichment in highly and moderately repetitive DNAs; and transcriptional inertness.

In the past two decades, however, the idea that constitutive heterochromatin is merely a genomic wasteland has been modified in the light of genetic, cytological and molecular studies conducted primarily in the model organism D. melanogaster. These studies have shown that constitutive heterochromatin performs important cellular functions and carries genes essential for viability and fertility (reviewed by Gatti and Pimpinelli, 1992; Williams and Robbins, 1992; Weiler and Wakimoto, 1995; Dernburg et al, 1996; Elgin, 1996; Karpen et al, 1996; Eissenberg and Hilliker, 2000; Henikoff et al, 2001; Coulthard et al, 2003; Fitzpatrick et al, 2005; Dimitri et al, 2005a, 2005b). In addition, about 450 predicted genes have recently been identified by the annotation of the heterochromatin sequence (Hoskins et al, 2002). Remarkably, the presence of coding genes in heterochromatin, far from being a peculiarity of Drosophila, seems to be a conserved trait in the evolution of eukaryotic genomes. Heterochromatic genes have been found in Saccharomyces cerevisiae, Schizosaccharomyces pombe, Oryza sativa and A. thaliana, as well as in humans (Kuhn et al, 1991; Arabidopsis Genome Initiative, 2000; Horvath et al, 2000; Brun et al, 2003; Nagaki et al, 2004).

Genes resident in the constitutive heterochromatin of Drosophila

Single-copy genes encoding essential functions

D. melanogaster is the model organism in which the greatest progress in the study of heterochromatin function has been made, as a result of combined genetic, cytological and genomic approaches. Mutable genes essential for viability and fertility were initially identified in D. melanogaster by recessive lethal mutations that were genetically linked to heterochromatin (Hilliker, 1976; Marchant and Holm, 1988). Complementation analysis using rearrangements with cytologically determined breakpoints in the heterochromatin of mitotic chromosomes finally demonstrated that these genes are embedded in heterochromatin and enabled their mapping (Dimitri, 1991; Koryakov et al, 2002). Notably, most of the heterochromatic genes thus far detected are located in chromosome regions that fluoresce weakly after staining with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI). These regions harbour clusters of transposable elements and are devoid of highly repetitive satellite DNAs (Lohe et al, 1993; Pimpinelli et al, 1995). Thus far, at least 32 essential genes are known to map to the mitotic heterochromatin of chromosomes 2 and 3. Only a few of these, however, are sufficiently defined at the molecular level: RpL5, light, concertina, rolled, RpL38, Nipped-B, Nipped-A, Parp and RpL15 (Hilliker, 1976; Devlin et al, 1990a, 1990b; Parks and Wieschaus, 1991; Biggs et al, 1994; Rollins et al, 1999; Tulin et al, 2002; Myster et al, 2004; Marygold et al, 2005; Schulze et al, 2005).

Predicted genes

The release of the sequence of D. melanogaster heterochromatin by the Berkeley Drosophila Genome Project (http://www.fruitfly.org/) and the Drosophila Heterochromatin Genome Project (DHGP; http://www.dhgp.org/index_release_notes.html) has greatly facilitated the study of the molecular organization and function of heterochromatic genes. Initially, 3.8 Mb of about 120 Mb of the D. melanogaster euchromatic genome sequence was found to correspond to heterochromatic sequences (Adams et al, 2000). More recently, an improved whole-genome shotgun assembly (heterochromatic-WGS3; Hoskins et al, 2002) was produced, which includes 20.7 Mb of draft-quality heterochromatic sequence. About 450 predicted genes have been identified by the annotation of the heterochromatin sequence (Hoskins et al, 2002), suggesting that the number of active genes in the constitutive heterochromatin of D. melanogaster is higher than that defined by genetic analysis. In the WGS3 heterochromatin, 45% of predicted genes have an overlapping expressed sequence tag, whereas 20% are based on full-insert sequences of complementary DNAs (Hoskins et al, 2002).
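To make the scale of these figures concrete, the short sketch below converts the quoted percentages into approximate gene counts and sequence proportions. It uses only the numbers given in the text; because the total of 450 predicted genes is itself approximate, the results should be read as rough estimates. The variable names are illustrative, and no assumption is made about whether the two evidence classes overlap.

```python
# Figures quoted in the text (Adams et al, 2000; Hoskins et al, 2002).
PREDICTED_GENES = 450        # "about 450" predicted heterochromatic genes
EST_PERCENT = 45             # predictions with an overlapping EST
CDNA_PERCENT = 20            # predictions based on full-insert cDNA sequences
INITIAL_HET_MB = 3.8         # heterochromatic sequence in the initial release
INITIAL_TOTAL_MB = 120.0     # approximate size of that initial sequence
WGS3_HET_MB = 20.7           # draft-quality heterochromatic sequence in WGS3


def share_of(total: float, percent: float) -> float:
    """Return the part of `total` that corresponds to `percent` per cent."""
    return total * percent / 100


est_supported = share_of(PREDICTED_GENES, EST_PERCENT)      # roughly 200 genes
cdna_supported = share_of(PREDICTED_GENES, CDNA_PERCENT)    # roughly 90 genes
initial_share = INITIAL_HET_MB / INITIAL_TOTAL_MB           # about 3% of the release
fold_increase = WGS3_HET_MB / INITIAL_HET_MB                # more than fivefold

print(f"EST-supported predictions: ~{est_supported:.0f}")
print(f"cDNA-based predictions: ~{cdna_supported:.0f}")
print(f"Heterochromatic share of initial release: {initial_share:.1%}")
print(f"Increase in heterochromatic sequence in WGS3: {fold_increase:.1f}-fold")
```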

Several studies have concentrated on efforts to map predicted genes to the mitotic heterochromatin of D. melanogaster using bacterial artificial chromosomes, complementary DNAs and P-elements (see Figure 1; Hoskins et al, 2002; Corradini et al, 2003; Yasuhara et al, 2003; F Rossi et al, unpublished). For example, about a hundred predicted genes have been assigned to specific regions of the mitotic heterochromatin map of chromosome 2, in which genetic analyses detected 17 essential genes (Table 1). At least two factors might account for the high number of predicted genes. First, heterochromatin might contain an excess of non-essential coding genes. Indeed, CG40293, p120 and CG17486 of 2Rh (Table 1) were found to be non-essential (Myster et al, 2004). Second, seemingly different predicted genes might be portions of the same gene, or some predictions might be incorrect.

Figure 1

Examples of FISH mapping of three different DNA sequences to the mitotic heterochromatin of chromosome 2. Prometaphase chromosomes of larval brain cells from the y; cn bw sp strain were stained with DAPI and pseudo-coloured in blue; the different FISH signals are shown in red, yellow and pink. (a) The CG17691 complementary DNA signal (red) maps to the proximal edge of region h44 of 2Rh. (b) The 19.74.3 P-element insert maps to h40–h41 of 2Rh. (c) The hybridization signal of BACR04P15 is located in h44.

Table 1 Known heterochromatic genes and gene models on chromosome 2 and their functions

Studying the function of heterochromatic predicted genes by RNA interference-mediated inactivation

RNA interference (RNAi) in Drosophila cell cultures has proven to be an extremely powerful method for disrupting the expression of genes and for studying their functions (Somma et al, 2002). Phenotypic analyses of RNAi cells can thus be useful for studying the function of heterochromatic predicted genes that lack mutant alleles (Figure 2a). In collaboration with DHGP, we have undertaken RNAi studies of about 100 predicted genes that have recently been mapped to the heterochromatin of chromosome 2 (Hoskins et al, 2002; Corradini et al, 2003; Yasuhara et al, 2003; Rossi et al, submitted). Clearly, the phenotypic analyses of RNAi cells define only those genes required at different steps of cell division and behaviour (such as cytokinesis, chromosome segregation, chromosome condensation, DNA repair and metabolism and cell–cell interactions) that are likely to give rise to obvious cytological defects.

Figure 2

Knocking down heterochromatic gene function by RNAi. (a) RNAi in S2 cells. At 72 h after double-stranded RNA (dsRNA) injection, cells are fixed with methanol:acetic acid and chromosomes are prepared by air drying. Chromosomes of control and treated cells are stained by DAPI, and cytological analysis is performed to reveal apparent mitotic defects. At the bottom of the figure, an aberrant chromosome condensation morphology after RNAi of a 2Rh gene model is shown (right panel). This aberrant phenotype clearly differs from that of the nontreated control cells (left panel). (b) In vivo RNAi using dsRNA-producing transgenes coupled to the GAL4/UAS expression system. In symmetrical transcription, dsRNA synthesis can be induced by cloning, as in the SympUAST vector, a single gene fragment between two convergent arrays of the GAL4-responsive UAS sequences. This strategy permits the use of a long DNA fragment and results in a very stable plasmid both in the bacterial host during the cloning procedure and in the Drosophila genome after transgenesis. dsRNA can also be efficiently produced in vivo by mono-directional transcription of inverted repeats (IR) of a given DNA fragment under the control of a single UAS regulatory region. In this case, the inverted repeats should be separated by a short DNA spacer (100–200 bp) to avoid any rearrangement due to the recombinogenic potential of contiguous IR in the bacterial host and Drosophila genome. The synthesized dsRNA is then processed by the Dicer RNase, which is a component of the RNA-induced silencing complex (RISC), located in the cytoplasm. The Dicer RNase cleaves the exported dsRNA into smaller, 21-nucleotide small interfering RNAs (siRNAs). The siRNAs are used by RISC as a guide for destroying the homologous mRNA (in white). Through this mechanism, specific messages can be degraded in a cell-autonomous manner and in specific tissues according to the expression pattern of the GAL4 enhancer trap line used.

RNAi inactivation of heterochromatic genes can also be performed in vivo with suitable vectors. One such vector is SympUAST, which was designed to generate simultaneous transcription of both strands of a given DNA insert by means of two flanking convergent arrays of the GAL4-responsive UAS regulatory elements (Figure 2b; Giordano et al, 2002). This vector can accommodate DNA fragments longer than those used in vectors that transcribe inverted repeats while remaining highly stable in the genome. Because it lacks recombinogenic inverted repeats, the resulting phenotype is also expected to be stable. These transgenes are able to repress gene activity in transformed adult flies. A further advantage of this procedure is that, depending on the expression pattern of the GAL4 line used as a driver, RNAi can be induced ubiquitously or in selected tissues at specific developmental stages, and the silencing effect is cell-autonomous.
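The layout of an inverted-repeat insert, as described in the legend to Figure 2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fragment and spacer sequences are invented placeholders, the function names are hypothetical, and the `dice` function is a deliberate simplification that tiles one strand into consecutive 21-nucleotide pieces rather than modelling the real enzymology of Dicer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_insert(fragment: str, spacer: str) -> str:
    """Build fragment + spacer + reverse complement of the fragment.

    The spacer length check follows the 100-200 bp range given in the
    Figure 2 legend; the check itself is an illustrative safeguard.
    """
    if not 100 <= len(spacer) <= 200:
        raise ValueError("spacer should be 100-200 bp long")
    return fragment.upper() + spacer.upper() + reverse_complement(fragment)


def dice(strand: str, size: int = 21) -> list[str]:
    """Tile a strand into consecutive `size`-nt pieces, dropping any short tail.

    Simplified stand-in for Dicer processing, not a biochemical model.
    """
    s = strand.upper()
    return [s[i:i + size] for i in range(0, len(s) - size + 1, size)]


# Hypothetical 50-nt gene fragment and 120-nt spacer (placeholders, not real sequences).
fragment = "ATGGCCAAGT" * 5
spacer = "GATC" * 30

insert = inverted_repeat_insert(fragment, spacer)
pieces = dice(fragment)

print(len(insert))                  # total insert length in nucleotides
print([len(p) for p in pieces])     # lengths of the simulated siRNA-sized pieces
```

Transcribed from a single UAS region, the two arms of such an insert are complementary and can fold back on each other to form the double-stranded stem, with the spacer forming the loop. In the symmetrical SympUAST design no second arm is needed, because both strands of a single fragment are transcribed.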

Structure, function and evolution of genes located in heterochromatin

Organization and function of heterochromatic genes

One difference between heterochromatic and euchromatic genes lies in their molecular structure. Genes located in the heterochromatin of D. melanogaster are on average significantly larger than euchromatic genes owing to the occurrence of long introns enriched in transposable element-related DNA sequences (Devlin et al, 1990a, 1990b; Tulin et al, 2002; Dimitri et al, 2003). The Y-chromosome fertility factors of D. melanogaster are a striking example: they span up to 4 Mb of DNA (Gatti and Pimpinelli, 1992) and carry transposable-element-rich mega-introns that can account for 1 or 2 Mb (Kurek et al, 2000; Carvalho et al, 2001). Although some of the predicted genes mapped to the heterochromatin of chromosomes 2 and 3 are also large, this does not seem to be a general rule. Several hypotheses can be used to explain these observations. First, during evolution, older genes in heterochromatin might have increased in size by being targets for transposable-element insertions. In this case, it would follow that short predicted genes in heterochromatin are younger, having recently ‘moved’ to heterochromatin. Second, genes in heterochromatin might be targeted differently by transposable elements, with some genes being more refractory than others. Third, there might be selective pressure to keep some heterochromatic genes short owing to particular functional properties. Highly expressed genes tend to have substantially shorter introns than those expressed at low levels (Castillo-Davis et al, 2002). In this regard, it is worth noting that although RpL38, RpL5 and RpL15, three highly expressed ribosomal protein-coding genes, are located in heterochromatin, they are all short and carry short introns (Marygold et al, 2005; Schulze et al, 2005).

Heterochromatic essential genes such as light, rolled, RpL5, RpL38, RpL15, Nipped-B and Parp are expressed throughout all developmental stages (Biggs et al, 1994; Rollins et al, 1999; Tulin et al, 2002; Marygold et al, 2005; Schulze et al, 2005). The same is true for fourteen predicted genes from chromosome 2 heterochromatin that we have recently analysed (F Rossi, N Corradini, R Moschetti, R Caizzi and P Dimitri, unpublished data). The Y-chromosome fertility factors are an exception, as their expression is tissue- and sex-specific. In general, on the basis of molecular and bioinformatic analyses, predicted genes and known genes found in heterochromatin do not have molecular functions that would distinguish them from genes located in euchromatin (Hoskins et al, 2002; FlyBase, 2006). In other words, heterochromatin does not seem to have a distinctive proteome.

Gene expression in heterochromatic domains

The presence of expressed genes in the heterochromatin of evolutionarily distant organisms seems paradoxical: how can protein-coding genes work properly in an environment that is thought to be incompatible with gene expression? Genes located in constitutive heterochromatin might have regulatory requirements that are different from those in euchromatin. In fact, heterochromatic genes are repressed when moved to euchromatin by chromosomal rearrangements, indicating that they are dependent on a native heterochromatic location for correct expression (Wakimoto and Hearn, 1990; Eberl et al, 1993; see below). Therefore, these genes cannot simply be considered as euchromatin-like active sequences embedded in a repetitive genomic compartment. It would be more correct to consider euchromatin and constitutive heterochromatin as two different chromatin regions, in both of which gene expression can occur, possibly depending on the formation of differential multiprotein complexes. Chromosomal proteins required for the establishment of the heterochromatic state, such as heterochromatin protein 1 (HP1) and others, might also be involved in the control of gene expression in heterochromatin (Weiler and Wakimoto, 1995). Experimental evidence consistent with this proposal has been reported in D. melanogaster. First, genetic experiments suggested that modifiers of position effect variegation, such as suppressor of variegation (Su(var)) gene products, can interact to guarantee the proper expression of the light gene in its normal heterochromatic location (Clegg et al, 1998). Second, the amount of mRNA of the light and rolled heterochromatic genes was found to be reduced about 2.5-fold in HP1 mutant larvae (Lu et al, 2000). To obtain more insight into the roles of HP1 and SU(VAR)3–9, a histone methyltransferase, large-scale mapping of their target genes was recently carried out in Drosophila embryonic Kc cells (Greil et al, 2003). The results revealed that HP1 and SU(VAR)3–9 bind together to genes and transposable elements in constitutive heterochromatin. More recently, whole-genome and computational approaches were used to study HP1 targeting of genomic sites (de Wit et al, 2005). The HP1 protein was found to bind to the heterochromatic rolled gene, and the binding is prominent in both unique and repetitive portions of the genomic region of the gene. Together, these results corroborate the view that HP1, SU(VAR)3–9 and possibly other protein complexes are required for the proper functioning of genes resident in heterochromatin. It is tempting to speculate that such protein complexes, possibly by recognizing specific patterns of histone modifications, might be involved in the insulation of gene expression in constitutive heterochromatin.

One of the peculiar features of the heterochromatic DNA is its late replication in S phase, unlike the early replication of the bulk of euchromatin sequences. One could speculate that such a dichotomy might also occur at the transcriptional level: for example, heterochromatic and euchromatic genes might differ in their timing of expression during the cell cycle. Notably, cell-cycle-dependent changes in the chromosomal localization of heterochromatin-binding proteins have been observed in Drosophila (Platero et al, 1998), which might be related to the differential expression of heterochromatic genes.

The role of RNAi in the formation of heterochromatin in evolutionarily distant organisms has been documented and discussed at length (Hall et al, 2002; Reinhart and Bartel, 2002; Volpe et al, 2002; Lehnertz et al, 2003; Fukagawa et al, 2004; Pal-Bhadra et al, 2004; Sun et al, 2004; Verdel et al, 2004; Bernstein and Allis, 2005). However, it is still unclear whether RNAi is also involved in regulating the activity of genes located in constitutive heterochromatin. Studying the effects of mutations in the RNAi genes, such as piwi and argonaute, on the transcription of light and rolled genes might help to define the functional relevance of RNAi machinery for heterochromatic gene expression.

D. melanogaster heterochromatic genes in other species

Recently, Yasuhara et al (2005) studied a group of genes of 2L heterochromatin, including the light gene, and found that they are euchromatic in both D. pseudoobscura and D. virilis. In particular, 7 out of 11 genes found in D. melanogaster 2L heterochromatin have orthologues in D. pseudoobscura that are clustered on the corresponding chromosome arm. The authors propose that, ancestrally, light and its neighbouring genes were typical euchromatic genes found proximally in the euchromatin; alternatively, they might have been relocated closer to the centromere by chromosome rearrangement. Subsequently, an infiltration of pericentromeric heterochromatin into proximal euchromatin might have occurred. Interestingly, Yasuhara et al (2005) also found that the promoters of the heterochromatic genes they studied did not differ from euchromatic promoters. Thus, evolutionary adaptation to heterochromatin has been achieved without changing the basic promoter type. At present, it is unclear whether the expression pattern of the euchromatic light gene in D. pseudoobscura and D. virilis is different from that of its heterochromatic orthologue in D. melanogaster. These results, together with the finding that orthologues of other essential heterochromatic genes of D. melanogaster, such as rolled, Nipped-A, Nipped-B, RpL15, RpL38 and Parp, are euchromatic in different species, including yeast, mice and humans (Biggs et al, 1994; Krantz et al, 2004; Tonkin et al, 2004; Schulze et al, 2006; Rossi et al, submitted), indicate that during evolution the conservation of a heterochromatic location may not be crucial for the proper expression of a given gene.

An interesting approach for identifying the genomic location of the orthologues of D. melanogaster heterochromatic genes in other species was developed by the DHGP and is based on the analysis of the repeat content of orthologous introns and scaffolds (Smith et al, 2005). This approach should allow one to estimate whether genes have moved into or out of heterochromatin regions in other species and might provide new insight into the evolution of constitutive heterochromatin.
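The general idea of using repeat content as a proxy for chromatin context can be illustrated with a minimal sketch. It is not the published DHGP procedure: it simply assumes soft-masked sequences (repeats in lowercase, the common repeat-masking convention), and both the 0.5 threshold and the intron names and sequences are hypothetical values chosen for illustration.

```python
def repeat_fraction(soft_masked_seq: str) -> float:
    """Fraction of bases written in lowercase, that is, masked as repeats."""
    bases = [b for b in soft_masked_seq if b.isalpha()]
    if not bases:
        return 0.0
    return sum(b.islower() for b in bases) / len(bases)


def classify_introns(introns: dict[str, str], threshold: float = 0.5) -> dict[str, str]:
    """Label each intron as repeat-rich or repeat-poor.

    The threshold is an arbitrary illustrative cut-off, not a published value.
    """
    return {
        name: "repeat-rich" if repeat_fraction(seq) > threshold else "repeat-poor"
        for name, seq in introns.items()
    }


# Toy soft-masked introns (invented sequences, hypothetical names).
introns = {
    "geneA_intron1": "ACGTacgtacgtacgtACGT",   # 12 of 20 bases masked
    "geneB_intron1": "ACGTACGTACGTACGTacgt",   # 4 of 20 bases masked
}

labels = classify_introns(introns)
for name, label in labels.items():
    print(name, f"{repeat_fraction(introns[name]):.2f}", label)
```

A repeat-rich intron in one species paired with a repeat-poor orthologous intron in another would be consistent with, but would not by itself prove, a difference in chromatin location between the two species.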

During evolution, transposable elements might have contributed to the plasticity of heterochromatin by stimulating chromosome rearrangements or gene transfer into this peculiar genomic compartment. Once in heterochromatin, genes would have been targeted by recurrent insertions of transposable elements, thus generating the present structural organization. Transposable element-related sequences in heterochromatin might thus contribute, in ways that are still poorly understood, to many of the structural and functional properties of heterochromatin, including the regulatory evolution of these genes (Pimpinelli et al, 1995; Weiler and Wakimoto, 1995; Dimitri, 1997; Dimitri and Junakovic, 1999; Yasuhara et al, 2005), similar to documented instances found in euchromatin (Von Sternberg et al, 1992; Miller et al, 1999; Kidwell and Lisch, 2000; Jordan et al, 2003; Brandt et al, 2005; Kapitonov and Jurka, 2005). Over evolutionary time, transposable elements can thus be viewed as ‘artisans’ that have shaped heterochromatin, rather than merely as parasitic sequences (Dimitri et al, 2005b). Suggestive models of the evolutionary origin of heterochromatic genes in different species have recently been proposed by Yasuhara and Wakimoto (2006).

Drosophila heterochromatic genes related to human disease genes

The use of D. melanogaster for the study of human gene functions is now well documented. A systematic search for human disease-causing genes in Drosophila showed that about 75% of human disease genes match unique Drosophila sequences (Reiter et al, 2001). Interestingly, about 70% of the 101 putative proteins encoded by the heterochromatic predicted genes annotated on chromosome 2 have strong similarity to human proteins (Table 1). Some of the orthologous genes in humans are involved in genetic diseases. For example, NIPBL, the human homologue of Nipped-B, maps to euchromatin of chromosome 5p13 and is widely expressed in fetal and adult tissues. Mutations in NIPBL are responsible for Cornelia de Lange syndrome, a multiple malformation disorder (Krantz et al, 2004; Tonkin et al, 2004). The predicted gene CG17528 encodes a putative microtubule-binding protein, which contains different, highly conserved functional domains: two tandemly repeated doublecortin (DCX) domains, which are characteristic of some microtubule-binding proteins, and a carboxy-terminal serine/threonine kinase domain. The human orthologues of CG17528 (DCX, DCKL1 and DCKL2) might be implicated in lissencephaly, a genetic disorder characterized by severe mental retardation. Another interesting example is that of CG40218. The CG40218 protein belongs to the evolutionarily conserved family of bucentaur (BCNT)-like proteins found in several animals and plants (A. thaliana, O. sativa, Neurospora, S. cerevisiae, Caenorhabditis elegans, mosquitoes, flies, mice and humans). Little is known about the functions of the BCNT-like family. For example, craniofacial development protein 1 (CFDP1), the human orthologue of CG40218, encodes a 299-amino-acid protein that is phosphorylated by casein kinase II but whose function is still unknown. Intriguingly, this gene maps to chromosome 16 at 16q22.2–q22.3, in proximity to several loci associated with inherited craniofacial diseases such as Fanconi anaemia type A (Diekwisch et al, 1999).

Conclusions

Although some interesting features are beginning to emerge from structural and functional studies on genes located in heterochromatin, much still remains to be learned. In particular, we need to (1) link the sequences of genes in heterochromatin to their functions; (2) identify the factors that determine the correct expression of those genes; and (3) understand their evolutionary dynamics.