In bacterial genomes, functionally related genes are often clustered and controlled as a unit. Such 'operons' are not normally found in animals — so why are they so abundant in one class of worm?
The genomes of animals, plants and fungi seem to be relatively disorganized. Genes appear to be randomly distributed, with only a few exceptions: repeats of similar sequences caused by gene duplications, for example, and a limited number of ancient gene clusters containing functionally related genes (such as the Hox genes that are involved in control of animal development1). Apart from these, the average gene is generally assumed to be independent of its neighbours, and genomes are constantly rearranged and shuffled. However, in one group of animals — the nematodes (small, unsegmented worms) — neighbouring genes are occasionally assembled into regulatory units called operons2. On page 851 of this issue, Blumenthal et al.3 now report the first whole-genome characterization of such operons in a multicellular organism, and raise intriguing questions as to how (and why) they have evolved.
Operons were first described4 in prokaryotes, small single-celled organisms without a nucleus. In these organisms, genes needed for one particular process (say, synthesis of an amino acid) are often clustered in close proximity on the genome, with the same orientation on the DNA. These operons facilitate the coordinated regulation of genes, as the clustered genes are activated (that is, transcribed into messenger RNA) all in one go. Operon transcripts always code for more than one protein, and prokaryotes handle this by starting translation of the messenger RNA into protein separately at the beginning of each protein-coding section (Fig. 1a). Such multi-gene, or 'polycistronic', transcripts are a problem for eukaryotes (organisms with nucleated cells, including all multicellular organisms). Eukaryotic ribosomes usually start translation only near the 5' end of a messenger RNA, so genes lying further along a polycistronic transcript would generally go untranslated. It has therefore long been assumed that most eukaryotes do not have operons.
Blumenthal et al., however, had shown previously2 that the nematode Caenorhabditis elegans does occasionally make polycistronic transcripts, and processes them through a mechanism called trans-splicing. Briefly, the transcripts are split up into individual pieces, and a short 'leader' sequence is simultaneously added to the start of each piece (Fig. 1b). In C. elegans, trans-splicing is not limited to polycistronic transcripts; in fact, the majority of transcripts receive leader sequences. But polycistronic transcripts are unique in that every piece but the first receives a particular spliced leader, called SL2 (the first piece often gets the usual leader, SL1).
To measure the extent of polycistronic transcription (and so of operons) in C. elegans, Blumenthal et al.3 initiated a systematic, genome-wide study. Reasoning that it should be possible to find all operons by identifying SL2-containing transcripts and checking where the corresponding genes lie on the genome, they took two independent approaches. First, they analysed C. elegans transcripts that had been sequenced previously (random transcripts are routinely sequenced in many model organisms) and found more than 300 that contained SL2. Second, they extracted transcripts from a population of worms and enriched the sample for those carrying an SL2 leader, by adding synthetic SL2-like molecules that pick out the right transcripts through base-pairing. They then compared transcript abundances in the enriched population with those in a normal, non-enriched fraction. This technically demanding approach yielded another 1,200 SL2-containing transcripts. The two data sets agreed well with each other and with the clustering of genes observed in the genome. Once the first gene of each operon (which is not labelled with SL2) is included, the combined list contains 2,291 genes in 881 operons. This is a surprisingly large total: allowing for operons they might have missed, the authors estimate that 13–15% of all genes in C. elegans are expressed as part of an operon.
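The logic of turning SL2 labels into operons can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' pipeline: genes are listed in genomic order along one chromosome, each is flagged by whether its transcripts carry SL2, and a gene with SL2 is joined to the adjacent gene immediately upstream of it on the same strand. Real analyses would also weigh evidence such as the spacing between genes, which this sketch ignores. The `Gene` class, the function names and the example genes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str      # hypothetical gene identifier
    strand: str    # '+' or '-'
    has_sl2: bool  # True if this gene's transcripts carry the SL2 leader


def call_operons(genes):
    """Group genes, listed in genomic order along one chromosome, into operons.

    Assumed rule: two adjacent genes on the same strand belong to one operon
    if the downstream gene of the pair (in the direction of transcription)
    carries SL2. Operons are returned as lists of names in genomic order.
    """
    operons = []
    current = genes[:1]  # run of linked genes being built
    for prev, nxt in zip(genes, genes[1:]):
        # On the '-' strand, transcription runs towards lower coordinates,
        # so the earlier gene in genomic order is the downstream one.
        downstream = nxt if prev.strand == "+" else prev
        if prev.strand == nxt.strand and downstream.has_sl2:
            current.append(nxt)
        else:
            if len(current) >= 2:
                operons.append([g.name for g in current])
            current = [nxt]
    if len(current) >= 2:
        operons.append([g.name for g in current])
    return operons


def mean_operon_size(operons):
    """Average number of genes per operon."""
    return sum(len(op) for op in operons) / len(operons)


# Toy input, invented for illustration only.
example = [
    Gene("a", "+", False),  # first gene of a '+' operon: no SL2
    Gene("b", "+", True),
    Gene("c", "+", True),
    Gene("d", "-", False),  # strand switch ends the run
    Gene("e", "-", True),   # downstream of f on the '-' strand
    Gene("f", "-", False),  # first gene of a '-' operon
    Gene("g", "+", False),
]

operons = call_operons(example)
print(operons)                    # [['a', 'b', 'c'], ['e', 'f']]
print(mean_operon_size(operons))  # 2.5
print(round(2291 / 881, 1))       # 2.6 genes per operon in the study
```

The last line checks the study's own arithmetic: 2,291 genes in 881 operons works out to about 2.6 genes per operon, the C. elegans average quoted in the comparison with E. coli.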
What can be learned from this first genome-wide study of eukaryotic operons? First, it nicely confirms a previous assumption2: in C. elegans, SL2-containing transcripts are indeed reliable operon markers (as expected, in many runs of consecutive genes on the genome, all genes but the first were found to generate SL2-containing transcripts). Second, the data may help researchers working with an uncharacterized gene: if it is in an operon with a better-characterized gene, they can probably assume that both genes are regulated in the same way at the level of transcription. Third, a genome-wide view of operons in C. elegans is a step towards addressing long-standing questions such as why nematodes have operons and whether they are the only animals that do.
Blumenthal et al. cautiously suggest that some of the operons could serve the same purpose as their counterparts in prokaryotes: to group functionally related genes together. This clearly seems to be true for some genes3, and it would be exciting if it held for most of them, because the function of uncharacterized genes could then be predicted from their operon context. We could not resist checking the functional annotations of these newly discovered operons (using data from the Gene Ontology Consortium5), and found that only about 4% of them contained two or more genes annotated with the same biological process, compared with 36% of known and putative operons in the prokaryote Escherichia coli. This low figure may largely reflect the fact that a smaller fraction of genes have a known function in C. elegans than in prokaryotes. Because of these gaps in annotation, we cannot yet say whether most genes in operons are functionally related, but there are certainly cases where they are not.
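The annotation check described above reduces to a simple count, sketched here in Python as a minimal illustration rather than the actual analysis. The function names, gene names and process labels are hypothetical placeholders, not Gene Ontology data. The optional `min_annotated` filter (also an assumption of this sketch) shows why sparse annotation pulls the figure down: an operon in which fewer than two genes are annotated cannot score, so restricting the denominator to better-annotated operons raises the fraction.

```python
from collections import Counter


def shares_process(operon, annotations):
    """True if two or more genes in the operon share a biological-process term."""
    counts = Counter()
    for gene in operon:
        # Each gene contributes each of its terms at most once (terms are a set).
        counts.update(annotations.get(gene, set()))
    return any(n >= 2 for n in counts.values())


def fraction_sharing(operons, annotations, min_annotated=0):
    """Fraction of operons with a shared term, among operons that have at
    least `min_annotated` annotated genes."""
    eligible = [
        op for op in operons
        if sum(1 for g in op if annotations.get(g)) >= min_annotated
    ]
    if not eligible:
        return 0.0
    return sum(shares_process(op, annotations) for op in eligible) / len(eligible)


# Toy data, invented for illustration; not Gene Ontology annotations.
toy_operons = [["g1", "g2"], ["g3", "g4", "g5"], ["g6", "g7"], ["g8", "g9"]]
toy_annotations = {
    "g1": {"proc_A"},
    "g2": {"proc_A", "proc_B"},  # shares proc_A with g1
    "g3": {"proc_C"},
    "g4": {"proc_D"},            # annotated, but no term in common with g3
    "g6": {"proc_E"},            # only annotated gene in its operon
}  # g5, g7, g8 and g9 have no annotation

print(fraction_sharing(toy_operons, toy_annotations))                   # 0.25
print(fraction_sharing(toy_operons, toy_annotations, min_annotated=2))  # 0.5
```

In the toy data, one of the four operons scores (0.25), but so does one of the two operons with at least two annotated genes (0.5); this is the kind of bias that incomplete annotation can introduce.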
To understand the functional significance of operons in C. elegans, one must consider their evolutionary history. Are they related to prokaryotic operons through common descent? Trans-splicing, at least, is not unique to nematodes. It was first described in single-celled eukaryotes (trypanosomes, in which entire operons have also been reported6), and more recently in several animals other than nematodes7,8. This raises the possibility that the earliest multicellular animals were capable of trans-splicing, and perhaps had operons inherited from prokaryotes. However, in addition to the mechanistic differences (Fig. 1), the content of nematode operons seems distinct from that of prokaryotic ones. Not only is the average number of genes per operon smaller (2.6 in C. elegans, compared with 4.1 in E. coli), but when we matched genes across species using the Clusters of Orthologous Groups database9, we found only a few cases in which two or more genes share an operon in both C. elegans and prokaryotes, hardly more than would be expected by chance. Furthermore, no operons have yet been found in well-studied groups such as insects, fishes and mammals, nor in plants or fungi.
From all of this it seems that operons in C. elegans were a lineage-specific invention, perhaps facilitated by an existing capability for trans-splicing. A possible evolutionary scenario is suggested by the finding that genes with similar transcription profiles may tend to cluster in eukaryotic genomes10,11. It is conceivable that such clusters were among the first to be joined into operons. This might have been favoured by selection for a small genome (C. elegans is known for its compact genome). Later, in the continuing processes of genome shuffling, segmental duplications and gene loss, more transcriptionally or functionally related genes could have come close together by chance, forming additional operons.
Whatever the precise course of events, Blumenthal et al.3 have provided us with a key study. The operons they have identified will be the basis for much further investigation into genome evolution and gene regulation in animals.
Nature Genetics (2002)