Main

The read-out of genetic information in eukaryotes is not only more complex than previously realized, but also more dynamic. Much of the information is 'soft-wired', with extensive RNA processing necessary to generate phenotypes10. A subset of the RNA, referred to here as coRNA (Fig. 1), coordinates the read-out of genetic information and reconciles regulatory inputs. For each cell type, the sum of these interactions defines a distinct RNA profile or ribotype10. RNA-based processes also facilitate replication of ribotypes in subsequent generations. The flexibility of these programs allows soft-wired organisms to evolve in a more rapid and dynamic way than 'hard-wired' organisms constrained by DNA mutation10. Together, the actions of reading, 'riting, 'rithmetic and replication constitute the four Rs of RNA-directed evolution. Here, we describe roles for coRNAs and SATs (Fig. 2) in these events.

Figure 1: coRNAs (shown in red) have a central role in RNA and DNA read-out.

For example, miRNAs can act through the RNA interference pathway or by inhibiting translation of RNA. Similarly, coRNAs affect DNA read-out through transcriptional silencing pathways or by modulating transcription. They are also essential for the rewriting of RNA during editing and splicing, generating new nucleotide sequences not found in the gene. Together, these processes allow information from RNA-processing events to be reconciled with other regulatory inputs, including those from the environment, and to affect both the short-term (through epigenetic mechanisms) and long-term storage (through reverse transcription) of outcomes in DNA.

Figure 2: Use of SATs to regulate RNA expression and processing.

(a) Transcription from both transcriptional units initiates a signaling cascade that results in gene silencing (indicated by dotted lines for transcripts and promoters): the transcriptional units effectively cancel each other out. In this example, net transcription of one gene is achieved by using a promoter outside the affected region (both the transcript and the promoter are shown with solid lines). ssRNA, single-stranded RNA. (b) Switching one gene off and another gene on by using a bidirectional promoter. Silencing of the upstream gene results from the antisense transcript generated from the promoter. At the same time, transcription of the downstream gene begins. (c) A switch to a shorter RNA splice isoform by using an antisense transcript to interfere with production of the longer isoform. Transcripts are shown in red and the DNA strands are shown as parallel, solid horizontal black lines, with direction of transcription being 5′ to 3′.

RNA-directed RNA readout

RNA coregulates the sequence-specific read-out of genetic information from RNA by targeting the evolutionarily conserved machinery of RNA interference (RNAi)11 and translational repression. RNAi leading to post-transcriptional gene silencing (PTGS) is induced by double-stranded RNA (dsRNA) arising from infection by dsRNA viruses, sense-antisense transcription pairs, transcription of inverted repeats and delivery of dsRNAs manufactured in vitro12. Both endogenous and exogenous dsRNAs are processed cytoplasmically by RNase III dicer paralogs into small interfering RNAs (siRNAs) that are 21–25 bases long12,13. These siRNAs are incorporated into an RNA-induced silencing complex (RISC) that targets any RNA with a sequence complementary to the siRNA12,14. The target RNA is cleaved by RISC, which has endonucleolytic and presumed exonucleolytic activity12,15. Each cycle results in the destruction of the target RNA and the regeneration of an active RISC16,17.
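The sequence-specific logic of this pathway can be illustrated with a deliberately simplified sketch. It treats dicing as cutting one strand of a dsRNA trigger into fixed-length fragments and treats RISC targeting as a search for a perfectly complementary site. The function names (`reverse_complement`, `dice`, `is_targeted`), the fixed 21-base fragment length and the requirement for a perfect match are assumptions made for illustration; they are not a description of the actual enzymology.

```python
# Toy model only: string matching stands in for dicer and RISC activity.
# All function names and the perfect-match rule are illustrative assumptions.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def dice(strand: str, length: int = 21) -> list[str]:
    """Cut a strand into consecutive, non-overlapping fragments of `length` bases.

    A trailing remainder shorter than `length` is discarded.
    """
    return [
        strand[i:i + length]
        for i in range(0, len(strand) - length + 1, length)
    ]


def is_targeted(mrna: str, sirnas: list[str]) -> bool:
    """True if any siRNA is perfectly complementary to a site in the mRNA."""
    return any(reverse_complement(sirna) in mrna for sirna in sirnas)
```

In this sketch, an mRNA is flagged only if it contains the exact sense sequence from which an siRNA was derived, which mirrors the statement that RISC targets any RNA with a sequence complementary to the siRNA.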

In Neurospora crassa18,19, plants20, Schizosaccharomyces pombe21 and Caenorhabditis elegans22, but not in flies, humans or mouse oocytes18,19,23,24, PTGS can also be initiated by an RNA-dependent RNA polymerase (RdRP) that synthesizes complementary RNA from highly expressed transcripts to generate dsRNA. This process can spread from the 3′ to the 5′ end of the mRNA, leading to progressive silencing of a gene and other mRNAs with exons in common (transitive RNAi)18,22. From the evolutionary viewpoint, transitive RNAi favors the generation of new phenotypes, as related genes in species with polyploid genomes will escape cosuppression by sequence divergence, resulting in altered function or expression.

The number of dicer paralogs also differs between species. Arabidopsis, which has at least four dicer family members, seems to use different enzymes to defend against viruses and to process endogenously produced dsRNA13,25. In humans, mice and worms, there is only one dicer ortholog, suggesting that any pathway-specific functions are guided by ancillary regulatory proteins associated with RISC. These ancillary proteins include members of the argonaute family, some of which are involved in cell fate determination12,26. Argonaute proteins may therefore target RISC to specific pathways during development, suggesting an important role for RNA-dependent processes in specifying phenotypes, and thus in the evolution of these phenotypes.

RNA can also induce PTGS by repressing mRNA translation. This mechanism was first demonstrated in C. elegans, in which the short non–protein coding RNAs lin-4 and let-7 are used to time development programs by regulating the stage at which lin-14, lin-28 and lin-41 messages are translated27. These RNAs are microRNAs (miRNAs)27,28,29,30, which are encoded genomically as short inverted repeats that have a dsRNA stem of about 70 bp. They are found in introns as well as in intergenic clusters29,31. The precursors are processed by dicer and other RNase III family members to produce effectors of 21–25 nucleotides derived usually from one strand of the stem32. miRNAs are defined by conservation between species; there are an estimated 200–255 miRNAs in humans30.

Some miRNAs are capable of targeting RNAs to the RNAi pathway16. Other miRNAs are predicted to bind conserved elements in mRNA important for translational control, such as the K box, Brd box and GY box motifs33. A role for trans-acting RNAs in these pathways had not been previously recognized. A wide variety of miRNA targets have been identified, mostly in plants, where perfect sequence matches between miRNAs and their targets are more common than in other species27. The tissue-specific expression of miRNAs in other species suggests that they have roles in development analogous to those found in C. elegans34,35.

RNA-directed DNA read-out

The read-out of genetic information from DNA can be silenced by processes that generate dsRNA, including SATs and transcription of inverted repeats. The activity of transcriptional complexes can also be modulated by coRNAs either serving as scaffolds or acting as allosteric effectors.

Transcriptional gene silencing (TGS) can be induced in trans in plants by dsRNA homologous to a promoter and is associated with DNA methylation in the region of sequence overlap36. The involvement of RISC in this pathway is suggested by the requirement for argonaute family members. For example, in Arabidopsis, mutant argonaute 4 protein diminishes methylation of the gene SUPERMAN, which defines boundaries between carpels and stamens13,37.

More commonly, RNA-directed DNA silencing uses elements encoded in cis. Direct evidence has been obtained in S. pombe for the involvement of the RNAi pathway in establishing the silenced state38,39. Silencing is associated with bidirectional transcription of a repeat element in heterochromatin that generates a dsRNA substrate and is dependent on the S. pombe orthologs of dicer, argonaute and RdRP38,39. The pathway initiates a signaling cascade that produces methylation of Lys9 on histone H3. Spread of silencing to adjacent chromosomal regions depends on the heterochromatin protein 1 ortholog swi6 (refs. 21,38), a member of the chromodomain protein family that recognizes both histone H3 Lys9 and RNA40. A related mechanism seems to silence polycomb response elements (PREs) in Drosophila melanogaster; studies using chromatin crosslinking and immunoprecipitation showed that promoters in these elements are engaged with the transcriptional machinery. These promoters have a low level of transcriptional activity, give rise to short dsRNA products41,42 and produce methylation of histone H3 at Lys27 (ref. 43). Involvement of RISC in the silencing of PREs is supported by its dependence on piwi, an argonaute family member42.

Similar mechanisms may control transcription from retroelements44. For example, the methylation status of the agouti Avy allele, which contains an antisense retroviral IAP insertion45,46, determines its effect on phenotype44. Methylation occurs stochastically, producing phenotypic variation at the level of the individual, the tissue or the cell. For example, methylation of only a subset of cells produces mottled mice with both yellow and agouti coloration. Variable methylation of other retroelements may have a similar impact on many phenotypes44. Such alleles, whose phenotypic expression depends on epigenetic modification, have been referred to as epialleles47.

TGS can potentially be initiated by dsRNA formed from pairs of transcriptional units arranged in a tail-to-tail orientation (SATs). In humans, SATs account for most overlapping transcriptional units (70%; refs. 5,48). They are far more frequent than previously anticipated5,48. A recent computational survey estimated that there are 1,600 human SATs (3,200 transcriptional units)5. Similar results were obtained from an analysis of the mouse FANTOM2 clone set. A total of 2,481 sense-antisense pairs (4,962 transcriptional units) in which exons from both transcriptional units overlapped were identified. In 351 of these pairs, neither transcriptional unit seems to encode a protein product, whereas in 519 pairs, both are coding6. Another 899 pairs (1,798 transcriptional units) in which the overlap was between an exon and an intron were identified. The number of SATs is probably underestimated in these analyses due to our limited knowledge of the degree to which the 5′ and 3′ untranslated regions of adjacent genes overlap. Furthermore, the role of RNA polymerase III transcripts (106 in the human genome49) in SAT formation was not evaluated in these reports. Given the current estimate of 24,500 human protein-coding genes8, 10–20% of these seem to belong to SATs.
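The counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text. Two points are assumptions rather than statements from the cited surveys: that every transcriptional unit in a human SAT is counted as one protein-coding gene (which overstates the fraction, as some units are non-coding), and that the mouse exon-overlap pairs not described as coding-coding or noncoding-noncoding are mixed pairs.

```python
# Arithmetic check of the SAT counts quoted in the text.
# Variable names are illustrative; the input numbers come from the paragraph above.

UNITS_PER_PAIR = 2  # each sense-antisense pair comprises two transcriptional units

# Human estimate
human_sat_pairs = 1_600
human_protein_coding_genes = 24_500
human_sat_units = human_sat_pairs * UNITS_PER_PAIR

# Assumption: every SAT unit is a protein-coding gene (an upper bound).
human_fraction_in_sats = human_sat_units / human_protein_coding_genes

# Mouse FANTOM2 estimate
mouse_exon_exon_pairs = 2_481
mouse_exon_intron_pairs = 899
mouse_both_noncoding = 351
mouse_both_coding = 519

mouse_exon_exon_units = mouse_exon_exon_pairs * UNITS_PER_PAIR
mouse_exon_intron_units = mouse_exon_intron_pairs * UNITS_PER_PAIR

# Assumption: the remaining exon-exon pairs combine one coding and one
# non-coding unit; the text does not state this explicitly.
mouse_mixed_pairs = mouse_exon_exon_pairs - mouse_both_noncoding - mouse_both_coding

print(f"Human SAT units: {human_sat_units}")
print(f"Fraction of human protein-coding genes: {human_fraction_in_sats:.1%}")
print(f"Mouse exon-exon units: {mouse_exon_exon_units}")
print(f"Mouse exon-intron units: {mouse_exon_intron_units}")
print(f"Mouse mixed pairs (inferred): {mouse_mixed_pairs}")
```

Under these assumptions the human figure comes to roughly 13%, which falls within the 10–20% range stated in the text.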

SATs offer many potential advantages for gene regulation (Fig. 2). When both transcriptional units are active, formation of dsRNA occurs by default, leading to histone modification and TGS: SATs effectively cancel each other, relaxing the need for stringent regulation of promoter engagement by transcription factors. An example is the imprinting of the gene Igf2r in mice50. Expression of Igf2r depends on whether the locus is inherited from the mother or the father. Expression from the paternal chromosome is prevented by an antisense transcript produced by the non–protein coding gene Air. Net expression of Igf2r occurs only from the maternal chromosome, as methylation of the maternal Air gene promoter prevents production of an antisense transcript50. This mechanism might be quite general. A search for imprinted genes identified 159 differentially expressed loci with SATs51, and an examination of 58 known mouse imprinted genes found an antisense transcript for 22 of them6. Besides acting on themselves, SATs may cause differential expression of other loci by silencing genes in adjacent chromosomal regions.
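The default 'cancelling' behavior of a SAT, and its release when one promoter is methylated, can be written as a small Boolean sketch. This is a simplification for illustration only: the function names are hypothetical, transcription is treated as simply on or off, and the model ignores the signaling steps between dsRNA formation and silencing.

```python
# Boolean toy model of a sense-antisense transcriptional pair (SAT).
# Function names and the on/off treatment of transcription are assumptions.


def net_transcription(sense_on: bool, antisense_on: bool) -> tuple[bool, bool]:
    """Return net (sense, antisense) output for a SAT.

    If both units are transcribed, dsRNA forms and both are silenced.
    Otherwise each unit's output simply reflects its own promoter state.
    """
    if sense_on and antisense_on:
        return (False, False)
    return (sense_on, antisense_on)


def igf2r_expressed(air_promoter_methylated: bool) -> bool:
    """Net Igf2r expression from one parental chromosome in this toy model.

    Methylation of the Air promoter blocks the antisense transcript.
    The Igf2r promoter itself is assumed to be active on both chromosomes.
    """
    air_on = not air_promoter_methylated
    igf2r_net, _ = net_transcription(True, air_on)
    return igf2r_net


maternal = igf2r_expressed(air_promoter_methylated=True)
paternal = igf2r_expressed(air_promoter_methylated=False)
print(f"Maternal Igf2r expressed: {maternal}; paternal Igf2r expressed: {paternal}")
```

The sketch reproduces the outcome described for Igf2r: net expression from the maternal chromosome, where the Air promoter is methylated, and none from the paternal chromosome.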

SATs also have the potential to regulate silencing of adjacent chromosomal regions by switching one transcriptional unit off, allowing net transcription of the other. This process could be stochastic. Such an event may operate in the random silencing of paternal and maternal X chromosomes in mouse (but not human52) somatic tissue53. In mice, the expression of the X-inactivating Xist transcript and its antisense transcript Tsix are mutually exclusive53.

Switching to a promoter outside the region of silencing provides a different mechanism for producing net transcription from only one transcriptional unit in a SAT (Fig. 2a). Such a process may regulate expression of bithorax complex genes in flies54,55. Here, transcription through a silenced PRE from an intergenic promoter outside the region of silencing relieves suppression of the PRE and results in an active chromatin conformation. Other genes silenced by spreading of heterochromatin from the PRE also become transcriptionally active54,55. This outcome represents an epigenetic switch as the active chromosomal state is transmitted to daughter cells54,55. This mechanism also allows the coordinated switching of one gene on and another off (Fig. 2b). For example, a bidirectional promoter in the cardiac myosin locus that activates read-out of MYHC6 also generates an antisense transcript that switches off the upstream gene MYHC7 (ref. 56). Expression of MYHC6 is thus accompanied by silencing of MYHC7. Similarly, polyoma virus uses late transcripts to downregulate early ones57.

coRNAs act in other ways to regulate read-out of information from DNA. Many pathways involve trans-acting coRNAs that affect transcriptional initiation and elongation. Examples include the steroid receptor RNA activator58, dsRNA motifs bound by the transcriptional regulator NF110 (ref. 59), the U1 small nuclear RNA that associates with TFIIH60 and the 7SK RNA that inhibits Cdk9 (the RNA polymerase II C-terminal domain kinase)61,62. In these cases, coRNA seems to act as an allosteric effector. In other pathways, coRNAs may serve as scaffolds for the assembly of activation complexes. For example, the roX1 and roX2 transcripts in flies nucleate protein complexes that double the rate of X-chromosome transcription in XY males to match the quantity of X-chromosome transcripts present in XX females, achieving dosage compensation63. In addition to binding proteins, RNA scaffolds could bind other coRNAs in a sequence-specific manner. Furthermore, both scaffolds and allosteric coRNAs provide a means to switch SATs in trans64.

Candidate coRNAs have been identified by searching for non–protein coding RNAs other than transfer RNA and ribosomal RNA in the RIKEN mouse cDNA collection7. These transcripts constitute 7% of the library. Only 7.5% are antisense to another transcript. Some have orthologs in other species65. Such findings indicate that coRNAs may be quite numerous in soft-wired organisms.

RNA-directed rewriting of RNA

Rewriting of information in nascent RNA transcripts is a well characterized process: some nucleotides are deleted during splicing and others are changed by editing. Around 41–60% of mouse multiexon genes generate alternatively spliced transcripts4; the frequency of edited transcripts is unknown. These processes generate new sequences not found in the gene. Trypanosomes show the importance of RNA rewriting. Their survival depends on editing defective mitochondrial transcripts using trans-encoded RNA sequences to guide insertion and deletion of uridines66. The rewriting of RNA restores the correct reading frame, allowing the production of functional gene products. RNA guides are also used to direct rewriting of RNA during editing and splicing of pre-mRNA3,67. In some cases, editing targets splice sites and in others, splicing prevents editing67. Whether trans-acting coRNAs generate phenotypic diversity by regulating alternative splicing is currently unknown. But the potential to treat mendelian diseases by altering splicing patterns with transfected antisense molecules has been shown68.

Rewriting of RNA is also impacted by SATs. For example, an overlapping antisense transcript inhibits formation of the longer splice isoform from the thyroid receptor α gene (THRA; Fig. 2c)69. dsRNAs produced by SATs are also substrates for the ADAR family of editing enzymes70. These enzymes rewrite adenosine with inosine, destabilizing dsRNA67. This process downregulates RNAi induced by transgenes in C. elegans, probably by disrupting the formation of siRNA70. Similarly, dsRNA-activated TGS mechanisms may be inhibited by ADARs, resulting in the net expression of both transcriptional units in a SAT. This process may have other sequelae: RNAs with a large number of inosines are stable and selectively retained in the nucleus57. These modified transcripts, perhaps produced by a particular class of SATs, are proposed to participate in the organization of the nuclear matrix71.

Rewriting of RNA is associated with a high turnover of transcripts. Of all the RNA transcribed in human nuclei, only about 5% enters the cytoplasm72. Quality control mechanisms, such as nonsense-mediated decay, dispose of incompletely or improperly processed messages encoding flawed proteins73. They ensure that rewriting of information in RNA in soft-wired organisms yields functional ribotypes.

RNA-directed rewriting of DNA

Genomes can be rewritten, using reverse transcription to record elements of successful ribotypes and create an evolutionary scratchpad for new ones. Around 45% of the human genome is derived from retrotransposition1. These processes generate genetic variability (Table 1)1,2. Silencing of genes may result when insertion produces new SATs. Alternative splicing may develop when reverse transcription of an antisense RNA copies into a gene those sense-strand sequences that encode introns74. Species-specific promoters may evolve through spread of RNA scaffolds by reverse transcription.

Table 1 Genomic mechanisms for the generation of new coRNAs

RNA-directed rewriting of DNA also has an essential role in maintaining genome stability. Telomerase is a reverse transcriptase that uses an RNA guide to rewrite chromosomal ends and prevent their loss through fusion75. Other roles for RNA-directed rewriting of DNA have been discovered in Tetrahymena. This organism has two nuclei: a germline micronucleus that is transmitted to progeny and a somatic macronucleus that is actively transcribed76,77,78. The macronucleus is created from the micronucleus. During this process, the deletion of DNA sequences is guided by homologous RNA transcripts from the micronucleus76,77,78. New sequences appearing in the macronucleus during one reproductive cycle are targeted for deletion in the next generation, ensuring, for example, that the genome is not destabilized by the spread of DNA transposons78. Similar types of processes may impact genome stability in other organisms. For example, centromeric repeats are deleted from S. pombe when the RNAi machinery is inactivated by mutation, suggesting that unrepressed bidirectional transcription, due to a lack of silencing, promotes loss of these elements38. One consequence of repeat deletion is failure to assemble and correctly segregate chromosomes during meiosis and mitosis79. Microdeletion in areas of SAT overlap, such as those observed in 70% of human Angelman and Prader-Willi syndromes80, may have a similar basis, as may chromosomal loss during oncogenesis. Deletion of retroelements from inverted repeats, such as those found in the human Y chromosome81, may be favored by the enhanced transcription and chromatin remodeling occurring during spermatogenesis82. Recombination between actively transcribed retroelements may also produce the small segmental DNA duplications that account for 5% of the human genome83. RNA-directed processes thus have important roles in rewriting and maintaining information stored in DNA. Besides influencing DNA read-out, SATs also shape the genomic architecture of soft-wired organisms.

Reconciliation of information

In each ribotype, there is a rule set to ensure that specific transcripts are produced and particular mRNAs translated (Fig. 1). These outcomes are achieved by using coRNAs to coordinate the action of highly conserved pathways84,85. An RNA product from one processing event may regulate a downstream event, making the second outcome contingent on the first10. For example, a miRNA encoded in an intron would only be expressed when the host gene was transcribed31,84,85. Alternatively, coRNAs may facilitate coordination of pathways by interacting with sequence motifs shared by a number of targets64.

Evolution of rule sets requires creation of new coRNAs (Table 1). New coRNAs could arise by duplication and mutation of older versions to generate new ones with different expression patterns or altered sequence specificity. Alternatively, coRNAs with different permutations of regulatory elements could be created de novo by template-switching during reverse transcription. This mechanism is similar to that proposed for retroviral recombination86. New coRNAs could also evolve from intron sequences or from genes that have lost protein-coding function. Many of these coRNAs, unlike miRNAs, will be species-specific.

In this scheme, coRNAs and mRNAs interact to modify the linear flow of information from DNA to protein84,85. They may evolve in parallel85 or in concert84. New phenotypes arise when newly generated coRNAs extract different subsets of information from the genome. This process exploits the inherent flexibility of the underlying RNA-based programs. It is not dependent on mutation of protein-coding genes in the way that hard-wired plans are. Rather, it depends on a coRNA-dependent change in the way RNAs are produced and processed10. In this view, the phenotypic diversity underlying the evolution of soft-wired organisms is generated through the creation and selection of new coRNAs.

These processes favor the rapid evolution of soft-wired organisms. Unlike the creation of new protein domains by exon shuffling, this mode of evolution is not constrained by the need to maintain reading frame74. Nor is it constrained by the selective pressures that have conserved promoter and enhancer sequences over the millions of years that separate fish and humans87. Instead, new coRNAs would lead to the assembly of new regulatory complexes on conserved DNA elements, new patterns of TGS and PTGS, altered processing or translation of transcripts and changes to patterns of tissue-specific gene expression during development. Exploration of new evolutionary spaces can occur without completely abandoning adaptations that were advantageous in the past: potentially, only the subset of information extracted from DNA need change during this process. The genome thus serves as a 'junkyard' of possibilities through which RNA-directed processes sift to create a selective advantage for their host10.

Replication of ribotypes

Transmission of successful ribotypes to subsequent generations is essential to any evolutionary program (Fig. 3). In soft-wired organisms, elements written from RNA to DNA during gametogenesis by reverse transcription are fixed in the nucleotide sequence of the genome, ensuring their communication to offspring. But epigenetic modifications, such as DNA and histone methylation, can also be relayed to descendants. As discussed above, many of these modifications probably arise through RNA-dependent processes and thus represent a means by which ribotypic images can be written to chromatin and carried from one generation to the next. For example, imprinting is determined by the parent of origin of a chromosome, indicating that at some point maternal and paternal chromosomes are marked so that they can be distinguished during embryogenesis. In other cases, a particular modification can be transmitted from either the father or the mother. The apparent nonmendelian transmission of the mouse axin fused epiallele (AxinFu) is one example of this process88. The AxinFu allele contains an intronic IAP retroelement insertion that produces a truncated transcript, leading to a kinked tail. Silencing of the IAP promoter by methylation restores a wild-type phenotype. The silent methylated state is transmitted to offspring in a strain-specific manner. For the 129P4Rr/Rk strain there is a 70% chance of transmitting the modification when present in the mother and a 40% chance when coming from the father88. There is no paternal transmission in C57BL/6J crosses88. Similarly, methylation of other retroviral regulatory elements may undergo variable erasure during primordial germ cell development89, producing 'compound epigenetic mosaic' individuals44. The persistence of such epigenetic marks is of relevance to the origin of complex diseases, which are characterized by the absence of mendelian inheritance90. Here, the susceptibility of offspring to disease can depend on whether there is maternal or paternal history of disease as well as ethnicity91.

Figure 3: Replication of RNA-based programs in the next generation.

In soft-wired organisms, imprinting of parental chromosomes and the stochastic methylation of epialleles probably depend on RNA-based mechanisms. The development of the embryo also relies on maternal RNAs transferred to the ovum. These RNAs initiate the establishment of appropriate ribotypes during embryogenesis. Maternal RNAs present in plasma could also influence these events.

Transmission of ribotypes also occurs more directly. The embryo receives RNA from the mother that is important in specifying cell fate. Disruption of this process in flies by mutation of the maternal-effect genes enhancer of zeste and extra sex combs leads to the homeotic substitution of one segmental body plan for another by altering PRE methylation patterns92,93. By establishing chromosomal states in the embryo, maternal RNAs initiate read-out of appropriate ribotypes. Variations in this process may alter the temporal and spatial expression of developmental programs, leading to phenotypic variation between individuals and eventually to the heterochronic evolution of new body plans94.

The fetus is also exposed to the maternal environment, which can influence the fetal phenotype. For example, pregnant female mice fed a diet rich in methyl donors rather than a standard laboratory diet have litters with fewer yellow-colored agouti Avy offspring, reflecting enhanced silencing of the retroviral promoter in this allele46,95. In other cases, integration of signals received from maternal hormones may trigger epigenetic modifications that alter long-term phenotypic development by modulating RNA coregulatory networks96. Low birth weight, for example, has been shown to correlate with lifetime risk of cardiovascular disease and diabetes mellitus, presumably by resetting developmental programs97. Furthermore, the recent demonstration that plasma RNA is quite stable98 raises the question of whether coRNAs secreted by various somatic tissues are used to transmit information from mother to fetus. Epigenetic changes programmed before the germ line is segregated from the soma during gastrulation99 not only will affect fetal phenotype but also may undergo transmission to the next generation. Collectively, the above mechanisms allow the communication of successful maternal ribotypes to progeny.

Origins

The four Rs have always been part of the evolutionary process, their genetic coding changing with time and circumstance. Mechanisms in which information was copied from one nonrandom surface to another, allowing for their replication, emerged early. Depending on the fidelity of this process and the nature of the interaction, surfaces were rewritten and read-out in different ways, generating diversity and new functionality. Among the different surface combinations possible, those that catalyzed each other's formation were favored. The surfaces that best reconciled the prevailing physical and chemical constraints persisted, while those capable of adaptation prevailed. The surfaces may at times have been composed of clay, lipids, carbohydrates, polypeptides or polynucleotides of various descriptions and in different combinations, depending on such processes as noncovalent assembly, oligomerization and polymerization, but the emergence of life as we know it was marked by the arrival of the 'RNA world' (ref. 100). RNA was informational, acted as its own template and was capable of catalysis; it could promote its own perpetuation, either directly or by co-opting other partners along the way. The preferential replication of selfish RNA has defined the evolutionary space for soft-wired organisms ever since2,9. In humans, we find that much of the DNA genome has arisen from retrotransposition and that RNA-directed processes impact the read-out of RNA from genes and also rewrite their RNA transcripts. We find that non–protein coding RNAs are central to the translation of coding RNAs and to the coregulation of other cellular events. Through combinatorial processes that lead to new coRNAs, these events can vary over time. New RNA spaces can be explored in search of those that create a selective advantage for their host. In this view, the genome of soft-wired organisms is co-opted as a canvas for the creation, storage and dissemination of successful, but ultimately selfish, RNAs.