Sir

Is all that junk really regulatory RNA?

Mattick argues that increases in eukaryote complexity are primarily due to the emergence of RNA-based regulation and signalling1, an argument consistent with a growing body of literature2,3. However, several of his points are misleading, and his thesis therefore clashes unnecessarily with other well-supported theories.

Figure 1 of Mattick's article1, reproduced from a preprint4, claims to show that the fraction of noncoding DNA (ncDNA) increases as a function of developmental complexity. The graph shows no such thing; organisms are simply ordered by the fraction of ncDNA they contain. Therefore, concluding that “the ratio of noncoding to protein-coding DNA rises as a function of developmental complexity”1 (my emphasis) is misleading: without some independent measure of developmental complexity, one cannot make this claim. The motivation for the graph is hinted at1 (and argued explicitly elsewhere4,5): that this ncDNA mostly codes for regulatory RNA. Mattick proposes that “the principal advance in complex organisms was the development of a digital programming system based on ncRNA signalling”1.

Figure 1: Gene number and fraction of noncoding DNA in prokaryotes and eukaryotes.

Organisms ordered as per FIG. 1 in Mattick1, with several newly sequenced genomes added. Bars represent gene number. Purple bars, Archaea; green bars, Bacteria; beige bars, Eukaryotes. Red circles represent ncDNA content, as a fraction of total haploid genomic DNA content, modified from Taft & Mattick4.

However, it is not straightforward to compare the ncDNA contents of prokaryotes and eukaryotes as their genomes are under different constraints6. That gene number and genome size are correlated in prokaryotes7 indicates selection, perhaps owing to architectural constraints on replication rate6. Exceptions to the correlation are obligate intracellular pathogens undergoing gene loss, such as Rickettsia prowazekii and Mycobacterium leprae. In Mattick's figure, R. prowazekii, which has 24% ncDNA8, appears as the most complex prokaryote, just 'below' eukaryotes1. If M. leprae were included (50.5% ncDNA9), it would be more 'complex' than Plasmodium and yeasts (see figure). As Mattick points out, prokaryote genes usually code for single proteins1,5. Therefore, gene number might correlate with complexity. Mapping ncDNA content against ORF number (see figure) reveals that a 'correlation' between developmental complexity and ncDNA requires ncDNA in prokaryotes to be a clearer indicator of complexity than gene number. Mattick would presumably agree that R. prowazekii, with ~830 ORFs, is not more complex than Pirellula, with 5% ncDNA and ~7,300 ORFs. His graph therefore compares apples with pears: there is no evidence that ncDNA content correlates with complexity in prokaryotes, so the analysis should be restricted to eukaryotes.
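The disagreement between the two candidate proxies can be made concrete with a small sketch. It uses only the two prokaryotes for which this letter quotes both an ncDNA fraction and an approximate ORF count; the dictionary layout and the function name `rank_by` are illustrative assumptions, not part of any published analysis.

```python
# Values as quoted in this letter; ORF counts are approximate.
genomes = {
    "Rickettsia prowazekii": {"ncdna_fraction": 0.24, "orfs": 830},
    "Pirellula": {"ncdna_fraction": 0.05, "orfs": 7300},
}


def rank_by(key):
    """Return organism names ordered from lowest to highest value of `key`."""
    return sorted(genomes, key=lambda name: genomes[name][key])


by_ncdna = rank_by("ncdna_fraction")  # 'complexity' order implied by ncDNA
by_orfs = rank_by("orfs")             # 'complexity' order implied by gene number

# If ncDNA fraction and ORF number both tracked complexity,
# the two orderings would be the same.
orderings_agree = by_ncdna == by_orfs
print("ordered by ncDNA fraction:", by_ncdna)
print("ordered by ORF number:   ", by_orfs)
print("orderings agree:", orderings_agree)
```

With these two genomes the orderings are reversed, so at most one of the two measures can be tracking complexity in prokaryotes.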

However, even this is not without difficulty. Another outcome of reductive evolution is genome compaction, which is a feature of several eukaryote genomes10. Furthermore, there is compelling evidence that many unicellular eukaryote genomes are losing introns11. Conversely, complex eukaryotes might be accumulating junk, perhaps as a consequence of reduced population size in larger species12. This is not at odds with the emergence of RNA-based regulation in higher eukaryotes; it simply points out that such regulation would have emerged subsequently, with natural selection acting upon the 'genomic junkyard'. So, different eukaryote genomes are likely to be 'behaving' differently, some becoming compact, some accumulating 'junk'.

Mattick also argues that intronic and intergenic sequences in, for instance, pufferfish “cannot be easily dismissed as junk, as [this fraction] largely comprises DNA sequences of high complexity...”1. This is misleading. Low-complexity sequences are usually filtered out in BLAST searches because it is difficult to infer homology from sequence similarity for such sequences13. Sequences of both high and low complexity can be functional, yet there is no indication that high complexity in itself implies function. By contrast, low-complexity sequences are far from random and their existence requires explanation14, whether they are functional or not.

In some respects, Mattick's paper1 reads like a late attempt to resolve the C-value paradox. He provides a good case for the importance of RNA-based regulation in the development of complex multicellular organisms, but there is no umbrella explanation for genome-size variation15. Numerous transposons, repeat elements and proviral sequences exist in eukaryote genomes16. Occasionally, such elements have acquired novel functions16, but several independent theories predict that a portion of the genomes of higher eukaryotes will be junk and superfluous for development.

One well-established theory relates to the fitness of transposons in sexually outcrossing populations17. In sexual outcrossers (which include most complex eukaryotes), autonomously replicating elements are predicted to spread despite their cost to the host genome (if host fitness reduction is <0.5). Such elements contribute to a significant fraction of the genomic junkyard and do not have to be selectively advantageous to the host (although they can become so; for example, recruitment of RAG transposases in the evolution of antibody domain shuffling in vertebrates16). Conversely, asexual reproduction limits transposon spread and (in the absence of other mechanisms of horizontal transfer) transposons are predicted to die out because transposon and host genome fitnesses are identical, so transposons are affected by reduction in host fitness17. Bdelloid rotifers, which have lost sex, have also lost all active retrotransposons in the process18.
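The 0.5 threshold can be illustrated with a toy recursion. This is a minimal sketch under stated assumptions, not necessarily the model of ref. 17: mating is random, any carrier suffers the same fitness cost `s`, and a heterozygous carrier passes the element to all of its gametes rather than half. The function names and the asexual comparison are hypothetical constructions for illustration.

```python
def sexual_step(p, s):
    """One generation in a random-mating outcrosser.

    p: frequency of element-bearing gametes.
    s: fitness cost paid by any zygote carrying the element.
    Assumes carriers transmit the element to all of their gametes.
    """
    non_carriers = (1.0 - p) ** 2          # zygotes with no copy
    carriers = 1.0 - non_carriers          # zygotes with at least one copy
    surviving_carriers = carriers * (1.0 - s)
    return surviving_carriers / (surviving_carriers + non_carriers)


def asexual_step(q, s):
    """One generation in an asexual host: the element shares its lineage's fate.

    q: frequency of element-bearing lineages.
    """
    surviving_carriers = q * (1.0 - s)
    return surviving_carriers / (surviving_carriers + (1.0 - q))


def run(step, start, s, generations=200):
    """Iterate `step` for a number of generations and return the final frequency."""
    freq = start
    for _ in range(generations):
        freq = step(freq, s)
    return freq


if __name__ == "__main__":
    for cost in (0.4, 0.6):
        print("sexual, s =", cost, "->", run(sexual_step, 0.01, cost))
    print("asexual, s = 0.1 ->", run(asexual_step, 0.01, 0.1))
```

While the element is rare, the sexual recursion reduces to roughly p' = 2(1 - s)p, so the element increases only when s < 0.5. In the asexual recursion there is no transmission advantage, and any positive cost makes the element decline.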

To conclude, it is misleading to argue that most ncDNA in complex eukaryotes is regulatory RNA (even if much of the genome is transcribed) and that there is no junk in these genomes, as Mattick implies. This puts his hypothesis at odds with well-supported theory and experiment with which it need not conflict. Even if only 10% of the human genome turns out to encode small regulatory RNAs, this would still be strong support for Mattick's theory, and there are already signs that it will be borne out2.