Protein fossils live on as RNA

Sasidharan, Rajkumar; Gerstein, Mark

doi:10.1038/453729a

News & Views
Published: 04 June 2008

Genomics

Nature volume 453, pages 729–731 (2008)

A Correction to this article was published on 18 June 2008

This article has been updated

Pseudogenes constitute many of the non-coding DNA sequences that make up large parts of genomes. Once considered merely protein fossils, it now emerges that some of them have active regulatory roles.

A central challenge in genome annotation is determining the function of sequences that do not encode proteins, but make up the overwhelming bulk of large genomes (some 99% in humans). A significant fraction of these sequences are pseudogenes, or fossils of ancient proteins, and although many of them are transcribed into RNA, they have hitherto been deemed 'junk'. However, given the abundance of pseudogenes, it is unlikely that they are useless. One function suggested for them is gene regulation, and RNA interference (RNAi) has been proposed as the mechanism for carrying this out. Six papers¹⁻⁶, including three in this issue (pages 793, 798 and 803), significantly expand the known scope of RNAi by describing the discovery of natural small interfering RNA (siRNA) sequences in mice and fruitflies, some of which are potentially transcribed from pseudogenes.

The textbook definition of a pseudogene is an inheritable genetic element that is similar to a functioning gene, yet is non-functional. But what is meant by non-functional is debatable — not transcribed, not translated, or not under control of a promoter sequence? Pseudogenes are similar to protein-coding genes because they are usually copied from a parent gene, either through unsuccessful duplication or by retrotransposition (whereby a gene is transcribed into RNA, which is then 'reverse-transcribed' back into DNA and inserted somewhere different in the genome). Because all this copying does not yield a normal, functioning protein, pseudogenes are usually identified by obvious 'disablements' in their sequence, such as frameshifts or premature stops. They have been of interest because they provide records of ancient molecules encoded by the genome.
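The idea of spotting 'disablements' can be illustrated with a minimal Python sketch. It assumes the candidate sequence is already in the reading frame of its parent gene; the function name `find_disablements` and the toy sequences are hypothetical, and real annotation pipelines rely on alignments to the parent gene rather than on this simple scan.

```python
# Hypothetical helper, not taken from any of the cited studies.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_disablements(seq):
    """Flag simple disablements in a DNA string read in frame 0.

    Assumption: `seq` starts at the position corresponding to the start
    of the parent gene's coding sequence.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon counts as premature.
    premature = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return {
        "premature_stops": premature,                    # codon indices
        "length_not_multiple_of_3": len(seq) % 3 != 0,   # crude frameshift signal
    }

intact = "ATGGCCAAATGA"       # ATG GCC AAA TGA
disabled = "ATGTAGGCCAAATG"   # ATG TAG GCC AAA, plus two leftover bases

intact_report = find_disablements(intact)
disabled_report = find_disablements(disabled)
```

Here `disabled_report` records a premature stop at codon index 1 and a length that is not a multiple of three, whereas `intact_report` records neither.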

Although pseudogenes have generally been considered as evolutionary 'dead-ends', one of the surprises of genome sequencing has been how abundant they are: tens of thousands of pseudogenes are found in mammalian genomes (roughly the same number as protein-coding genes in all mammals sequenced so far)⁷. In addition, a large proportion of these sequences seem to be under some form of purifying selection⁸ — whereby natural selection eliminates deleterious mutations from the population — and genetic elements under selection have some use. Finally, several large-scale genomic studies probing non-gene parts of the genome for biochemical activity have found many pseudogenes being transcribed and regulatory factors binding upstream of them. One such investigation, the ENCODE pilot project⁹, which looked at a representative 1% of the sequence of the human genome, found strong evidence for at least one-fifth of pseudogenes being actively transcribed.

These observations indicate that pseudogenes might not be purely dead relics of past genes but could be resurrected for new biochemical activities. Indeed, functioning pseudogenes have been reported previously. For instance, in snails, a pseudogene is involved in translational control of the gene that codes for nitric oxide synthase¹⁰. And transcripts of the mouse pseudogene makorin1-p1 have been proposed to inhibit degradation of their parent gene's mRNA, effectively enhancing its expression¹¹, although this observation has been debated. Nevertheless, a clear mechanism for the functioning of pseudogenes has been lacking. The six studies, four in flies¹⁻⁴ and two in mice⁵,⁶, provide such a direct pathway, showing that pseudogene transcripts can act as natural siRNAs.

Broadly speaking, RNAi involves various types of small 'guide' RNA sequence regulating protein levels by targeting mRNA for degradation. Pseudogenic siRNAs provide two of the four categories posited by the six studies to organize the natural, or 'endo', siRNAs (Box 1).

Endo-siRNAs in the first category mediate transposon silencing, which is typically a feature of Piwi-interacting RNAs (piRNAs). The studies were therefore careful to distinguish between endo-siRNAs associated with transposons and piRNAs on the basis of size (21–22 nucleotides versus 24–30) and Argonaute effector-protein partner (Ago2 versus Piwi). The second category of endo-siRNAs arises from bidirectional transcription of partially overlapping loci on opposite DNA strands¹,¹². Studies in mice⁵,⁶ identify a few examples of these, and around 1,000 have been reported in flies¹, with their target genes consisting mainly of those with nucleic-acid functions, such as nuclease activity and transcription-factor binding¹².
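The two criteria used to separate transposon-associated endo-siRNAs from piRNAs (length and Argonaute partner) can be written as a small decision rule. The sketch below is only an illustration of that logic; the function name `classify_small_rna` and the "unclassified" fallback are assumptions, and the studies themselves draw on additional experimental evidence.

```python
def classify_small_rna(length_nt, argonaute):
    """Label a small RNA using the two criteria named in the text.

    length_nt: length of the small RNA in nucleotides.
    argonaute: name of the bound Argonaute-family protein, e.g. "Ago2".
    """
    partner = argonaute.strip().lower()
    if 21 <= length_nt <= 22 and partner == "ago2":
        return "endo-siRNA"
    if 24 <= length_nt <= 30 and partner == "piwi":
        return "piRNA"
    # Anything that does not satisfy both criteria is left unlabelled.
    return "unclassified"

labels = [
    classify_small_rna(21, "Ago2"),
    classify_small_rna(27, "Piwi"),
    classify_small_rna(23, "Ago2"),
]
```

Requiring both criteria at once is a deliberately conservative choice: a 21-nucleotide RNA bound to Piwi, for example, is left unclassified rather than forced into either class.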

The third category of siRNAs, which have been identified only in mice⁵,⁶, are products of the interaction between a spliced mRNA transcript from a protein-coding parent gene and an antisense transcript from its pseudogene, which can be located far away from its parent gene, on the same or a different chromosome (Fig. 1a). Endo-siRNAs of the fourth category are closely related to those in the third. They arise from hairpin-shaped sequences, which in mice⁵,⁶ can come from inverted-repeat structures of pseudogenes (Fig. 1b). Here, the pseudogene also regulates its parent gene, but the double-stranded RNA precursor of the endo-siRNA comes from transcription of an inverted-repeat sequence, producing a hairpin. The reports show that mouse proteins affected by the third and fourth categories of endo-siRNAs are disproportionately involved in particular functions, such as regulating cytoskeletal dynamics, which indicates that their underlying pseudogene-mediated regulation has been explicitly selected for and is not simply caused by random pairing of transcribed genes and pseudogenes.

**Figure 1: Pseudogene-mediated production of endogenous small interfering RNAs (endo-siRNAs).**

Hairpin precursors of endo-siRNAs have also been found in flies, but the evidence links them only weakly with inverted repeats of pseudogenes. Thus, most of the new data for pseudogenic siRNAs come from mouse rather than fly studies. One possible reason for this is that the mouse genome contains many more pseudogenes than the fly genome¹³. In fact, even compared with other metazoan organisms such as worms, flies are particularly poor in pseudogenes, possibly owing to pronounced genomic deletion processes known to occur in this organism¹⁴.

The scarcity of pseudogenes in flies makes their detection particularly difficult. Nevertheless, there is suggestive evidence for fly pseudogenes functioning as endo-siRNAs. First, an appreciable number (∼30) have an inverted-repeat structure, associated with the formation of hairpins. Second, many of the sequences obtained by ultra-high-throughput sequencing of small RNAs in the fly coincide with DNA regions containing pseudogenes. In particular, a small but significant number of the 'reads' found using the Solexa sequencing technology¹,⁴ can be intersected with some 70 pseudogenes, for an average of roughly 12 reads each. Finally, there is strong evidence that for several genes, particularly the β-esterase gene and its pseudogene, a duplicated pseudogene forms a functional complex with its parent gene, with regulatory consequences¹⁵.
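The intersection of sequencing reads with pseudogene coordinates mentioned above amounts to an interval-overlap count. The sketch below shows the idea on invented toy coordinates; the names `psiA` and `psiB`, the half-open coordinate convention and the function `count_reads_per_pseudogene` are all assumptions for illustration, not the procedure used in the studies. Taken at face value, the figures in the text (some 70 pseudogenes at roughly 12 reads each) would correspond to something on the order of 70 × 12 ≈ 840 reads.

```python
def count_reads_per_pseudogene(reads, pseudogenes):
    """Count reads that overlap each pseudogene.

    reads: list of (chrom, start, end) tuples, half-open coordinates.
    pseudogenes: dict mapping name -> (chrom, start, end), half-open.
    """
    counts = {name: 0 for name in pseudogenes}
    for r_chrom, r_start, r_end in reads:
        for name, (p_chrom, p_start, p_end) in pseudogenes.items():
            # Two half-open intervals overlap if each starts before the other ends.
            if r_chrom == p_chrom and r_start < p_end and p_start < r_end:
                counts[name] += 1
    return counts

# Toy data, invented for this example.
pseudogenes = {"psiA": ("2L", 100, 200), "psiB": ("3R", 500, 650)}
reads = [("2L", 90, 111), ("2L", 150, 171), ("3R", 640, 661), ("X", 10, 31)]

counts = count_reads_per_pseudogene(reads, pseudogenes)
hit = {name: n for name, n in counts.items() if n > 0}
mean_reads = sum(hit.values()) / len(hit)   # average over pseudogenes with reads
```

The nested loop is quadratic and only suitable for small inputs; genome-scale data would normally call for an interval tree or a sorted sweep.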

Of course, to demonstrate the activity of pseudogenes conclusively, further experiments are needed. Deleting a pseudogene and demonstrating an effect on its potentially regulated parent gene would be most definitive. Also of great value would be studying the expression patterns of a potential endo-siRNA-producing pseudogene and its regulated parent gene across various tissues — data which should be generated by the ENCODE and modENCODE projects.

In addition to connecting RNAi with pseudogenes, the new studies¹⁻⁶ also blur the distinctions between the three 'traditional' classes of small RNA, namely siRNAs, piRNAs and microRNAs (miRNAs), which are distinct in their biogenesis and cellular roles (Box 1). The studies¹⁻⁶ find that endo-siRNAs regulate transposons as piRNAs do; that, like miRNAs, they can arise from hairpins; and that, in flies, their processing involves a similar co-factor to the processing of miRNAs (Box 1).

This blurring of boundaries among different types of small RNA, together with the newly established links between siRNAs and pseudogenes, has interesting evolutionary implications. In plants, inverted duplications containing a protein-coding gene have been proposed¹⁶ as a mechanism to create new miRNAs. Thus, one can imagine a gene being copied (either by duplication or retrotransposition) and this copy then being duplicated (again) in inverted fashion. Given the ubiquitous nature of genomic transcription, the copy and its inverted duplicate could potentially be transcribed to a hairpin precursor of endo-siRNAs to regulate the parent gene.

As the function of the hairpin no longer has anything to do with encoding protein, its sequence, still under selection, can acquire frameshifts and stop codons, making it seem pseudogenic. One could even imagine its sequence drifting further and becoming gradually transformed into a miRNA gene, the sequence of which is much less similar to the gene encoding its target mRNA. So pseudogenes encoding endo-siRNAs might provide a crucial intermediate link to understanding the evolution of miRNA-mediated regulation¹⁷. Although speculative, the plausibility of this theory is bolstered by a recent survey¹⁸ of the genomic context of more than 300 human miRNA loci, which identified two that lie within pseudogenes.
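The inverted-duplication scenario described above rests on a simple sequence property: a copy followed by its reverse complement yields a transcript whose two arms can base-pair into a hairpin. The Python sketch below checks that property on an invented sequence; the helper names and the toy locus are hypothetical, and the check ignores mismatches, G·U pairs and folding energetics.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def longest_hairpin_arm(seq, min_arm=4):
    """Length of the longest 5' arm that pairs perfectly with the 3' end.

    Returns 0 if no arm of at least `min_arm` bases pairs.
    """
    seq = seq.upper()
    best = 0
    for arm in range(min_arm, len(seq) // 2 + 1):
        if seq[:arm] == reverse_complement(seq[-arm:]):
            best = arm
    return best

copy = "ATGGCCAAG"   # invented stand-in for a duplicated gene copy
loop = "TTTT"        # invented spacer between the two copies

inverted_locus = copy + loop + reverse_complement(copy)
direct_locus = copy + loop + copy

inverted_arm = longest_hairpin_arm(inverted_locus)
direct_arm = longest_hairpin_arm(direct_locus)
```

Only the inverted arrangement gives a pairing arm (here the full nine bases of the copy); the direct duplication gives none, which is why the orientation of the second duplication matters in this scenario.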

Box 1: Small but significant

There are three main classes of small RNA, which generally differ in biogenesis, sorting and function¹⁹.

(1) MicroRNAs (miRNAs) mainly regulate genes involved in developmental processes. Specific miRNA genes encode mRNA-like primary transcripts that form hairpin structures, which are in turn excised by the enzyme Drosha (not shown) to form precursor miRNAs. In flies, further cleavage of these sequences by the Dicer enzyme Dcr1 and its specific co-factor Loqs yields mature miRNAs of ∼22 nucleotides (nt). To carry out their function, miRNAs are incorporated into the RISC protein complex, which contains the effector protein Ago1, a member of the Argonaute family.

(2) Conventional small-interfering RNAs (siRNAs) of ∼21 nucleotides are produced through cleavage of double-stranded RNA (dsRNA) — in flies, by the Dicer enzyme Dcr2 and its co-factor R2D2 (refs 17, 19). These small RNAs bind to the Argonaute-family effector Ago2 and function in defence against external nucleic acids, such as synthetic dsRNAs or intermediates of viral replication.

(3) Discrete genomic loci give rise to single-stranded RNA sequences (ssRNA), which are then processed to ∼27 nt Piwi-interacting RNAs (piRNAs). piRNA biosynthesis remains somewhat ambiguous, but is known not to require Dicer. piRNAs bind to Piwi, another member of the Argonaute family that seems to be expressed only in germline cells. It is believed that these small RNAs function as master controllers of mobile genetic sequences called transposable elements²⁰.

In the figure, grey lines indicate known relationships, whereas red lines indicate new ones reported in the six papers¹⁻⁶. Clearly, the boundaries between the three small-RNA classes have been somewhat blurred by these reports. For details of how endo-siRNAs arise from pseudogenes, see Figure 1.

R.S. & M.G.

Change history

06 June 2008
The phrase 'seven pseudogenes, for an average of roughly two reads each' was changed to '70 pseudogenes, for an average of roughly 12 reads each' on 6 June 2008.

References

1. Czech, B. et al. Nature 453, 798–802 (2008).
2. Ghildiyal, M. et al. Science 320, 1077–1081 (2008).
3. Kawamura, Y. et al. Nature 453, 793–797 (2008).
4. Okamura, K. et al. Nature 453, 803–806 (2008).
5. Tam, O. H. et al. Nature 453, 534–538 (2008).
6. Watanabe, T. et al. Nature 453, 539–543 (2008).
7. Zhang, Z., Carriero, N. & Gerstein, M. Trends Genet. 20, 62–67 (2004).
8. Zheng, D. et al. Genome Res. 17, 839–851 (2007).
9. The ENCODE Project Consortium Nature 447, 799–816 (2007).
10. Korneev, S. A., Park, J.-H. & O'Shea, M. J. Neurosci. 19, 7711–7720 (1999).
11. Hirotsune, S. et al. Nature 423, 91–96 (2003).
12. Okamura, K., Balla, S., Martin, R., Liu, N. & Lai, E. C. Nature Struct. Mol. Biol. doi:10.1038/nsmb.1438 (2008).
13. Harrison, P. M., Milburn, D., Zhang, Z., Bertone, P. & Gerstein, M. Nucleic Acids Res. 31, 1033–1037 (2003).
14. Petrov, D. A., Lozovskaya, E. R. & Hartl, D. L. Nature 384, 346–349 (1996).
15. Balakirev, E. S., Anisimova, M. & Ayala, F. J. J. Mol. Evol. 62, 496–510 (2006).
16. Allen, E. et al. Nature Genet. 36, 1282–1290 (2004).
17. Chapman, E. J. & Carrington, J. C. Nature Rev. Genet. 8, 884–896 (2007).
18. Devor, E. J. J. Hered. 97, 186–190 (2006).
19. Matranga, C. & Zamore, P. D. Curr. Biol. 17, R789–R793 (2007).
20. Brennecke, J. et al. Cell 128, 1089–1103 (2007).

Author information

Authors and Affiliations

Rajkumar Sasidharan and Mark Gerstein are in the Departments of Molecular Biophysics and Biochemistry, and Computer Science, Yale University, New Haven, Connecticut 06520, USA. mark.gerstein@yale.edu

About this article

Cite this article

Sasidharan, R., Gerstein, M. Protein fossils live on as RNA. Nature 453, 729–731 (2008). https://doi.org/10.1038/453729a

Issue Date: 05 June 2008
