Here I review the properties of the mouse retroelement VL30-1, which apparently derived from retrotranspositions of a founder VL30 retrovirus that infected the mouse germline after the mouse–human speciation. The VL30-1 gene is transcribed as a long noncoding RNA (lncRNA) with an essential host function in an epigenetic transcription switch (ETS) that regulates transcription of multiple genes, including proto-oncogenes that control cell proliferation and oncogenesis. The ETS involves the tumor suppressor protein PSF, which has a DNA-binding domain (DBD) and two RNA-binding domains (RBDs). The DBD binds to promoters that have a DBD-binding sequence and switches off transcription, and the RBDs bind lncRNAs that have an RBD-binding sequence, releasing PSF from the promoter and switching on transcription. VL30-1 lncRNA has two RBD-binding sequences, apparently acquired by mutations during retrotranspositions of the founder retrovirus, which drive proto-oncogene transcription and oncogenesis via the ETS. VL30-1 lncRNA is a seminal example of the key role of endogenous retroviruses (ERVs) and their retroelements in the evolution of transcription regulatory systems.
The operon model of gene regulation, a founding concept of molecular biology proposed by Jacob and Monod in 1961 based on their studies with Escherichia coli,1 focused attention on protein-coding genes as the fundamental functional component of all genomes, as affirmed in Monod’s statement that ‘anything found to be true for E. coli must also be true for the elephant.’ Although it was known that mammalian genomes also contained DNA that did not encode any proteins, such DNA was usually called useless or selfish.2,3 The revelation from whole-genome sequencing that protein-coding genes comprise only a minuscule part of a mammalian genome, ~2% of the human and mouse genomes,4,
Here I discuss the remarkable properties of a mouse ERV called VL30-1,10 a member of the VL30 ERV family11 that probably originated from an infection of the mouse germline by a founder retrovirus after mouse–human speciation, as there are no VL30-related sequences in the human genome. The mouse genome is currently estimated to contain 150–200 VL30-related sequences, ranging from a full-length 5–6-kbp VL30 gene that has the features of an ERV, notably 5′ and 3′ long terminal repeats (LTRs), to a single ‘solo’ LTR.11,12 In the full-length VL30 genes sequenced so far, including VL30-1, the internal DNA flanked by the LTRs contains multiple mutations, including stop codons in all three reading frames, which block translation of the retroviral proteins required for further DRT cycles. Although the DRT cycles are suppressed, at least some of the full-length VL30 genes, including VL30-1, are transcribed as lncRNAs that carry a poly-A tail and are exported to the cytoplasm.
The VL30-1 lncRNA was discovered in an experiment involving transfection of a human tumor cell by a retroviral vector produced in a mouse cell containing VL30-1 lncRNA, resulting in encapsulation of VL30-1 lncRNA in the retroviral particles and integration in the host genome as an ERV, which increased the metastatic potential of the host.10 Further studies showed that the increase in metastatic potential was caused by a novel mechanism of gene regulation involving the protein PSF13 and a PSF-binding RNA.14,
The PSF gene probably existed in the mouse genome before the retrovirus infection that generated the VL30-1 gene. Although PSF protein is expressed during early development when cells proliferate, it does not function as a repressor until cells begin to differentiate and proliferation stops (unpublished data). Consequently, another PSF-binding lncRNA, probably MALAT-1,20,21 was needed before VL30-1 lncRNA was available, to prevent binding of PSF to proto-oncogenes during early development. As VL30-1 lncRNA binds more effectively to PSF than MALAT-1 lncRNA does (unpublished data), it could have co-opted the role of MALAT-1 lncRNA as the major PSF-binding RNA in the mouse ETS, which would explain the surprising finding that MALAT-1 lncRNA is dispensable for mouse development and survival.22,23 I propose that the beneficial function of VL30-1 lncRNA, which was needed for its evolutionary survival in the mouse genome, was achieved in this way, providing a seminal example of the importance of retroviruses and their retroelement descendants in shaping the evolution of epigenetic systems for regulating gene transcription.