Systematic analysis and evolution of 5S ribosomal DNA in metazoans

Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M

doi:10.1038/hdy.2013.63

Original Article
Published: 10 July 2013

J Vierna, S Wehner, C Höner zu Siederdissen, A Martínez-Lage & M Marz

Heredity volume 111, pages 410–421 (2013)

Abstract

Several studies on 5S ribosomal DNA (5S rDNA) have focused on a subset of the following features, mostly in a single organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes, and showed organism-specific and evolutionarily conserved features. Putatively functional sequences (12 766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammal species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralog 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis of an exchange of 5S rDNA from one locus to another. A rather high degree of variation of upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detection of linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionarily conserved linkage among 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades.

Introduction

The evolution of 5S ribosomal DNA (5S rDNA) has been studied in some groups of organisms, mainly within genera or within families (for example, Martins and Wasko (2004); Rooney and Ward (2005); Vierna et al. (2009); Freire et al. (2010); Perina et al. (2011); Vizoso et al. (2011)). Nevertheless, several intriguing features, such as high conservation along evolution in contrast to high intragenomic divergence, a plastic genomic organization and linkage to other genes, make this multigene family an interesting subject in evolutionary genetics that deserves a large-scale analysis.

5S rDNA (as well as other ribosomal genes) is expected to display low intragenomic divergence levels owing to the occurrence of homogenizing mechanisms (unequal crossing-overs and gene conversions) that are favored by the tandem arrangement of these genes and lead to so-called concerted evolution (reviewed in Eickbush and Eickbush (2007)). However, many reports have been recently published in which the concerted evolution model did not explain the intragenomic divergence found in some organisms, mainly (but not exclusively) within the non-transcribed spacer (NTS) region (Rooney and Ward, 2005; Fujiwara et al., 2009; Vierna et al., 2009; Freire et al., 2010; Úbeda-Manzanaro et al., 2010; Perina et al., 2011; Vizoso et al., 2011). Other evolutionary models (birth-and-death evolution; mixed process of concerted and birth-and-death evolution (Nei and Rooney, 2005)) have been proposed to drive the evolution of 5S rDNA (Rooney and Ward, 2005; Fujiwara et al., 2009; Vierna et al., 2009; Freire et al., 2010; Úbeda-Manzanaro et al., 2010; Perina et al., 2011; Vizoso et al., 2011).

5S rDNA is present in a variable number of repeats (usually, hundreds of copies) in each genome. These repeats can occur in tandem forming long arrays in some species, whereas in other cases they are dispersed throughout the genome. In some organisms, 5S rDNA repeats have been found linked to other noncoding RNA (ncRNA) gene families, such as small nuclear RNAs (snRNAs) (Vahidi et al., 1988; Nilsen et al., 1989; Zeng et al., 1990; Keller et al., 1992; Pelliccia et al., 2001; Cross and Rebordinos, 2005; Manchado et al., 2006; Marz et al., 2008; Freire et al., 2010; Vierna et al., 2011; Vizoso et al., 2011) or to protein-coding genes such as histones (Eirin-Lopez et al., 2004).

Although linkages of 5S rDNA to other ncRNAs that persist over longer time scales have also been shown in bacteria (Gongadze, 2011), protists (Drouin and Tsang, 2012) and plants (Wicke et al., 2011; Layat et al., 2012), the linkages in animals do not seem to be stable over long evolutionary time scales. They appear to be the result of stochastic processes within genomes with no effect on fitness, even though this has not been demonstrated (see Drouin and Moniz de Sá (1995) for a review). Interestingly, 5S rDNA repeats can show different organization modes in the same species (Little and Braaten, 1989), and their transposition could be frequent within genomes, as reported by Drouin and Moniz de Sá (1995); Kalendar et al. (2008); Cohen et al. (2010).

Reports on the evolution of 5S rDNA in various animal and fungal groups have been published during the last few years, and all (Martins and Wasko, 2004; Vierna et al., 2009; Freire et al., 2010; Úbeda-Manzanaro et al., 2010; Perina et al., 2011; Vizoso et al., 2011) except one (Rooney and Ward, 2005) have relied on data obtained from PCR-cloning-sequencing techniques. Even though these procedures are appropriate when working with non-model organisms, they may not give a complete picture of the features and diversity of this multigene family. Fortunately, this can be solved by using genome project data, when available. Here, we have obtained a large set of animal 5S rDNA candidate sequences, which were carefully filtered according to stringent criteria. Additionally, we gathered a set of U1 small nuclear DNA sequences from the same metazoan genomes, which were used in the linkage analysis between 5S rRNA and other ncRNAs.

Materials and methods

Sequence data

Previously known 5S rRNA and U1 snRNA sequences were taken from Rfam (Gardner et al., 2011) and selected previous studies (Marz et al., 2008; Vierna et al., 2009). These sequences (available from the electronic supplement http://www.rna.uni-jena.de/supplements/5SRNA/index.html) were used as an initial query in the development of a candidate pool (see below). The source, composition, download dates, assembly status, coverage, real number of nucleotides and expected number of nucleotides (from the animal genome size database (Gregory, 2012)) of all genomes analyzed are listed in the electronic supplement as well.

Homology search for 5S rRNAs and U1 snRNAs

Development of a candidate pool

First, we used blast (Altschul et al., 1990) with an E-value threshold of <10⁻⁴ to retrieve as many 5S rRNA and U1 snRNA candidates as possible. Overlapping hits were merged and extended by 50 nt in both directions, inspected manually using the emacs ralee mode (RNA ALignment Editor in Emacs) (Griffiths-Jones, 2005) and trimmed to their expected length. Consensus sequences of each alignment block and species were added to the query data set. We repeated this blast search with the same parameters, followed by the collection step, for all organisms until no new reliable candidates were found.
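The merge-and-extend step can be illustrated with a minimal sketch. The hit tuple layout, the function name and the clamping of the start coordinate at 1 are assumptions made for illustration; they are not taken from the original pipeline, which also involved manual inspection.

```python
from typing import List, Tuple

# Hypothetical hit layout: (sequence id, start, end), 1-based inclusive.
Hit = Tuple[str, int, int]

def merge_and_extend(hits: List[Hit], flank: int = 50) -> List[Hit]:
    """Merge overlapping hits per sequence, then pad each region by `flank` nt."""
    merged: List[Hit] = []
    for seq_id, start, end in sorted(hits):
        if merged and merged[-1][0] == seq_id and start <= merged[-1][2]:
            # Overlaps the previous region on the same sequence: grow it.
            prev = merged[-1]
            merged[-1] = (seq_id, prev[1], max(prev[2], end))
        else:
            merged.append((seq_id, start, end))
    # Extend in both directions; the start is clamped at position 1.
    # The end is not clamped because sequence lengths are unknown here.
    return [(s, max(1, a - flank), b + flank) for s, a, b in merged]
```

In an iterative search of the kind described above, the extended regions would be trimmed, turned into consensus sequences and fed back as queries until no new candidates appear.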

Sequence conservation

After having studied previously reported 5S rRNA and U1 snRNA sequences in detail, we selected four conserved motifs in animals for each ncRNA (Figure 1). Subsequently, we wrote rnabob descriptors that characterized the conserved motifs (boxes and Sm-binding site) and their allowed distances (Figure 1, right). We decided against a covariance model, as we wanted to avoid high variability and to speed up the analysis. To detect divergent 5S rRNAs, we allowed point mutations to occur in one of the boxes, and variable distances between motifs. Additionally, box Z for the 5S rRNA and the Sm-binding site for U1 snRNA were used in six species because of their huge initial candidate sets: Homo sapiens, Pongo pygmaeus, Macaca mulatta, Bos taurus, Pteropus vampyrus and Saccoglossus kowalevskii. Candidates that did not fulfill these criteria were not discarded but marked with a demerit for further analysis.
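The idea of a descriptor made of ordered motifs with bounded spacers, tolerating a point mutation in one box, can be sketched with regular expressions. This is a stand-in for illustration only and is not rnabob syntax. The motif strings and spacer ranges below are hypothetical placeholders, not the boxes shown in Figure 1.

```python
import re

# Hypothetical placeholders; NOT the real 5S rRNA boxes or distances.
MOTIFS = ["GCCA", "GGGT", "AGTA", "GATC"]
SPACERS = [(5, 15), (10, 30), (5, 20)]  # allowed nt between consecutive boxes

def descriptor_patterns(motifs, spacers):
    """Yield the exact descriptor, then every single-position relaxation of one box."""
    def build(boxes):
        parts = [boxes[0]]
        for (lo, hi), box in zip(spacers, boxes[1:]):
            parts.append(f"[ACGT]{{{lo},{hi}}}{box}")
        return re.compile("".join(parts))

    yield build(motifs)
    for i, box in enumerate(motifs):
        for j in range(len(box)):
            relaxed = box[:j] + "[ACGT]" + box[j + 1:]
            yield build(motifs[:i] + [relaxed] + motifs[i + 1:])

def passes_descriptor(seq, motifs=MOTIFS, spacers=SPACERS):
    """True if seq matches with at most one point mutation in at most one box."""
    seq = seq.upper().replace("U", "T")
    return any(p.search(seq) for p in descriptor_patterns(motifs, spacers))
```

A candidate for which `passes_descriptor` returns False would be kept but flagged with a demerit, in line with the procedure described above.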

Structure conservation

In a second step, we examined the secondary structure of the candidates. If RNAfold (Hofacker, 2003) did not fold a sequence directly into the expected structure depicted in Figure 1 (checked manually for all candidates), we used constrained folding (RNAfold -C). The constraints used are all displayed individually on the Supplemental Page. Alternatively, we created alignments of the previously reported sequences given in Figure 1, using clustalw (Larkin et al., 2007) and RNAalifold (Hofacker, 2007). Other candidates were marked with an additional penalty for further analysis.

Manual inspection

For each organism, alignments were manually examined for irregularities, such as insertions/deletions, that indicate non-functionality.

Final genes and grouping

Finally, each candidate received a classification: satisfying all our filtering criteria indicated that functionality of 5S rRNA sequences was highly likely (Table 1, A-type). If sequence or structure contained slight variations (single point mutations that affected the secondary structure only slightly), then the candidate was declared as B-type.

Table 1 Number of identified 5S rRNAs

If both sequence and structure showed several variations compared with the rest of the sequences of that species (for example, indels of at least 5 nt), then the candidate was defined as questionable (Table 1, Q-type), referring to possible pseudogenes.

Even more divergent candidates were deleted from our data sets and not considered further. We regarded these genes as non-functional, that is, as pseudogenes.

We used the scoring step as a measure of the reliability of each candidate. All sequences, regardless of their score (A, B or Q), were considered in subsequent analyses.

For each organism, fasta, gff and stockholm alignment files are provided on the Supplemental Page.

Orthologous and paralogous 5S rRNA genes

For each taxon, we manually divided the stockholm alignment files into subgroups, defined here as ‘blocks’ (Table 1).

For the identification of orthologous and paralogous 5S rRNA genes, we used consensus sequences of the blocks and analyzed them with the NeighborNet algorithm (Bryant and Moulton, 2004) and uncorrected p-distances in SplitsTree4 (Huson, 1998).
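For reference, an uncorrected p-distance is simply the proportion of aligned sites at which two sequences differ. The sketch below assumes the sequences are already aligned and skips columns with a gap in either sequence; the gap handling and the function names are assumptions for illustration and may differ from the SplitsTree4 defaults.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-' or '.') in either sequence are skipped (assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-." and y not in "-."]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs: dict) -> dict:
    """All pairwise p-distances for a {name: aligned sequence} mapping."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m in seqs for n in seqs}
```

A matrix of this kind, computed on the block consensus sequences, is the sort of input a NeighborNet analysis works from.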

NTS analysis of clusters

Most 5S rRNA genes occur within 3000 nt of one another. However, to detect a possible correlation between more distantly located 5S rRNAs, we defined 5S rRNA genes as part of one cluster if and only if they were located on the same chromosome (scaffold or contig) within 10 000 nt of each other, independently of their orientation. We wrote postscript files for each taxon to display the genome-wide arrangement. PDF files are provided in the supplement.
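One way to read this cluster definition is as single-linkage grouping along each sequence: consecutive genes stay in the same cluster while the gap between them does not exceed 10 000 nt. The sketch below follows that reading. The tuple layout, the function name and the choice to measure the gap from the end of one gene to the start of the next are assumptions.

```python
from collections import defaultdict

def cluster_genes(genes, max_gap=10_000):
    """Group genes into clusters per sequence, ignoring strand.

    genes: iterable of (seq_id, start, end, strand) with non-overlapping genes.
    Returns a list of clusters, each a list of gene tuples sorted by start.
    """
    by_seq = defaultdict(list)
    for gene in genes:
        by_seq[gene[0]].append(gene)
    clusters = []
    for seq_id in sorted(by_seq):
        current = []
        for gene in sorted(by_seq[seq_id], key=lambda g: g[1]):
            # Start a new cluster when the gap to the previous gene is too large.
            if current and gene[1] - current[-1][2] > max_gap:
                clusters.append(current)
                current = []
            current.append(gene)
        clusters.append(current)
    return clusters
```

Counting only clusters with at least two members, as reported in Table 1, would then be `[c for c in cluster_genes(genes) if len(c) >= 2]`.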

The NTS regions <500 nt between two 5S rRNA candidates were aligned with clustalw. Fasta, gff and alignment files are available at the Supplemental Page.

Regulator analysis of 5S rRNA genes

Upstream promoter analysis

We selected the region comprising positions −35 to −25 upstream of the 5S rRNA gene (Hallenberg and Frederiksen, 2001; Vizoso et al., 2011, and citations therein) and searched for conserved motifs with MEME (Bailey et al., 2009). We used the parameters -minw 5 -maxw 8 to target a TATA box already described in the literature (see section below). Sequences were shuffled with shuffle -0. Detailed results can be viewed on the Supplemental Page.

Internal promoter analysis

For each stockholm alignment, we created consensus sequences: (A) the most frequent nucleotide was represented in the consensus sequence, and (B) each nucleotide with a frequency >10% was part of the consensus sequence, following the IUPAC coding system.
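The two consensus types can be sketched as follows. Treating U as T, computing frequencies over non-gap characters only and the function name are assumptions for illustration; ties in the majority consensus are broken arbitrarily.

```python
from collections import Counter

# IUPAC ambiguity codes keyed by the sorted set of nucleotides they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def consensus(alignment, min_freq=None):
    """Column-wise consensus of equal-length aligned sequences.

    min_freq=None -> type (A): the most frequent nucleotide per column.
    min_freq=0.10 -> type (B): IUPAC code of all nucleotides above 10%.
    """
    out = []
    for column in zip(*alignment):
        letters = "".join(column).upper().replace("U", "T")
        counts = Counter(c for c in letters if c in "ACGT")
        total = sum(counts.values())
        if total == 0:
            out.append("-")  # gap-only column
        elif min_freq is None:
            out.append(counts.most_common(1)[0][0])
        else:
            kept = sorted(n for n, c in counts.items() if c / total > min_freq)
            out.append(IUPAC.get("".join(kept), "N"))
    return "".join(out)
```

For the toy alignment `["ACGT", "ACGT", "ACGA", "ATGC"]`, the type (A) consensus is `ACGT` and the type (B) consensus is `AYGH`.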

Terminator analysis

For the terminator analysis, we analyzed the 50 nt downstream of each 5S rRNA candidate and used an rnabob descriptor to locate the first occurrence of the pattern TTTT. Additionally, conserved motifs were identified by MEME (Bailey et al., 2009) (parameters: -minw 6 -maxw 20) within the 30 nt downstream. We used only unique sequences per species. Species with >80 different copies were excluded from this analysis for reasons of complexity.
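Locating the first T run downstream of a gene needs nothing more than a substring search. The sketch below is a plain-Python stand-in for the rnabob step, with hypothetical function names; it reports a 0-based offset from the end of the coding region.

```python
def first_t_run(downstream: str, run: str = "TTTT", window: int = 50):
    """0-based offset of the first `run` within the first `window` nt, else None."""
    pos = downstream[:window].upper().replace("U", "T").find(run)
    return None if pos < 0 else pos

def unique_sequences(seqs):
    """Keep one copy of each distinct sequence, preserving first-seen order."""
    return list(dict.fromkeys(s.upper() for s in seqs))
```

The distribution of offsets returned by `first_t_run` over the unique downstream sequences of a species gives the distance between the coding region and the termination signal.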

Linkage between 5S rRNA and other ncRNAs

We downloaded all known ncRNA classes from Rfam (Gardner et al., 2011); in the case of U1 snRNA and 5S rRNA, we also included sequences from the previous literature mentioned above. We searched for all of them in the metazoan genomes with blast (Altschul et al., 1990). Additionally, we scanned the genomes for tRNAs with tRNAscan-SE (Lowe and Eddy, 1997).

As there is, to our knowledge, no established statistical model describing linkage between ncRNAs in a variety of species, we used a simple Gaussian mixture with a variable number of components.

Blast hits are not weighted; that is, hits with an E-value below a threshold of 10⁻⁴ are included and hits above the threshold are excluded. If genomic duplications due to possible assembly artefacts occur, the naive weight given to such a region could point towards linkage where no real duplication was present. Therefore, we filtered the data: if exactly the same number of nucleotides was observed between two linked genes more than once, we assumed assembly artefacts (for example, multiple sequenced contigs) and used only one copy.
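The artefact filter can be sketched as de-duplication on the inter-gene distance. This is one possible reading of the rule above: any repeated distance is treated as an assembly duplicate and only the first pair is kept. The tuple layout and function name are hypothetical.

```python
def drop_duplicate_distances(pairs):
    """Keep the first (gene_a, gene_b, distance) seen for each distinct distance.

    Pairs with exactly the same distance are assumed to stem from redundantly
    assembled copies of one locus (an interpretation, not a confirmed rule).
    """
    seen = set()
    kept = []
    for gene_a, gene_b, distance in pairs:
        if distance in seen:
            continue
        seen.add(distance)
        kept.append((gene_a, gene_b, distance))
    return kept
```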

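For each ncRNA gene that we test for linkage with 5S rRNA, we build a Gaussian mixture over the observed distances d, with mixing weights w_i:

$$p(d) = \sum_{i=1}^{k} w_i \, \mathcal{N}(d \mid \mu_i, \sigma_i^2), \qquad w_i \ge 0, \quad \sum_{i=1}^{k} w_i = 1$$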

Each Gaussian in the mixture describes the distance μ between a 5S rRNA gene copy and the other gene copy, while σ is the s.d. in this distance. As it is possible that either one 5S gene copy is linked with multiple copies of the other gene, or that multiple pairs of linked 5S rRNAs/other genes exist, we require a k-component mixture.

The number of components k is determined by increasing k from 1 until no significant improvement in fit is obtained. To prevent overfitting, a maximum of 10 Gaussians is allowed, or fewer if the number of data points is lower than 40.

The parameter vector (μ₁,…,μ_k), (σ₁,…,σ_k) is fitted using expectation maximization (Hastie et al., 2001).
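The fitting procedure can be sketched in plain Python as expectation maximization for a one-dimensional mixture, with k increased stepwise. Several choices below are assumptions, since the text does not specify them: mixing weights are estimated alongside μ and σ, "no significant improvement" is approximated by a non-decreasing Bayesian information criterion (BIC), the cap for small data sets is taken as n // 4, and a floor on σ guards against collapsing components.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_mixture(data, k, iters=200, min_sigma=1.0):
    """EM for a 1-D k-component Gaussian mixture.

    Returns (weights, mus, sigmas, log-likelihood).
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared spread.
    mus = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    sigmas = [max((xs[-1] - xs[0]) / (2.0 * k), min_sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, s)
                     for w, m, s in zip(weights, mus, sigmas)) or 1e-300)
        for x in xs
    )
    return weights, mus, sigmas, loglik

def choose_k(data, hard_cap=10):
    """Increase k from 1 until BIC stops improving (assumed stopping rule)."""
    n = len(data)
    k_max = max(1, min(hard_cap, n // 4))  # assumed cap for small data sets
    best = None
    for k in range(1, k_max + 1):
        w, m, s, ll = fit_mixture(data, k)
        bic = (3 * k - 1) * math.log(n) - 2.0 * ll
        if best is not None and bic >= best[0]:
            break
        best = (bic, k, w, m, s)
    return best[1:]  # (k, weights, mus, sigmas)
```

Each fitted component then corresponds to one candidate linkage distance μ with spread σ. A narrow component that carries substantial weight is the pattern one would expect from a conserved linkage.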

Results and Discussion

For the first time, we present here a complete overview of 5S rDNA in metazoans, including secondary structure prediction, genomic organization, sequence characteristics, putative regulatory motifs and linkage to other ncRNAs. Furthermore, we found striking features in the available mammalian genomes, described below. Many of the findings of this analysis depend on current genome assemblies, and the reader should keep in mind that the assemblies of different organisms are extremely variable in terms of completeness and are therefore, at least for the number of copies, hardly comparable. Currently available metazoan genome assemblies very often lack multi-copy regions such as centromeres, telomeres and rRNA operons (Copeland et al., 2009; Dalloul et al., 2010; Alkan et al., 2011). Additionally, identical gene copies located at multiple positions in the genome are often merged, or even completely removed (Marz et al., 2008; Alkan et al., 2011). According to Alkan et al. (2011), assemblies are in general 16.2% shorter than the reference genome, and 99.1% of validated duplicated sequences are missing from the assembled genome. However, in some assemblies we can find repeated sequences of the same locus, because at the contig or scaffold levels, some genomic regions are covered multiple times. In our analysis, we take these facts into account and show, as a side effect, how much information can be obtained from genomic sequences when working with multiple-copy genes, regardless of the state of the genome assemblies. Available cytogenetic mapping data support our analysis, as described in detail below.

Arrangement of 5S rRNA copies: number and evolutionary relationship

The overall summary of 5S rRNA copies in animals is depicted in Table 1. We discriminated between three different classes: (A) putative functional genes that passed all our filters, (B) those that showed slight variations in sequence or structure, and (Q) those that remained questionable and might even be possible pseudogenes.

Overall, we identified 12 766 5S rRNA sequences in 97 organisms, ranging from three sequences in the ricefish Oryzias latipes to 3180 sequences in the zebrafish Danio rerio. Both assemblies are at the chromosomal stage. In both cases, the real genomes are 1.2-fold larger than the assemblies (Gregory, 2012), see Table 1. The genome coverages of O. latipes and D. rerio are 10.6× and ∼30×, respectively. Owing to the assembly problems mentioned above, we assume the lower boundary for 5S rRNA copies in these fishes to be about 3180. In general, when the coverage of the genome is at least 8× and the genome is sorted into chromosomes, the listed number of copies (Table 1) can be considered a lower boundary. Cytogenetic mapping of Squalius alburnoides, which is closely related to D. rerio, showed several clusters on three chromosomes (Gromicho et al., 2006), in agreement with the 43 clusters on three chromosomes of the zebrafish in our study. Comparisons of fish genomes are in general difficult because of polyploidy. The cytogenetic mapping of Gallus gallus showed one cluster on chromosome 9 (Cabral-de-Mello et al., 2011), which agrees completely with the single cluster that we also found on chromosome 9.

The genome sequence of the most basal deuterostome, the acorn worm S. kowalevskii, shows 1166 different copies. Protostomes seem to have, in general, a lower number of 5S rRNA copies. Although the genome of the polychaete worm Capitella capitata displays 1584 copies, we assume the real minimal number of 5S rRNA copies to be much smaller, because the assembly is at the contig stage and is 10 times larger than the expected genome size (Gregory, 2012), see Supplemental Page. We found 410 different copies of 5S rRNA and we set this value as the minimal copy number in C. capitata. By cytogenetic mapping, Dichotomius has been shown to carry one very prominent, characteristic cluster on chromosome 2 (Cabral-de-Mello et al., 2010). The only coleopteran investigated in this manuscript (Tribolium) is not assembled at the chromosomal level; however, it also shows one huge cluster of 151 5S rRNA copies. Previous reports have shown that copy number is very variable among metazoans: 1700–2000 copies (including pseudogenes) in humans (Sorensen and Frederiksen, 1991), 50–100 copies in Macaca fascicularis (Jensen and Frederiksen, 2000), 35–41 copies in the chicken (Daniels and Delany, 2003), 24 000–61 000 copies (including pseudogenes) in three amphibians (24 000 in Xenopus laevis (Hilder et al., 1983)) and only three copies in Plasmodium falciparum (Shippen-Lentz and Vezza, 1988). However, these estimates depended on the method used and on the ability to differentiate between functional and non-functional copies. Our results do not perfectly agree with these examples, as we predicted, in general, a lower number of copies (18 in humans, 12 in M. mulatta, 6 in the chicken and 60 in Xenopus tropicalis).

According to sequence and secondary structure features, we identified different 5S rRNA classes in some genomes as described below. Within species, alignments clearly unveiled disjunct sets, hereafter called ‘blocks’, in 58 species, see Table 1. We aligned the consensus sequences of the 253 blocks retrieved. In the network obtained, we can distinguish four main 5S rRNA groups, see Figure 2—left.

Orthologous 5S rRNA genes

Vertebrate 5S rRNA sequences are clearly separated evolutionarily from other metazoan sequences. Interestingly, basal deuterostomes (Hemichordata, Tunicata and Cephalochordata) and nematodes share high sequence similarity, whereas the sequences of other metazoans (Arthropoda, Lophotrochozoa, Cnidaria, Porifera and Placozoa) clustered into a distinct 5S rRNA group.

Paralogous 5S rRNA genes

When comparing consensus sequences of mammalian 5S rRNA blocks (Figure 2, right), we found, in contrast to non-mammalian sequences, a core 5S rRNA set that comprised at least one sequence of each mammalian species. Sequences within this core set were very similar (nearly no mutations), whereas consensus sequences of the other blocks were relatively divergent (some of them might even be non-functional, such as possibly Loxodonta africana 2, see Figure 2, left). No grouping or pattern can be observed in the divergent 5S rRNA set. 5S rRNA seems to have undergone two main evolutionary processes: on the one hand, the data suggest that the long-term evolution of the 5S rRNA genes in mammals is characterized by high selection pressure on housekeeping 5S rRNAs (for example, the 5S rRNA core set), and on the other hand by gene diversification, which may provide adaptive potential in the face of environmental change. In other words, we may be facing an evolutionary scenario in which strong purifying selection (and perhaps mechanisms involved in concerted evolution) maintains the integrity of housekeeping 5S rRNAs, whereas birth-and-death processes generate variation through duplications.

The distribution of some orthologous 5S sequences (Figure 2) might be explained by horizontal gene transfer of transposable elements similar to SPIN genes (Syvanen, 2012). However, this is, especially for housekeeping genes, under discussion.

5S rDNA clusters and NTS analysis

In order to study 5S rDNA sequences within species, we analyzed in more detail the copies separated by less than 10 000 nt (that is, within a ‘cluster’). The number of clusters with at least two 5S rRNAs can be viewed in Table 1. The size of clusters depends on the genome and its assembly, and can hardly be compared across species.

In many species, we found clusters with differences in the length of their NTS. For example, in the honey bee Apis mellifera we found seven copies on contig GroupUn.750 with a constant spacer of 249 nt, whereas contig GroupUn.96 had five copies separated by a 711 nt spacer. Similarly, other species showed NTSs of different sizes in the same contig. This agrees with previous reports in other species (for example, in molluscs (Vierna et al., 2011), arthropods (Perina et al., 2011) and chordates (Gornung et al., 2007)). In this work, we add to this list more chordate, annelid, arthropod, cephalochordate, placozoan, cnidarian and molluscan species. Sequence orientation and distances between 5S rRNA regions can be obtained for each organism on the Supplemental Page. In the organisms named below, we found 5S rRNA copies that displayed different orientations in the chromosome, which is not what we would expect under concerted evolution of repeats within the same cluster. In the cases in which distances among repeats were large (for example, in X. tropicalis, Drosophila melanogaster, D. virilis, D. mojavensis and D. willistoni), it is not unexpected that gene conversion was unable to homogenize the copies within the cluster. However, in other cases, distances between repeats were small (Petromyzon marinus, Pediculus humanus and Trichoplax adhaerens). This would indicate that the inversions are recent or that the unit of homogenization by gene conversion involves both repeats.

To determine the evolution of 5S rDNA in more detail, we cut and aligned the NTS regions <500 nt (alignment available on the Supplemental Page). As hypothesized by Vierna et al. (2011), a 5S rDNA sequence that is evolving concertedly within a given cluster can be transposed into another 5S rDNA cluster composed of repeats that differ from it but are similar to one another. After the occurrence of duplications involving both variants, it is possible to obtain an intermixed organization of 5S rDNA, in which NTSs located in the cluster are completely divergent. This is what we report here for some species (Daphnia pulex and D. rerio, see NTS alignment on the Supplemental Page). These findings agree with the widespread idea that 5S rDNA repeats are transposed from one genome location to another (Rooney and Ward, 2005; Datson and Murray, 2006). Intermixed organization of NTS sequences was also found by Gornung et al. (2007); Perina et al. (2011); Vierna et al. (2011) in mollusc, crustacean and fish species.

Unexpectedly, all NTS sequences (divided into four NTS types) retrieved from the mollusc Lottia gigantea and from the poriferan Reniera sp. were almost identical. We blasted these NTSs against various nucleotide databases, but failed to find any similarities with previously reported sequences, such as bacterial/viral insertions. The same picture of very closely related NTS regions is seen in the insects Anopheles gambiae and P. humanus, which is in direct contrast to closely related organisms that share no related NTS features, such as Ciona intestinalis and C. savignyi or most of the drosophilids.

We have also retrieved putatively functional and non-functional 5S rRNA sequences within one cluster in many organisms. This has been reported for D. melanogaster before (Sharp et al., 1984).

In order to analyze the evolution of the NTS region at the species level, we selected the genus for which the most species were available (Drosophila, 12 species). We obtained the following results: (1) NTSs can be divided roughly into 10 different types, according to alignment clustering. In fact, NTS sequences that belong to the same type can be aligned because their degree of divergence is not high; (2) all species display only one type of NTS sequence in their genomes, except D. mojavensis and D. grimshawi, with two divergent NTS sequence types; (3) the drosophilids do not share their NTS type with their congeners; however, very recently diverged species show similar NTS sequences (D. persimilis/D. pseudoobscura and D. simulans/D. sechellia); and (4) the different NTS types defined agree with the phylogeny of these 12 species (Drosophila 12 Genomes Consortium et al., 2007). Considering these results and the high degree of conservation of the 5S rRNA copies, we hypothesize an evolutionary scenario in which the long-term evolution of 5S rDNA in the genus Drosophila is driven by strong selection on the 5S rRNA copies, gene duplications and transpositions that generate new NTS loci, and homogenizing mechanisms within each array. The divergent NTSs retrieved from D. mojavensis and D. grimshawi could also point toward the occurrence of ancestral polymorphism. Birth-and-death evolution with a fast gene turnover, concerted evolution and mixed models combined with strong selection can explain these results.