Introduction

With the advent of massive sequencing projects, many complete genome sequences have become available for examination and analysis. However, one important part of eukaryotic genomes remains poorly known: the wide array of repetitive DNA sequences that they harbor (Britten and Kohne, 1968; Ganley and Kobayashi, 2007). These include, among other repetitive elements, centromeric and paracentromeric repeats, telomeric repeats, ribosomal DNA (rDNA) clusters and a profusion of satellite DNA (satDNA) repeats. SatDNA repeats are arranged in tandem, ranging from a few units to millions of copies, and are usually located close to centromeric and/or telomeric regions, forming part of constitutive heterochromatin (Charlesworth et al., 1994). SatDNA repeat units are usually short, around 150–180 bp (or 300–360 bp), corresponding to approximately one (or two) nucleosome units (Plohl et al., 2008). SatDNA is frequently rich in adenine and thymine, due mainly to the presence of A–T triplets spaced along the repeat unit. These AT repeats appear to be involved in the typical bent structure of satDNA, although the possible functional nature of this DNA curvature has not been firmly established (see Palomeque and Lorite, 2008, and references therein). The confirmation that satDNA can be transcribed to RNA (Ugarkovic, 2005) raised the possibility that it may be involved in basic cell processes. Transcripts of satDNA can act as ribozymes with self-cleavage activity, as seen in the newt Notophthalmus viridescens (Epstein and Gall, 1987). However, the ultimate functionality of such transcripts remains elusive.

The eukaryotic 45S ribosomal DNA is arranged in tandem on one or several sites per haploid genome, involving one or more pairs of chromosomes. The number of ribosomal RNA genes shows wide variations among eukaryote species (from <30 copies to nearly 30 000) and appears to be positively associated with the metabolic requirements of the organisms and their genome size (Prokopowich et al., 2003). Each repeated unit is formed of three ribosomal RNA (rRNA) genes (coding for 18S rRNA, 5.8S rRNA and 28S rRNA) separated by two transcribed spacers known as internal transcribed spacer 1 and 2 (ITS1 and ITS2). Moreover, adjacent transcribed units are separated by longer non-transcribed intergenic spacers. Although rDNA is among the most evolutionarily conserved DNA sequences, the ITS and intergenic spacers are less subject to functional constraints and evolve at higher rates (Eickbush and Eickbush, 2007). For this reason, ITS sequences have been widely used for phylogenetic inference at low taxonomic levels, as they show high nucleotide substitution rates and low intragenomic sequence heterogeneity (Hillis and Dixon, 1991).

Repetitive sequences appear to evolve as coherent families, where repeats within a family are more similar to each other than they are to other orthologous representatives in related species (Ganley and Kobayashi, 2007). This kind of sequence evolution has been termed concerted evolution and is characterized by a continuous homogenization of repeats via processes such as unequal crossover and gene conversion (Smith, 1974; Nagylaki, 1984). Unequal crossovers appear to drive the evolution of rRNA genes, with sister chromatid exchanges being more frequent than exchange between homologous chromosomes (Eickbush and Eickbush, 2007). This translates into faster homogenization of rDNA units at the intra- than at the interchromosome level. In contrast to unequal crossover, gene conversion does not change the copy number and shows similar incidence at both intra- and interchromosome levels (Eickbush and Eickbush, 2007). Interchromosome homogenization can take place at two levels, that is, between homologous chromosomes, possibly during meiotic recombination, and between non-homologous chromosomes, possibly via processes such as ectopic recombination or transposition, as suggested by Drouin and de Sá (1995) for concerted evolution of the 5S rDNA units. There are thus three different levels of unit homogenization that, for the sake of simplicity, we shall call sister, homologous and non-homologous homogenization, the first two occurring at the intrachromosomal level and the third at the interchromosomal level, with respect to the haploid genome.

Natural selection can also influence rDNA homogenization as the functionality of these repeats is subject to strong purifying selection, as proposed by the birth-and-death model (Nei and Rooney, 2005). According to this evolutionary model, new genes are continuously generated by duplication events, with some duplications maintained for long periods, whereas others become mutationally non-functional and subject to deletion. This birth-and-death process has been shown to be responsible for the homogenization of several families of DNA sequences, including histone genes (Piontkivska et al., 2002) and the 5S rDNA (Rooney and Ward, 2005). Furthermore, birth-and-death evolution and concerted evolution could operate simultaneously, as has been proposed for 5S rDNA (Freire et al., 2010).

Ascertaining the level of variation between repeats is crucial to determining which evolutionary process best explains the homogenization observed for these repeated sequences (Ganley and Kobayashi, 2007). Moreover, a comparison of different repetitive families can provide useful information with regard to whether concerted evolution acts as a sequence-specific phenomenon or as a genome-wide process. Here, we present the analysis of intragenomic diversity for two repetitive DNA sequences (a satDNA and 45S rDNA) in the grasshopper Eyprepocnemis plorans.

E. plorans is a species of African origin taxonomically differentiated into three subspecies: E. plorans plorans, inhabiting the Mediterranean coasts, the Caucasus, Turkey, Turkmenistan, Iran and the southwest of the Arabian Peninsula; and two other subspecies living in central and southern Africa. Like other orthopteran species, E. plorans has a huge genome (C-value=10.16 Gbp; Ruiz-Ruano et al., 2011) rich in repetitive sequences. The karyotype of E. plorans is composed of 11 acrocentric pairs of autosomes plus the sex chromosomes (♀ 2n=22+XX; ♂ 2n=22+X0). These chromosomes are distributed into three groups according to size: three long (L1, L2 and X), six medium (M3, M4, M5, M6, M7 and M8) and three short (S9, S10 and S11) chromosomes. The genome of E. plorans also harbors a well-characterized B-chromosome system (reviewed in Camacho et al., 2003), with more than 50 B-chromosome variants described on the basis of size, morphology and chromosome banding patterns (Bakkali et al., 1999). All these types of B chromosomes are rich in two kinds of repetitive DNA shared by A chromosomes: a 180-bp satDNA (hereafter 180-satDNA) and 45S ribosomal DNA (López-León et al., 1994). This 180-satDNA shows a 180-bp motif repeated in tandem, with a sequence enriched in A+T nucleotides and no open reading frames (López-León et al., 1995). Fluorescent in situ hybridization has revealed that 180-satDNA and rDNA loci show extensive variations between populations in terms of amounts as well as chromosome location (Cabrero et al., 2003b). The 180-satDNA is usually located close to centromeric regions in the longest chromosomes (L1, L2, X and M3) and in the smallest ones (S9 and S11, but not S10). Large clusters of rDNA are present in the four A chromosomes bearing active NORs (X, S9, S10 and S11), but rDNA is also apparent as small paracentromeric clusters in most remaining A chromosomes, although the largest rDNA cluster is in the B chromosomes (Cabrero et al., 2003b). In fact, B chromosomes in this species are composed mostly of rDNA and 180-satDNA (Camacho et al., 2003).

To gain fuller insight into the evolutionary processes shaping variations in these repetitive DNA sequences, we have developed an approach involving the amplification, cloning and characterization of these sequences obtained from individual microdissected chromosomes of E. plorans. Specifically, we have analyzed the variation between different copies of part of the 45S rDNA (ITS1-5.8S rDNA-ITS2) present in six chromosomes (X, B, M8, S9, S10 and S11), and the 180-satDNA from the X, B and S11 chromosomes. This has provided information on the homogenization patterns of these two kinds of paralogous sequences, and has unveiled notable differences in their intragenomic structure.

Materials and methods

In November 2006, we collected 10 males of the grasshopper E. plorans plorans from Torrox (Málaga province, Spain). For comparative purposes, we also collected males and females of Locusta migratoria from Las Gabias (Granada province, Spain), and received samples of the subspecies E. plorans meridionalis from Springbok (South Africa), kindly provided by A Bugrov (SB Russian Academy of Sciences, Novosibirsk, Russia). DNA was extracted from each specimen using the GenElute Mammalian Genomic DNA Miniprep extraction kit (Sigma-Aldrich, St Louis, MO, USA) following the manufacturer’s instructions, and stored at −20 °C. We also dissected out the testes of each individual, fixed them in freshly prepared 3:1 ethanol–acetic acid and stored them at 4 °C. For chromosome microdissection, the testes of two males were fixed in 1:3 acetic–ethanol for 10 min, and stored in 70% ethanol at −20 °C before use. We determined the presence, type and number of B chromosomes carried by each specimen cytogenetically, following Camacho et al. (1991).

We microdissected the X, B, M8, S9, S10 and S11 chromosomes from diplotene cells, as described previously (Teruel et al., 2009), using an Eppendorf TransferMan NK2 micromanipulator coupled to a Zeiss Axiovert 200 microscope (Zeiss, Jena, Germany), with glass needles made with a two-step horizontal pipette puller (Bachofer, Weil der Stadt, Germany) and sterilized by ultraviolet radiation. These chromosomes were microdissected from different individuals, without pooling chromosomes of different specimens. Each chromosome was separately placed in 9 μl DNase-free ultrapure water and its DNA was amplified with a GenomePlex Single Cell Whole Genome Amplification Kit (Sigma) following the manufacturer’s instructions. These amplification reactions usually generate a representative amplification (approximately a millionfold) of the initial DNA (Sigma GenomePlex Single Cell Whole Genome Amplification Kit technical bulletin). The amplification products were purified with the PCR Cleanup Kit (Sigma) before being used in subsequent analyses.

We amplified the 180-satDNA and part of the 45S rDNA (including ITS1, 5.8S rDNA and ITS2) by polymerase chain reaction (PCR) from the DNA obtained from each individual chromosome (chr-DNA) and also from total genomic DNA (gDNA) obtained from B-lacking and B-carrying individuals (two individuals of each type). The 180-satDNA was amplified using two divergent primers (180repdiv1R: 5′-GCACTGCTTTCCAGATATCACACTAAAATG; 180repdiv1L: 5′-CGCATTTCTGCCGCCTGTGGCGCTACATT) anchored to the 180-satDNA sequence (GenBank accession no. X75637). PCRs were performed in 1 × PCR buffer (MBL), 2 mM MgCl2, 200 μM dNTPs, 0.5 μM of each primer, 1 U of Taq DNA polymerase (MBL) and 100 ng DNA. The PCR program included an initial denaturation step at 95 °C for 5 min, followed by 30 cycles at 94 °C (30 s), 62 °C (30 s) and 72 °C (30 s), plus a final extension step at 72 °C for 7 min. These amplifications yielded the typical ladder pattern of a tandemly repeated satDNA amplification, with bands of 147, 327 and 507 bp, and so on. We isolated from gels the 327-bp band corresponding to the 180-satDNA dimeric unit (in fact, a complete 180-bp unit plus part of a second satDNA unit) for subsequent analyses.
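The ladder sizes follow from how divergent primers behave on a tandem array: because the two primers point away from each other within one repeat, the smallest product (147 bp) is shorter than a full unit, and each successive band carries one more complete 180-bp unit. This is why the 327-bp band contains one full unit plus part of a second. A minimal sketch of that arithmetic, using only the 147-bp and 180-bp values given above (the fourth band is an extrapolation, not an observed size):

```python
def ladder_band_sizes(smallest_bp: int, unit_bp: int, n_bands: int) -> list[int]:
    """Expected band sizes (bp) for divergent-primer PCR on a tandem repeat array.

    Each successive band contains one more complete repeat unit than the previous one.
    """
    return [smallest_bp + unit_bp * k for k in range(n_bands)]


bands = ladder_band_sizes(smallest_bp=147, unit_bp=180, n_bands=4)
print(bands)  # [147, 327, 507, 687]
```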

The 45S rDNA was amplified using the 18S and ITS4 universal primers (White et al., 1990), which anchor to the 18S and 28S genes, respectively, amplifying a DNA fragment of 950 bp including ITS1, 5.8S rDNA, ITS2 and part of the 18S and 28S rDNA flanking regions. PCRs were performed in 1 × PCR buffer (MBL), 2 mM MgCl2, 200 μM dNTPs, 10 μM of each primer, 1 U of Taq DNA polymerase (MBL) and 100 ng DNA. PCR reactions included an initial denaturation at 95 °C for 5 min, then 30 cycles at 94 °C (30 s), 55 °C (30 s) and 72 °C (30 s), plus a final extension at 72 °C for 7 min. All PCR products were visualized in a 1.5% agarose gel stained with SYBR Safe (Invitrogen, Carlsbad, CA, USA). For comparison, we amplified the 45S rDNA from gDNA of L. migratoria and the 45S rDNA and 180-satDNA from gDNA of E. plorans meridionalis, using the same protocols.

The amplified DNA products (satDNA and 45S rDNA) from both gDNA and chr-DNA were ligated into a TOPO TA cloning vector and transformed into One Shot® TOP10 Competent Cells (Invitrogen). After bacterial growth, we isolated the plasmid DNA using the Perfectprep Plasmid Mini kit (Eppendorf, Hamburg, Germany). Between 5 and 10 clones per PCR reaction were sequenced in both directions by Macrogen Inc. (Seoul, Korea). Dimeric 180-satDNA sequences were edited to obtain the monomeric 180-bp units. The different regions of the 45S rDNA sequence were annotated from the 45S rDNA sequence of Chorthippus parallelus (accession number: AY585651). DNA sequences have been submitted to GenBank under accession nos. JN811903–JN811950 (180-satDNA from E. plorans), JN811827–JN811902 (ITS sequences from E. plorans), JN811951 (ITS sequence from L. migratoria), JX445147 (ITS sequence from E. plorans meridionalis) and JX445146 (180-satDNA sequence from E. plorans meridionalis). See Supplementary Information for details.

Sequence analysis

For DNA-sequence characterization and polymorphism analysis, we grouped microdissected sequences according to the chromosome of origin (i.e. the chromosome from which they were amplified). gDNAs were grouped into two classes (+B, 0B), depending on whether the individual carried (+B) or lacked (0B) B chromosomes. Sequence alignments were made with ClustalW (Thompson et al., 1994). Basic sequence parameters, such as nucleotide proportions and lengths of sequences, were calculated for the set of different haplotypes with MEGA 4 software (Tamura et al., 2007). Nucleotide-diversity analyses were performed with the DnaSP software (Librado and Rozas, 2009). Analyses of molecular variance (AMOVA) of sequences from specific chromosomes were performed with Arlequin version 3.11 (Excoffier et al., 2007). We calculated FST as a measure of sequence differentiation between chromosomes in relation to the total genomic molecular variance. Low FST values would indicate high sequence homogenization between chromosomes, whereas high FST values would suggest low sequence interchange between chromosomes and/or the existence of homogenization processes that are more efficient within than between chromosomes. We used a one-way analysis of variance (ANOVA) to compare sequence parameters. Statistical tests were performed with STATISTICA software (Statsoft Inc., Tulsa, OK, USA). As polymerase errors could contribute to an apparent increase in the variation found in these sequences, we also performed the analysis after replacing singletons by their consensus positions. A summary of the analyses with the modified data set (i.e. with singletons discarded) is presented in Supplementary Information and Supplementary Tables S2–S4.
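To make the FST interpretation concrete, the sketch below implements a generic one-level AMOVA in the style of Excoffier et al. (1992), treating each chromosome as a population and using the number of differing alignment columns as the squared distance between two sequences. ΦST is the among-chromosome share of the variance components. This is why, for example, 82.46% of the variance lying between chromosomes corresponds to FST=0.82. It is a hedged illustration, not Arlequin's code: the distance (gaps counted as mismatches), the permutation scheme and the P-value formula are assumptions, and Arlequin's options may differ.

```python
import random
from itertools import combinations


def n_differences(a: str, b: str) -> int:
    """Squared distance between two aligned sequences: mismatching columns (gaps included)."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs: list[str]) -> float:
    """Sum of squared deviations: (1/n) * sum of squared distances over unordered pairs."""
    if len(seqs) < 2:
        return 0.0
    return sum(n_differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def phi_st(groups: list[list[str]]) -> float:
    """One-level AMOVA: PhiST = among-group variance / (among + within)."""
    pooled = [s for g in groups for s in g]
    n_total, n_groups = len(pooled), len(groups)
    ssd_within = sum(ssd(g) for g in groups)
    ssd_among = ssd(pooled) - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_within = msd_within
    var_among = (msd_among - msd_within) / n0  # may be negative when groups are well mixed
    total = var_among + var_within
    return var_among / total if total else 0.0


def permutation_p(groups: list[list[str]], n_perm: int = 999, seed: int = 1) -> float:
    """Share of random reassignments of sequences to groups with PhiST >= observed."""
    observed = phi_st(groups)
    pooled = [s for g in groups for s in g]
    sizes = [len(g) for g in groups]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        it = iter(pooled)
        permuted = [[next(it) for _ in range(k)] for k in sizes]
        hits += phi_st(permuted) >= observed
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two "chromosomes" with completely different repeat copies.
print(round(phi_st([["AAAA", "AAAA"], ["TTTT", "TTTT"]]), 3))  # 1.0
```

With fully mixed groups, for example [["AAAA", "TTTT"], ["AAAA", "TTTT"]], the among-group component turns negative. Programs commonly report such estimates as zero differentiation, which corresponds to the pattern described here for the 180-satDNA.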

As the secondary structure of ITS regions appears to be subject to functional constraints, especially ITS2 in eukaryotes (Coleman, 2007), we analyzed it for the DNA sequences obtained from each chromosome. We separately analyzed the predicted ITS1 and ITS2 secondary structures, using the UNAfold v.3.8 software (Markham and Zuker, 2008) with default options, except for temperature (30 °C), and for each ITS we chose the sequence with the most stable secondary structure (i.e. the one with the lowest Gibbs free energy). These sequences were then used as references for ITS1 and ITS2, respectively, in the RNAsalsa v.0.8.1 software (Stocsits et al., 2009) to predict the secondary structure of the remaining sequences. 180-satDNA secondary structures were also analyzed using UNAfold v.3.8 software (Markham and Zuker, 2008).

To detect divergence processes between two samples of satDNA, we used the satDNA Analyzer software (Navajas-Pérez et al., 2007). This program compares polymorphic sites between species or groups and classifies them into six evolutionary stages according to the model of Strachan et al. (1985). Thus, a class I site represents complete homogeneity across all repeat units sampled from a pair of species. The frequency of the new nucleotide variant at the site considered is low in class II (25–50%) and intermediate in class III (50–75%), whereas class IV defines a site in which a mutation has replaced the progenitor nucleotide in most members of the repetitive family in the other species (75–100%). Class V represents a diagnostic site in which a new variant is fully homogenized and fixed in all the members of one of the species, while the other species retains the progenitor nucleotide. A class VI site represents an additional step beyond class V on the path of differentiation, implying new non-shared polymorphism in these species.
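The classification rules above can be read as a small decision procedure on one aligned site. The sketch below is a hedged reading of the text, not satDNA Analyzer's implementation. Its assumptions are:

- The monomorphic group's nucleotide is taken as the progenitor state.
- Variant frequencies below 25% are placed in class II, because the text gives no lower bound rule.
- Class VI is read as "progenitor absent and the other group still polymorphic".
- Sites polymorphic in both groups are returned as "shared" rather than assigned a class.

```python
from collections import Counter


def strachan_class(site_a: str, site_b: str) -> str:
    """Classify one aligned site between two groups of repeats (hedged reading of the model).

    site_a / site_b hold the nucleotide seen at this site in each repeat copy of
    group A and group B, e.g. "AAAG". Bins for the new variant: II < 50%,
    III 50-75%, IV 75-100% (exclusive of 100%).
    """
    ca, cb = Counter(site_a), Counter(site_b)
    if len(ca) > 1 and len(cb) > 1:
        return "shared"  # polymorphic in both groups: outside classes I-VI in this sketch
    # The monomorphic group supplies the assumed progenitor nucleotide.
    mono, other = (ca, cb) if len(ca) == 1 else (cb, ca)
    progenitor = next(iter(mono))
    if len(other) == 1:
        return "I" if progenitor in other else "V"
    if progenitor not in other:
        return "VI"  # progenitor replaced and the new state is itself polymorphic
    new_freq = 1 - other[progenitor] / sum(other.values())
    if new_freq < 0.5:
        return "II"
    if new_freq < 0.75:
        return "III"
    return "IV"


print(strachan_class("AAAA", "AAAG"))  # II
```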

To represent the haplotype network, we built minimum spanning trees for the set of unique haplotypes using Arlequin (Excoffier et al., 2007), and we visualized them using HapStar v.0.5 (Teacher and Griffiths, 2011). To anchor the networks, we used the 180-satDNA and rDNA sequences of E. plorans meridionalis, the subspecies inhabiting southern and eastern Africa.
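A minimum spanning tree links all unique haplotypes with the smallest total number of mutational steps. In Figures 2 and 3, those steps appear as black dots along the branches. The sketch below uses a generic Prim's algorithm on pairwise mismatch counts. This is an assumption for illustration: Arlequin builds minimum spanning networks that can also retain alternative equal-length links, and its handling of ties and gaps may differ.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching columns between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes: dict[str, str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm; returns edges as (tree node, new node, mutational steps)."""
    names = list(haplotypes)
    if not names:
        return []
    root = names[0]
    # best[v] = (distance, node already in tree) for the cheapest link from v to the tree
    best = {v: (hamming(haplotypes[root], haplotypes[v]), root) for v in names[1:]}
    edges = []
    while best:
        v = min(best, key=lambda u: best[u][0])  # ties: first haplotype in input order
        dist, parent = best.pop(v)
        edges.append((parent, v, dist))
        for u in best:  # v just joined the tree; it may offer shorter links
            d = hamming(haplotypes[v], haplotypes[u])
            if d < best[u][0]:
                best[u] = (d, v)
    return edges


# Hypothetical toy haplotypes
print(minimum_spanning_tree({"h1": "AAAA", "h2": "AAAT", "h3": "AATT", "h4": "GAAA"}))
# [('h1', 'h2', 1), ('h2', 'h3', 1), ('h1', 'h4', 1)]
```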

Results

180-satDNA

PCR reactions with divergent primers anchored to the 180-satDNA sequence produced the typical ladder pattern of amplified satellite DNA for gDNA (0B gDNA and +B gDNA) and for the DNA obtained through microdissection from the S11, X and B chromosomes. We excised from gels, cloned and sequenced a total of 48 dimeric units (of 327 bp). These sequences were edited to obtain the complete 180-bp monomeric units (Supplementary Information and Supplementary Table S1). In total, we found 47 different haplotypes, most of which showed the 180-bp canonical size, although the presence of a 30-bp deletion in seven clones, together with other small deletions, decreased the average length to 175.1 bp (range 149–181 bp; Table 1 and Supplementary Information). We did not find significant differences in length between sequences of different genomic origin (0B gDNA, +B gDNA, S11, X and B; one-way ANOVA: F=0.91; d.f.=4; P=0.469). These sequences presented a low proportion of G+C (41.1%), due to the presence of short A/T repeats. In fact, the mean A+T content (58.9%) was similar in the different groups of sequences (one-way ANOVA: F=1.50; d.f.=4; P=0.219). These sequences also showed short direct or inverted repeats of various nucleotides.
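The G+C comparisons and one-way ANOVAs reported throughout the Results reduce to a few sums of squares. In these tests, the between-group d.f. is the number of groups minus one, and the within-group d.f. is the number of sequences minus the number of groups. For instance, six chromosomes and 53 ITS sequences give d.f.=5, 47. The sketch below computes G+C percentage and the F statistic with the standard library. It is a generic illustration, not the STATISTICA procedure used in the study. The P value would then come from the upper tail of the F distribution, for example with SciPy's `scipy.stats.f.sf(F, df1, df2)`.

```python
def gc_percent(seq: str) -> float:
    """G+C percentage over A/C/G/T characters; gaps and ambiguity codes are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, between-group d.f., within-group d.f.) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(gc_percent("AT-GC"))                        # 50.0
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))    # (13.5, 1, 4)
```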

Table 1 Number of clones sequenced, with length and percentage of GC content for the 180-bp satellite DNA of Eyprepocnemis plorans

In the 47 180-satDNA haplotypes, we found a total of 68 conserved sites and 113 variable sites (including indel positions), of which 45 were parsimony-informative sites (shared polymorphic sites) and 68 were singleton sites. These polymorphic sites were apparently not distributed randomly through the entire sequence, as some regions (sites 40–60, 100–120 and 150–180) presented higher nucleotide diversity (see Figure 1a). Nucleotide diversity per site (π), calculated for each genomic origin, ranged from 0.045 for +B gDNA to 0.071 for 0B gDNA (Table 2). The S11 chromosome showed nucleotide diversity similar to that of the B24 chromosome (t=0.289, d.f.=11, NS) and slightly lower than that of the X chromosome (t=3.9, d.f.=14, P=0.002; see Table 2). These polymorphic substitutions were ascribed mainly to classes I and II of the Strachan et al. (1985) model (Table 3), implying that the majority of the polymorphic sites were not exclusively present in one genomic origin and, therefore, that there is a reduced level of divergence between chromosomes. In fact, an AMOVA performed with the sequences from the X, B24 and S11 chromosomes showed zero variance between chromosomes, with all molecular variance subsumed in the within-chromosome component, and with FST, an index of population structure, not significantly different from zero (P=0.505), even after exclusion of singleton mutations (P=0.192). The homogeneity of the 180-satDNA between chromosomes was clearly evident in the minimum spanning tree constructed from the complete set of different haplotypes (Figure 2), where haplotypes from different chromosomes are located in the same branches without any pattern of chromosome-specific variation. For instance, a 30-bp deletion is shared by haplotypes from the three sampled chromosomes (see Figure 2).
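Nucleotide diversity π is the mean number of pairwise differences (k) divided by the number of sites compared. Summing the per-site π over columns gives the same k, which is what lets π be plotted along the repeat in sliding windows, as in Figure 1. The sketch below is a generic illustration under stated assumptions, not DnaSP's code. It assumes complete deletion of gap-containing columns (so window positions refer to gap-free columns, not raw alignment coordinates), a window of 10 sites as in Figure 1, and a step of 1, which the text does not state.

```python
from itertools import combinations


def site_pi(column: tuple[str, ...]) -> float:
    """Proportion of sequence pairs that differ at one column (needs >= 2 sequences)."""
    pairs = list(combinations(column, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def nucleotide_diversity(seqs: list[str]) -> tuple[float, float]:
    """Return (k, pi): mean pairwise differences and differences per site.

    Columns with a gap ('-') in any sequence are dropped (assumed complete deletion).
    """
    columns = [c for c in zip(*seqs) if "-" not in c]
    k = sum(site_pi(c) for c in columns)
    return k, k / len(columns)


def sliding_window_pi(seqs: list[str], window: int = 10, step: int = 1) -> list[float]:
    """Mean per-site pi in consecutive windows of gap-free columns."""
    per_site = [site_pi(c) for c in zip(*seqs) if "-" not in c]
    return [sum(per_site[i:i + window]) / window
            for i in range(0, len(per_site) - window + 1, step)]


# Hypothetical toy alignment: pairwise differences are 1, 2 and 1, so k = 4/3
k, pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
print(round(k, 3), round(pi, 3))  # 1.333 0.333
```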

Figure 1

Distribution of substitutions, measured as nucleotide diversity (π), in (a) 180-bp satellite DNA and (b) ITS sequence of 45S rDNA. Window length=10 sites.

Table 2 Nucleotide variation at the 180-bp satellite DNA
Table 3 Distribution of polymorphic sites at 180-satDNA into the different evolutionary stages defined by Strachan et al. (1985)
Figure 2

Minimum spanning tree of 180-bp satDNA haplotypes. Haplotypes are represented by colored circles (red for X-chromosome sequences, green for B24 sequences, blue for S11 sequences). Blank circles represent haplotypes obtained from genomic DNA (0B or +B). Each black dot represents a mutational step. A full color version of this figure is available at the Heredity journal online.

The analysis of possible secondary structures for the 180-satDNA showed the most stable pattern (with five helices) for the 180-satDNA canonical sequence from the Jete population (accession number: X75637.1; López-León et al., 1995), which was used as a reference. The secondary structures shown by the DNA sequences obtained in this study were heterogeneous, and no consistent pattern emerged when they were compared with the reference sequence. The high A+T content and the absence of open reading frames in this satellite DNA suggest that it has no coding function.

ITS regions and 5.8S rDNA

The amplification of 45S rDNA produced a band of 950 bp from both genomic and microdissected DNAs from the M8, S9, S10, S11, X and B chromosomes. These amplification products were independently cloned, and 83 inserts were sequenced, producing 74 different haplotypic sequences (7 for M8, 6 for S9 and S11, 5 for S10, 11 for the X chromosome, 17 for B24, 10 for 0B gDNA and 13 for +B gDNA; M8 and S9 shared one haplotype). The final alignment spanned 957 bp (see alignments in Supplementary Information). Sequences were annotated on the basis of the Chorthippus parallelus sequence (AY585651), and they comprised the complete ITS1, 5.8S rDNA and ITS2 regions, as well as partial sequences of 18S and 28S rDNA as flanking regions. For the complete region, the G+C content was 57.4%, the number of variable sites was S=138 (minimum number of mutations, η=143), the nucleotide diversity was π=0.0101±0.00006 and the average number of nucleotide differences was k=8.8.
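S and η differ because a single column can carry more than two nucleotide states. A column with v distinct nucleotides requires at least v − 1 mutations but counts only once as a segregating site. The gap of five between η=143 and S=138 therefore implies that a few sites carry three or four states. A minimal sketch of both counts follows. It assumes gap-containing columns are skipped, as in a complete-deletion setting; DnaSP's exact gap treatment is not specified here.

```python
def segregating_sites_and_eta(seqs: list[str]) -> tuple[int, int]:
    """Return (S, eta): polymorphic columns and the minimum number of mutations.

    A column with v distinct nucleotides contributes 1 to S and v - 1 to eta.
    Columns containing a gap ('-') are skipped (assumed complete deletion).
    """
    s = eta = 0
    for column in zip(*seqs):
        if "-" in column:
            continue
        variants = len(set(column))
        if variants > 1:
            s += 1
            eta += variants - 1
    return s, eta


# Hypothetical toy alignment: the last column has three states (T, A, C)
print(segregating_sites_and_eta(["ACGT", "ACGA", "TCCC"]))  # (3, 4)
```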

ITS1 showed the highest level of variation for this set of sequences, with a mean of 5.1 nucleotide differences between sequences and an overall nucleotide diversity of 0.0156 (Table 4). In total, ITS1 showed 44 polymorphic sites that defined 43 ribotypes. Only six ribotypes were found more than once, each characteristic of a specific chromosome, with the single exception of one ribotype shared by the M8 and S9 chromosomes. A comparison between the six microdissected chromosomes by means of one-way ANOVA showed significant differences in G+C content (ranging from 59.8% in the X chromosome to 62.1% in the B chromosome) (F=44.74; d.f.=5, 47; P<0.0001), and Scheffé’s test showed significantly higher G+C content in the B chromosome than in the remaining chromosomes, owing to four G+C-rich insertions of four to eight nucleotides. ITS1 length also differed significantly between chromosomes (F=281.44; d.f.=5, 47; P<0.0001), being longer in the S11, X and B chromosomes and shorter in the M8, S9 and S10 chromosomes, due to several deletions, one of which comprised 17 bp (positions 533 to 549 in the alignment; see Supplementary Information). Moreover, differences between chromosomes were due not only to indels but also to specific substitutions, some of which were shared by several chromosomes. For instance, at positions 251 and 366 of the complete alignment (see Supplementary Information), sequences from chromosomes M8, S9 and S10 showed A and C, respectively, whereas the remaining sequences showed C and T. At positions 503, 505 and 514, sequences from chromosomes S11 and X showed T, C and A, respectively, whereas the remaining sequences showed C, T and G.

Table 4 Nucleotide variation at the ITS1, ITS2 and 5.8S rDNA

The average number of nucleotide differences for the 5.8S rDNA gene was only 0.7 (Table 4), implying that this region is well conserved, with only minor differences in length, due to the presence of a nucleotide insertion in some sequences (Table 5; alignment in Supplementary Information). No significant differences were found between chromosomes for length (one-way ANOVA: F=0.66; d.f.=5, 47; P=0.660) or GC content (one-way ANOVA: F=1.48; d.f.=5, 47; P=0.210) (see Table 5).

Table 5 Length, G+C content and variation in Gibbs free energy of the secondary structure of ITS and 5.8S rDNA sequences in Eyprepocnemis plorans

ITS2 showed an intermediate level of variation, with an overall mean of 1.8 nucleotide differences between sequences (Table 4). A total of 30 variable sites were observed, making it possible to distinguish 28 ribotypes, 5 of which were found several times: the first was exclusive to the S11 chromosome and the second to the B chromosome, whereas the third was shared by chromosomes M8, S9 and X, the fourth appeared only in sequences obtained from 0B and +B gDNA and the fifth was shared by chromosomes S9 and S10. The comparison between chromosomes showed only slight differences in length (F=2.81; d.f.=5, 47; P=0.027), due to a deletion of 10 nucleotides in one sequence from the S11 chromosome and the insertion of a single adenine in all the DNA sequences from the B chromosome. G+C content, however, showed highly significant differences between chromosomes (F=12.56; d.f.=5, 47; P<0.0001), owing to lower values in the S11 chromosome (63.4% compared with the 64.8% observed in chromosomes S9 and X), because of the high G+C content of the aforementioned indel (see Table 5).

In every sequence analyzed, ITS1 folded into a secondary structure with six helices, the third of which was very complex. Mutations (mainly indels) observed in these sequences did not change this general pattern, as they altered only helix length. A comparison of Gibbs free energy (ΔG), as an indicator of helix stability, showed significant differences between chromosomes (F=102.2; d.f.=5, 47; P<0.0001). The B chromosome showed the most stable secondary structure (ΔG=−184.4 kcal mol−1, s.d.=5.3), followed, in descending order of stability, by the S11 (ΔG=−179.9 kcal mol−1, s.d.=1.6), X (ΔG=−167.9 kcal mol−1, s.d.=2.1), S9 (ΔG=−160.7 kcal mol−1, s.d.=0.8), S10 (ΔG=−160.0 kcal mol−1, s.d.=0.6) and M8 (ΔG=−160.0 kcal mol−1, s.d.=1.7) chromosomes. As ITS1 stability (i.e. more negative ΔG) was correlated with sequence length (r=−0.98, N=53, P<0.0001), implying that longer DNA sequences show higher stability, we generated a measure of stability per nucleotide site by dividing ΔG by sequence length. This relative value of ΔG per site was roughly similar in the B and S11 chromosomes, but it was significantly higher (i.e. less negative) in the ITS1 sequences from the other four chromosomes (Supplementary Figure S1 in Supplementary Information). These figures indicate that the B and S11 chromosomes harbor the most stable ITS1 secondary structures, even after correction for sequence length. The DNA sequences obtained from the B chromosome were the longest and, relative to them, the ITS1 sequences from the remaining chromosomes showed a variety of deletions (see Table 5), affecting mainly the apices of helices II, IV and VI (the latter only in the M8, S9 and S10 chromosomes). These deletions shorten the sequences but, a priori, have no apparent influence on rRNA processing.
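Because more negative ΔG means a more stable fold, and longer sequences simply accumulate more base pairs, a strongly negative correlation between ΔG and length means raw ΔG partly measures length. Dividing ΔG by length puts sequences on a per-nucleotide scale, where a numerically higher (less negative) value means lower stability per site. The sketch below shows that normalization and a Pearson correlation written out with the standard library. The input values in the example are hypothetical, since individual ITS1 lengths are not listed in this section.

```python
from math import sqrt


def dg_per_site(dg_kcal: float, length_nt: int) -> float:
    """Length-corrected stability: Gibbs free energy (kcal/mol) divided by sequence length."""
    return dg_kcal / length_nt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical lengths (nt) and free energies (kcal/mol), for illustration only
lengths = [560, 575, 590, 610]
dgs = [-158.0, -163.5, -170.0, -182.0]
print(round(pearson_r(lengths, dgs), 3))
print([round(dg_per_site(g, n), 3) for g, n in zip(dgs, lengths)])
```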

The secondary structure of the ITS2 region consisted of three helices, with most of the 15 variable sites observed located in helix III. The ΔG of secondary structure folding ranged from −85.1 kcal mol−1 (s.d.=4.1) in the S11 chromosome to −93.3 kcal mol−1 (s.d.=0.8) in the X chromosome (Table 5). A comparison between chromosomes revealed significant differences in stability (F=14.01; d.f.=5, 47; P<0.0001). With only one exception, all the sequences showed a putatively functional ITS2, displaying the hallmarks proposed by Coleman (2007) as characteristic of eukaryotes: a U–U mismatch near the base of helix II, and helix III as the longest helix, with a conserved CGGU motif in the 5′ strand. This secondary structure lacked helix IIb, a characteristic also found in the ITS2 of other Eyprepocnemidinae species (Ruiz-Ruano et al., in preparation). The exception was an ITS2 sequence from the S11 chromosome, which had a loop in helix III that decreased its stability (ΔG=−76.6 kcal mol−1), suggesting that this sequence is probably non-functional.

Homogenization of ITS sequences in E. plorans was higher at the intra- than at the interchromosome level, as revealed by the analysis of molecular variance. When DNA sequences obtained from particular chromosomes (i.e. M8, S9, S10, S11, X and B24) were analyzed using AMOVA, 82.46% of the molecular variation was found between chromosomes (82.3% without considering singletons) and only 17.54% was found within chromosomes. In fact, FST was 0.82 (P<0.0001), implying a high level of genetic structure among chromosomes for these paralogous DNA sequences, even when singletons were discarded. This intragenomic structure was also evident in the minimum spanning tree built with unique haplotypes (Figure 3), where four clusters of DNA sequences were clearly separated: one containing the B24 sequences, another the X sequences, another the S11 sequences and a final one containing the M8, S9 and S10 sequences.

Figure 3

Minimum spanning tree of rDNA (ITS1+5.8S+ITS2) haplotypes. Haplotypes are represented by colored circles (red for X-chromosome sequences, green for B-chromosome sequences and different shades of blue for M8, S9, S10 and S11 sequences). Blank circles represent haplotypes obtained from genomic DNA (0B or +B). Each black dot represents a mutational step. Four haplotype groups are clearly differentiated, corresponding to chromosomes S11, X and B24, and a final group that includes sequences from chromosomes M8, S9 and S10, with the exception of a sequence from the B24 chromosome grouped with the sequences from the X chromosome. For each group, we represented the secondary structure of the most stable ITS1 sequence (i.e. with the lowest Gibbs free energy). The secondary structure of ITS2 showed no notable differences between these groups, but we show the secondary structure of the most stable ITS2 sequence from the B24 chromosome to indicate (with an arrow) the position of an additional characteristic adenine present only in B24 sequences. Helices are numbered with Roman numerals. A full color version of this figure is available at the Heredity journal online.

Discussion

Several evolutionary processes play an important role in concerted evolution (Nei and Rooney, 2005). Mutation increases variation between repeat units, whereas molecular drive and selection reduce it. Molecular drive occurs mainly through unequal crossover and gene conversion, the former changing the copy number and the latter not (Eickbush and Eickbush, 2007). In yeast, Petes (1980) showed that unequal crossover most frequently causes sister chromatid homogenization, thus increasing sequence differences between non-homologous chromosomes. Gene conversion, however, is potentially able to homogenize at all levels (sister, homologous and non-homologous chromatids/chromosomes) and, in this respect, its result resembles that of purifying selection acting on paralogous copies. The dynamics of repetitive DNA sequences at the genome level are thus similar to those of genetic markers in populations, with molecular drive playing the same role as genetic drift at the population level. The processes increasing variability between chromosomes (mutation and intrachromosome molecular drive) are thus similar to those increasing the variability between populations (mutation and drift). Similarly, migration and some forms of selection decrease the variability between populations, while purifying selection and transposition reduce the divergence between chromosomes within a genome, with transposition playing the same role as gene flow in a metapopulation. The joint action of these processes will result in a particular degree of sequence homogenization within and between chromosomes.

To quantify the degree of sequence homogenization in both 180-satDNA and rDNA, we used AMOVA (Excoffier et al., 1992) to partition molecular variance at two hierarchical levels: within chromosomes and between chromosomes (that is, at homologous and non-homologous levels). In this approach, each chromosome pair is treated as a population belonging to a metapopulation (the entire genome). Low FST values would therefore indicate either recent colonization of the genome or high gene flow between chromosomes (as is conceivable for mobile elements), whereas high FST values would suggest very low gene flow between chromosomes and/or homogenization processes that are more efficient within chromosomes than between them. In E. plorans, we found that the high sequence variation observed for 180-satDNA is similar in the three chromosomes analyzed, whereas the ITS sequences show a distinctive structure, with most of the variance found between chromosomes. This is clearly shown by the FST statistic (rDNA FST=0.82; 180-satDNA FST=0) and graphically illustrated by the minimum spanning trees, which show a complete mixture of sequences from different chromosomes within the same clusters for 180-satDNA (Figure 2) but a grouping of ITS sequences according to chromosome of origin (Figure 3). These disparate patterns of homogenization are puzzling given the adjacent location of both repetitive DNAs in the paracentromeric regions of most chromosomes, and suggest that concerted evolution in E. plorans might operate not as a genome-wide process but as a sequence-specific one.
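The variance partition described above can be sketched in a few lines. The Python below is a minimal, one-level version of distance-based AMOVA (Excoffier et al., 1992), assuming aligned, equal-length sequences grouped by chromosome and using the number of differing sites as the squared distance between two sequences. It returns Φ_ST, the proportion of molecular variance found between chromosomes (the quantity reported here as FST). The function names and toy data are hypothetical, the permutation test normally used to assess significance is omitted, and this is not the software used in this study.

```python
from itertools import combinations

def n_differences(a: str, b: str) -> int:
    """Number of differing sites, used as the squared distance between two sequences."""
    return sum(x != y for x, y in zip(a, b))

def ssd(seqs: list[str]) -> float:
    """Sum of squared deviations: sum of pairwise distances divided by sample size."""
    return sum(n_differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs)

def phi_st(groups: dict[str, list[str]]) -> float:
    """One-level AMOVA treating each chromosome as a population (needs >= 2 groups)."""
    all_seqs = [s for seqs in groups.values() for s in seqs]
    n_total, n_groups = len(all_seqs), len(groups)
    ss_within = sum(ssd(seqs) for seqs in groups.values())
    ss_among = ssd(all_seqs) - ss_within
    ms_among = ss_among / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Average sample size per group, corrected for unequal group sizes
    n0 = (n_total - sum(len(s) ** 2 for s in groups.values()) / n_total) / (n_groups - 1)
    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    total = var_among + var_within
    return var_among / total if total else 0.0

# Hypothetical toy data: perfectly chromosome-specific variants
print(phi_st({"chrA": ["AAAA", "AAAA"], "chrB": ["TTTT", "TTTT"]}))  # 1.0
```

Because the between-chromosome component is estimated by subtraction, it can come out negative when sequences differ less between chromosomes than within them; such estimates are commonly reported as zero.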

It is unlikely that a single mechanism could produce such disparate intragenomic structures for these two repetitive DNAs. The extreme between-chromosome structure of the rDNA loci found in E. plorans suggests that unequal sister chromatid crossovers might also be responsible for the within-chromosome homogenization. By comparison, rDNA units show a much higher degree of homogenization in humans than in mice because of greater exchange between non-homologous chromosomes (for a review, see Eickbush and Eickbush, 2007). The different chromosomal location of rDNA in these species could underlie this difference. In humans, rDNA is adjacent to the short-arm telomere (Gonzalez and Sylvester, 2001), whereas in mice it is proximal to the centromere in the long arm of acrocentric chromosomes (Arnheim et al., 1982). Thus, non-homologous exchanges would transfer the whole long arm in mice (with severe consequences) but only the rDNA and the telomere in humans (with no negative effects). In E. plorans, as in mice, rDNA is located paracentromerically in the long arm of acrocentric chromosomes, which could explain the observed bias toward high homologous (within-chromosome) but low non-homologous (between-chromosome) homogenization.

ITS and satDNAs are both non-coding, but ITS are important in the processing of the rRNA precursor to produce functional rRNA (Peculis and Greer, 1998; Coleman, 2007), implying that ITS could be subject to some degree of purifying selection (Nei and Rooney, 2005). The high sequence conservation of the 5.8S genes indicates that the ITS sequences analyzed here are most probably adjacent to non-pseudogenized rRNA genes, although not necessarily active ones. However, the presence of nucleoli attached to all these chromosomes in diplotene cells (Bakkali et al., 2001) indicates that these genes are indeed active. Furthermore, we found little evidence of pseudogenization in the rDNA haplotypes, as only one of the 83 ITS sequences analyzed appeared to be a pseudogene. Purifying selection could thus contribute to maintaining rDNA sequence homogeneity, as proposed by the birth-and-death model (Nei et al., 1997; Nei and Rooney, 2005).

Like other non-coding RNAs, some satDNAs are transcribed and involved in heterochromatin formation and gene expression, for instance, the major pericentric satellite repeats in mouse (Probst et al., 2010). However, the 180-satDNA has no known function, although some 180-satDNA transcripts are found in cDNA (data not shown) and substitutions did not appear to be homogeneously distributed throughout the sequence (Figure 1). The higher number of mutations found in the satDNA (a 10-fold increase in nucleotide diversity relative to ITS) precludes intense purifying selection. It is not known whether this 180-satDNA nucleotide diversity results from an increased mutation rate or from reduced repair. However, the rDNA could be subject to more stringent mutation surveillance than the 180-satDNA, owing to the transcription-coupled repair of active ribosomal RNA genes linked to RNA polymerase I (Iben et al., 2002).
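Nucleotide diversity (π), the quantity compared above between 180-satDNA and ITS, is the average number of differences per site between two sequences drawn at random from the sample. A minimal Python sketch, assuming ungapped, equal-length aligned sequences (how gaps were treated in the study is not described here); the function name and toy sequences are hypothetical:

```python
from itertools import combinations

def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of differences per site over all pairs of aligned sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # Count differing sites for every pair of sequences
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * n_sites)

# Hypothetical toy alignment: 4 differences over 3 pairs x 4 sites, so pi is about 0.333
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

Dividing π for one repeat family by π for the other gives the kind of fold difference cited in the text.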

Although some repetitive non-coding sequences show sequence homogeneity, much of this may reflect recent amplification and transposition rather than active sequence homogenization (Liao, 1999). One difference between rDNA and 180-satDNA is age. Whereas rDNA was already present in the ancestors of E. plorans, the 180-satDNA is exclusive to this species, as it has not been found in congeneric species such as E. unicolor, or in species of the genera Heteracris, Thisoicetrinus and Shirakiacris, which are closely related to Eyprepocnemis (Cabrero et al., 2003a). The absence of between-chromosome structure for 180-satDNA could be a by-product of its recent origin and evidence of its rapid colonization of the E. plorans genome. In fact, variation between populations has been reported for the A chromosomes carrying this 180-satDNA (Cabrero et al., 2003b), suggesting fast spatiotemporal dynamics for this sequence. Analysis of the 180-satDNA sequences with the approach proposed by Strachan et al. (1985) also points to 180-satDNA being young, as most of these sequences belong to polymorphic types I and II (see Table 3). Alternatively, the bouquet configuration during early first meiotic prophase brings the pericentromeric 180-satDNA blocks of different non-homologous chromosomes into close proximity, which could allow unequal crossover, transfer of 180-satDNA between chromosomes and, therefore, non-homologous homogenization.

In E. plorans plorans, rDNA also varies between populations: it is present in only two A chromosomes (S9 and S11) in Daghestan, northern Caucasus (as in a southern African population of E. plorans meridionalis); in two or three chromosomes in Armenia; four in Greece; five in Turkey; 10 in Spain; and all 12 chromosomes in Morocco, forming a clear East–West cline in the number of rDNA-carrying chromosomes (López-León et al., 2008). In these same populations, the 180-satDNA showed only slight variation in the number of chromosomes carrying it, owing to its presence on most chromosomes (Cabrero et al., 2003b). Therefore, rDNA has recently undergone a marked spread among non-homologous chromosomes in western populations of this species. On this basis, one would expect high homogeneity of ITS sequences across non-homologous chromosomes, that is, just the opposite of the observed pattern. This discrepancy could also be explained by highly efficient sister chromatid homogenization in E. plorans, with a consequent increase in between-chromosome differences.

The spread of rDNA from the S9 and S11 chromosomes toward the remaining chromosomes in the E. plorans genome (see above) is also evident in our ITS1 sequence analysis. As shown by the minimum spanning tree in Figure 3, the sequences from the B and X chromosomes are most closely related to those from the S11 chromosome, implying that S11 rDNA could have colonized the X and B chromosomes. By contrast, the ITS sequences found in chromosomes M8 and S10 are most similar to those in chromosome S9, suggesting that the rDNA in these two chromosomes could be derived from that in chromosome S9. This is noteworthy because it shows how paralogous DNA sequences can move between non-homologous chromosomes within a genome, thus helping to reconstruct the history of this genome. In this respect, the kind of analysis presented here provides valuable information on genomic geography and history, offering an approach to intragenomic phylogeography.
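A minimum spanning tree connects all haplotypes with the smallest total number of mutational steps. As a rough illustration of how such a tree can be built, the Python below applies Prim's algorithm to pairwise Hamming distances (one step per differing site) among aligned haplotypes. This is a hypothetical sketch, not necessarily the algorithm or software behind Figure 3; in particular, when edges tie, several equally short trees exist, and dedicated haplotype-network programs resolve ties and missing data in their own ways.

```python
def hamming(a: str, b: str) -> int:
    """Number of mutational steps (differing sites) between two aligned haplotypes."""
    return sum(x != y for x, y in zip(a, b))

def minimum_spanning_tree(haplotypes: dict[str, str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm on pairwise Hamming distances.

    Returns the tree as (node_in_tree, new_node, steps) edges in the order added.
    """
    names = list(haplotypes)
    in_tree = [names[0]]          # ordered list keeps tie-breaking deterministic
    remaining = set(names[1:])
    edges = []
    while remaining:
        # Cheapest edge joining the current tree to a haplotype not yet in it
        best = min(
            ((u, v, hamming(haplotypes[u], haplotypes[v]))
             for u in in_tree for v in names if v in remaining),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.append(best[1])
        remaining.remove(best[1])
    return edges

# Hypothetical toy haplotypes
haps = {"h1": "AAAA", "h2": "AAAT", "h3": "AATT", "h4": "TTTT"}
print(minimum_spanning_tree(haps))  # [('h1', 'h2', 1), ('h2', 'h3', 1), ('h3', 'h4', 2)]
```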

The high intragenomic variation and the remarkable interchromosome divergence found for the ITS sequences in E. plorans suggest that ITS regions should be used with extreme caution as phylogenetic markers if this phenomenon is widespread. ITS intragenomic variation has also been reported in other grasshopper species, such as C. parallelus (Parkin and Butlin, 2004) and Podisma pedestris (Keller et al., 2006), and in other groups of organisms, such as nematodes (Hugall et al., 1999), crabs (Harris and Crandall, 2000), other crustaceans (Gandolfi et al., 2001), sponges (Wörheide et al., 2004), mosquitoes (Alquezar et al., 2010), tapeworms (Orosová et al., 2010) and plants (Nieto Feliner and Rosselló, 2007; Song et al., 2012). Such high intragenomic variation between paralogous rDNA units could lead to highly divergent phylogenetic results, depending on the specific copy or copies sampled. It would be advisable not only to exercise extra caution when using ITS as genetic markers, as Simon and Weiß (2008) suggested after finding high intragenomic variation in the rRNA genes of four fungal species, but also to undertake an exhaustive analysis of intragenomic variation before using any repetitive DNA for phylogenetic analysis.

Our results also provide some insights into the possible origin of B chromosomes in E. plorans. First of all, the minimum spanning tree in Figure 3 shows that the ITS sequences obtained from the B chromosome are most similar to those obtained from the S11 chromosome, that is, the smallest autosome; this similarity is greater than that with the DNA sequences obtained from the X chromosome. This overturns a previous hypothesis suggesting a B origin from the X chromosome (López-León et al., 1994) and opens up the possibility that the B chromosome derived from the S11 chromosome. Further evidence is provided by the ITS1 length and ΔG, which showed the greatest similarity between the sequences obtained from the B and the S11 chromosomes, as well as by the nucleotide diversity in the B chromosome 180-satDNA, which was similar to that of the S11 chromosome but significantly different from that of the X chromosome. A greater similarity with the original A chromosome that gave birth to the B chromosome is consistent with the recent origin suggested by Muñoz-Pajares et al. (2011).

Caution is required when estimating nucleotide polymorphism from sequences obtained after PCR amplification, because Taq DNA polymerase can introduce mutations at a rate close to 1 in 9000 nucleotides (Tindall and Kunkel, 1988). First, to reduce base-calling errors, we routinely double-sequenced the vector inserts. Second, we reanalyzed the entire set of sequences after replacing singletons with the consensus nucleotide (see Supplementary Information). These analyses are conservative because they eliminate both mutations introduced as technical artifacts and genuine single-nucleotide polymorphisms. The main results were not altered by singleton elimination, and the patterns of sequence variation held, albeit with lower levels of polymorphism and sequence variation.
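Replacing singletons with the consensus nucleotide, as described above, can be sketched as follows. Here a singleton is taken to be a character found in exactly one sequence at an alignment column, and the consensus is the most frequent character in that column; these definitions, the treatment of gaps as ordinary characters and the function name are assumptions for illustration, not the exact procedure in the Supplementary Information.

```python
from collections import Counter

def mask_singletons(seqs: list[str]) -> list[str]:
    """Replace singleton characters with the consensus (most common) character at that site.

    A singleton is a character found in exactly one sequence at a given alignment
    column. Ties for the consensus go to the character encountered first.
    """
    masked = [list(s) for s in seqs]
    for site, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        consensus = counts.most_common(1)[0][0]
        for i, base in enumerate(column):
            if counts[base] == 1 and base != consensus:
                masked[i][site] = consensus
    return ["".join(s) for s in masked]

# Hypothetical toy alignment: every variant here is a singleton
print(mask_singletons(["AAAA", "AAAT", "AATT"]))  # ['AAAT', 'AAAT', 'AAAT']
```

The tie-breaking rule is arbitrary: with only two sequences, every variable site consists of two singletons, and one of them is overwritten by the other.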

The present results show different patterns of homogenization for rDNA and 180-satDNA, with high between-chromosome structure and lower nucleotide diversity for the former but no structure for the latter. These differences are puzzling given the adjacent location of both repetitive DNAs in the paracentromeric regions of most chromosomes, and suggest that concerted evolution in E. plorans does not operate as a genome-wide process but rather as a sequence-specific one. The combination of highly efficient sister chromatid homogenization, a possibly reduced mutation rate due to transcription-coupled repair of active ribosomal RNA genes, and stronger purifying selection acting on rDNA, together with the recent origin of the 180-satDNA, could help explain the disparate homogenization patterns observed for these two repetitive DNA sequences.

Data archiving

All sequences were submitted to GenBank. Supplementary Table S1 presents the accession number for each nucleotide sequence.