Introduction

Microsatellites, or simple sequence repeats, consist of tandemly repeated units of 2–6 bp and are highly variable in repeat number. They have been widely and routinely used as molecular markers in diverse genetic studies (Selkoe and Toonen, 2006). Despite their usefulness, microsatellite markers can be difficult to develop in certain taxa (Meglécz et al., 2004; Arthofer et al., 2007; Bailie et al., 2010). This is particularly true for molluscs, and the underlying causes have never been fully investigated or explained (Reece et al., 2004; Weetman et al., 2001, 2005). Preliminary findings from other taxa indicate that these difficulties are generally not due to a paucity of microsatellites in the genome, or to a shortage of polymorphic relative to monomorphic microsatellites (Van’t Hof et al., 2007). Rather, other studies suggest that, at least in insects and crustaceans, the methodological difficulties may stem from genomic complexities within microsatellite flanking regions (Meglécz et al., 2004; Van’t Hof et al., 2007; Bailie et al., 2010).

The two major genomic features thought to be responsible for PCR interference or inconsistencies are (1) unstable flanking sequences and (2) cryptic repetitive DNA (Meglécz et al., 2004). In other words, microsatellite flanking regions in these taxa are either too variable or too similar. Unstable flanking regions arise when indels or point mutations occur at PCR primer binding sites, causing the affected allele to fail to amplify; such non-amplifying alleles are referred to as null alleles (Dakin and Avise, 2004). Null alleles can bias allele frequency estimates and lead to underestimates of heterozygosity, because heterozygotes carrying a null allele are scored as homozygotes. Consequently, demographic and biological inferences (for example, population size estimates and parentage) drawn from data sets affected by null alleles can be compromised unless their bias is accounted for (see Bonin et al. (2004) and DeWoody et al. (2006) for review). Second, cryptic repetitive DNA in microsatellite flanking regions that overlaps PCR primer binding sites can result in the amplification of products of unexpected size, or make it difficult to amplify products representing a single locus (Zhang, 2004). Cryptic repetitive DNA and high similarity among microsatellite flanking regions, which are very common in plants (Tero et al., 2006), have also recently been identified in a number of insects and crustaceans (Meglécz et al., 2004, 2007; Van’t Hof et al., 2007; Bailie et al., 2010). These features are thought to be generated primarily by DNA multiplication or duplication processes (Meglécz et al., 2004), although recombination-related events such as unequal crossing over and gene conversion may also be responsible. Evidence of genomic rearrangement processes can be inferred from proxies within microsatellite flanking regions (Meglécz et al., 2004; Van’t Hof et al., 2007).
For instance, although information is still limited, microsatellites have been found in close association with transposable elements (TEs) or as an integral component of the TE itself (Ramsay et al., 1999; Gaffney et al., 2003; Carreras-Carbonell et al., 2006). On the basis of these observations, Meglécz et al. (2004) suggested that genomic rearrangement processes could be mediated by TEs. TEs are repetitive DNA segments that can be extremely abundant and account for a large portion of a genome; for example, they comprise some 45% of the human genome and up to 80% of some grass genomes (see Feschotte et al. (2002a) for review). TEs are classified as either autonomous or non-autonomous elements. Both are mobile and can spread copies of themselves to new genomic locations, but they differ fundamentally in their transposition mechanism. Whereas autonomous TEs encode all the enzymes required for their transposition, non-autonomous TEs do not and instead ‘hijack’ the machinery of partner TEs to accomplish their transposition (Feschotte et al., 2002a, 2002b). It is still not known, however, whether autonomous and non-autonomous TEs contribute equally to genomic rearrangement processes.

The authors developed microsatellite marker systems for three gastropod molluscs: Littorina saxatilis (Olivi, 1792), L. littorea (Linnaeus, 1758) (McInerney et al., 2009a, 2009b) and Gibbula cineraria (Linnaeus, 1758) (this study). In each case, identical protocols for microsatellite development based on enriched genomic libraries were used. Despite similar cloning efficiencies, success rates for viable marker development varied considerably among the species. To elucidate the reasons for this variation, we carried out a comparative genomic analysis of microsatellite-containing sequences (MCS) from the three species. Specifically, we tested the hypothesis that inter-specific variation in the genomic complexity and stability of microsatellite flanking regions was responsible for the differing success rates.

Materials and methods

Microsatellite development, sequence data and PCR settings

The development and isolation of microsatellite markers for G. cineraria followed the same protocols used for L. saxatilis and L. littorea, as described by McInerney et al. (2009a, 2009b). All MCS isolated from the three independent enriched genomic libraries were edited to remove any vector contamination identified with the NCBI UniVec database (ftp://ftp.ncbi.nih.gov/pub/UniVec). Duplicate or redundant sequences (>98% identity) were identified using BLASTn (http://blast.ncbi.nlm.nih.gov/Blast.cgi) (Altschul et al., 1997) and excluded from further analyses. Microsatellite PCR primer sets were designed with PrimerSelect (DNASTAR, Inc., Madison, WI, USA) only for clones with suitable flanking regions. PCRs were performed with template DNA from three to six individuals per species, isolated according to McInerney et al. (2009a). PCR primer sequences for markers amplifying a single locus, cycling conditions and the gastropod species tested are provided in Table 1. PCRs were carried out in 12 μl reaction volumes containing 50 ng DNA, 100 μM dNTPs, 1 × PCR reaction buffer (Invitrogen Ltd, Paisley, UK), 0.5 U Taq DNA polymerase (Invitrogen), and primer and MgCl2 concentrations as given in Table 1. Fluorescently labelled PCR products were separated on a LI-COR 4200 automated system (LI-COR Inc., Lincoln, NE, USA). For further details of experimental conditions, see McInerney et al. (2009a). Samples of seven additional species were also used in PCR experiments to assess cross-species amplification of the markers. These comprised co-occurring intertidal gastropods of the family Littorinidae: L. fabalis (Turton, 1825), L. obtusata (Linnaeus, 1758) and L. compressa (Jeffreys, 1865); and of the family Trochidae: G. umbilicalis (da Costa, 1778), G. magus (Linnaeus, 1758), Calliostoma zizyphinum (Linnaeus, 1758) and Osilinus lineatus (da Costa, 1778).
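The reaction composition above can be converted into pipetting volumes with the dilution relation C1 × V1 = C2 × V2. The Python sketch below is illustrative only. The stock concentrations (10 mM dNTPs, 5 U/μl Taq, 25 ng/μl template, 50 mM MgCl2, 10 μM primers) and the final primer and MgCl2 concentrations are assumptions, not the values in Table 1, and `plan_reaction` is a hypothetical helper.

```python
"""Per-reaction pipetting volumes for a 12 ul PCR (C1 x V1 = C2 x V2).

Stock concentrations and the primer/MgCl2 finals are illustrative
assumptions, not the values given in Table 1.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Reagent:
    name: str
    stock: float  # stock concentration (same unit as `final`)
    final: float  # target final concentration in the reaction


def plan_reaction(reagents, amounts, reaction_ul=12.0):
    """Return {component: volume in ul}, topping up with water.

    reagents: concentration-defined components, V1 = C2 * V2 / C1.
    amounts:  {name: (amount per reaction, stock amount per ul)}, e.g. ng or U.
    """
    volumes = {r.name: r.final * reaction_ul / r.stock for r in reagents}
    for name, (amount, per_ul) in amounts.items():
        volumes[name] = amount / per_ul
    used = sum(volumes.values())
    if used > reaction_ul:
        raise ValueError(f"components need {used:.2f} ul, more than {reaction_ul} ul")
    volumes["water"] = reaction_ul - used
    return volumes


REAGENTS = [
    Reagent("10x PCR buffer", stock=10.0, final=1.0),   # fold concentration
    Reagent("dNTPs", stock=10_000.0, final=100.0),      # uM; 10 mM stock assumed
    Reagent("MgCl2", stock=50.0, final=1.5),            # mM; hypothetical final
    Reagent("forward primer", stock=10.0, final=0.2),   # uM; hypothetical final
    Reagent("reverse primer", stock=10.0, final=0.2),   # uM; hypothetical final
]
AMOUNTS = {
    "Taq polymerase": (0.5, 5.0),  # 0.5 U from an assumed 5 U/ul stock
    "template DNA": (50.0, 25.0),  # 50 ng from an assumed 25 ng/ul extract
}

example = plan_reaction(REAGENTS, AMOUNTS)
for component, ul in example.items():
    print(f"{component:>16}: {ul:5.2f} ul")
```

Multiplying every volume except the template by the number of reactions (plus a small excess) gives a master mix. The template DNA would normally be added to each tube separately.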

Table 1 PCR primer sets and amplification conditions of loci used in Experiments I and II (for Experiment I, only primer sets that amplified a single polymorphic locus are shown)

Inter-specific analysis of cryptic repetitive DNA and DNA families

To identify cryptic repetitive DNA among MCS flanking regions, we compared these sequences in an all-against-all BLASTn analysis (Altschul et al., 1997) with masking of low-complexity sequence enabled, following the approach of Meglécz et al. (2004). BLASTn alignments were first screened by eye to exclude false alignments involving the tandem repeat regions that had escaped the filtering process. Results were tabulated and limited to alignments longer than 40 bp with >85% identity located in the flanking regions, as suggested by Meglécz et al. (2004). MCS sharing similar cryptic repetitive DNA in their microsatellite flanking regions (as identified through BLASTn) were then grouped into DNA families. For the purposes of this paper, we define a DNA family as a group of MCS with highly similar microsatellite flanking regions containing cryptic repetitive DNA. For each species, the proportion of sequences grouped into DNA families was calculated as the number of grouped sequences divided by the total number of sequences examined.
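The grouping step can be sketched as single-linkage clustering: any two MCS joined by a qualifying BLASTn alignment end up in the same DNA family. The text does not state which clustering rule was used, so single linkage is an assumption here. The sketch also assumes that hits have already been reduced to (query, subject, % identity, alignment length) tuples, for example from the `pident` and `length` columns of BLAST+ tabular output, and that repeat-derived alignments have already been removed. It further assumes that the first four characters of an MCS name encode the species. All sequence names and values are invented.

```python
from collections import defaultdict


def filter_hits(hits, min_len=40, min_identity=85.0):
    """Keep non-self alignments longer than min_len bp with > min_identity % identity."""
    return [(q, s) for q, s, identity, length in hits
            if q != s and length > min_len and identity > min_identity]


class UnionFind:
    """Disjoint-set forest that merges MCS linked by a shared flanking region."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def dna_families(hits):
    """Single-linkage groups of MCS connected by at least one qualifying hit."""
    uf = UnionFind()
    for q, s in filter_hits(hits):
        uf.union(q, s)
    groups = defaultdict(set)
    for name in list(uf.parent):
        groups[uf.find(name)].add(name)
    return [sorted(g) for g in groups.values() if len(g) >= 2]


def grouped_proportion(families, totals, species_of=lambda name: name[:4]):
    """Per species: MCS placed in any DNA family / all MCS examined."""
    grouped = defaultdict(int)
    for family in families:
        for name in family:
            grouped[species_of(name)] += 1
    return {sp: grouped[sp] / n for sp, n in totals.items()}


hits = [  # (query, subject, % identity, alignment length); invented values
    ("Gcin_a", "Gcin_b", 92.0, 85),
    ("Gcin_b", "Gcin_c", 88.5, 60),
    ("Gcin_d", "Gcin_e", 80.0, 120),   # identity too low
    ("Llit_a", "Lsax_a", 90.0, 45),
    ("Llit_b", "Llit_c", 99.0, 30),    # alignment too short
    ("Gcin_a", "Gcin_a", 100.0, 250),  # self-hit
]
families = dna_families(hits)
print(families)
print(grouped_proportion(families, {"Gcin": 5, "Llit": 3, "Lsax": 2}))
```

An all-against-all search reports each pair in both directions. The union-find structure absorbs the duplicate link without any special handling.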

Inter-specific analysis of TEs

To check for known TEs, we scanned all MCS against Repbase, a database of known TEs (Kohany et al., 2006) (http://www.girinst.org/censor/index.php), using default sensitivity parameters and masking of low-complexity repeat sequences (that is, microsatellites). Hits below a similarity threshold of 0.65 were discarded, as these often showed little or no continuous similarity between the TE and the microsatellite flanking regions. Raw results were treated in parallel with the BLASTn analysis: Repbase hits were first screened by eye to exclude false alignments involving the tandem repeat regions that had escaped the filtering process. For each species, the proportion of sequences with TEs was calculated as the number of sequences with TEs divided by the total number of sequences examined.
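The following is a minimal sketch of the similarity cut-off and the proportion calculation. It assumes Repbase hits have been reduced to (MCS, TE, similarity) tuples. The names and values are invented, and the function names are hypothetical.

```python
def mcs_with_te(hits, threshold=0.65):
    """MCS with at least one Repbase hit at or above the similarity threshold."""
    return {mcs for mcs, _te, similarity in hits if similarity >= threshold}


def te_proportion(hits, n_examined, threshold=0.65):
    """Proportion of examined MCS carrying at least one retained TE hit."""
    return len(mcs_with_te(hits, threshold)) / n_examined


hits = [  # (MCS, TE, similarity); invented values for illustration
    ("Gcin_a", "te1", 0.91),
    ("Gcin_a", "te1", 0.78),   # a second hit on the same MCS counts once
    ("Lsax_a", "te2", 0.60),   # below the 0.65 cut-off, ignored
    ("Llit_a", "te3", 0.70),
]
print(te_proportion(hits, n_examined=10))  # 2 of 10 MCS -> 0.2
```

Applied to the totals reported in the Results (37 of 180 MCS), the same calculation, `round(100 * 37 / 180, 1)`, gives 20.6%.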

Results

Isolation of repetitive DNA and microsatellite characterization

A total of 180 MCS with a mean length of 260 bp, amounting to 47.2 kb, were isolated from the three gastropod genomic libraries and deposited in GenBank (for details see Table 2). Of these, 17 have already been published as part of previous work (McInerney et al., 2009a, 2009b). Cloning efficiency (average 12.1%) and the numbers of unique MCS sequenced (N=58, 61 and 61 for L. saxatilis, L. littorea and G. cineraria, respectively) were similar for all three genomic libraries. Nevertheless, the success rate of PCR primer development varied markedly among the species, and marker development was much more difficult in G. cineraria than in the two Littorina species. Although a large number of G. cineraria PCR primer sets (N=150) were designed on apparently unique microsatellite flanking regions, none initially achieved single-locus amplification; in all cases, the outcome was either a multi-banding pattern or no amplification.

Table 2 Summary of MCS examined in this study and the proportions (%) of MCS that grouped into DNA families and had significant homology to a known TE

Inter-specific variation in microsatellite flanking sequence similarities

Numerous microsatellite flanking sequences contained cryptic repetitive DNA, and the majority of these similarities (83.3%) were intra-specific (that is, the cryptic repetitive DNA resembled other cryptic repetitive DNA in the same species). A minority (16.7%) were inter-specific similarities between the two littorinids. Of the 180 sequences examined, a total of 61 (33.9%) were grouped into 12 DNA families by the BLASTn analysis. A summary of the DNA families identified is provided in Table 3. Each family comprised between 2 and 24 MCS. Five families (families 8–12) comprised L. littorea and L. saxatilis MCS, each sharing a region of ca. 40–80 bp within their flanking sequences. The remaining seven families (families 1–7), all from G. cineraria, shared larger regions of similarity of ca. 40–130 bp. The proportion of MCS per species that grouped into DNA families was far greater in G. cineraria (74.6%) than in L. littorea (18%) or L. saxatilis (9.5%) (Table 2). A schematic representation of some of the DNA families identified by the all-against-all BLASTn analysis is presented in Figure 1. The cryptic repetitive DNA in the microsatellite flanking regions was arranged either symmetrically or asymmetrically; that is, similar sequence occurred on both sides of the microsatellite or on one side only. Microsatellite repeats in the characterized DNA families were often imperfect. The two main motifs were GACA and CCAT, with other motifs (for example, GAA and GACG) occurring less often. These represent only a fraction of the repeat motif types used in the enrichment procedure (McInerney et al., 2009a, 2009b).
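The distinction between symmetrical and asymmetrical arrangements can be expressed as a simple coordinate test. The sketch below is hypothetical. It assumes 1-based inclusive coordinates for the aligned blocks and for the repeat array on one MCS. It discards blocks that touch the repeat itself, mirroring the exclusion of repeat-driven alignments described in the Materials and methods.

```python
def flank_side(start, end, ms_start, ms_end):
    """Place an aligned block (1-based, inclusive) relative to the microsatellite."""
    if end < ms_start:
        return "left"
    if start > ms_end:
        return "right"
    return "overlap"  # touches the repeat itself; treated as a repeat-driven artefact


def arrangement(blocks, microsatellite):
    """Classify the regions an MCS shares with another DNA family member.

    blocks: (start, end) coordinates, on this MCS, of regions aligned to the partner.
    microsatellite: (start, end) of the repeat array on this MCS.
    """
    sides = {flank_side(s, e, *microsatellite) for s, e in blocks} - {"overlap"}
    if sides == {"left", "right"}:
        return "symmetrical"
    if sides:
        return "asymmetrical"
    return "no flanking similarity"


print(arrangement([(10, 70), (160, 220)], (80, 150)))  # symmetrical
print(arrangement([(160, 240)], (80, 150)))            # asymmetrical
```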

Table 3 Summary of the DNA families identified with the BLASTn analysis and MCS with known TEs identified from their flanking regions (bold type, for details see Table 4)
Figure 1

Schematic representation of DNA families identified with an all-against-all BLASTn analysis. MCS names are given with GenBank accession numbers in parentheses. Black boxes, microsatellite regions and number of repeat units; grey boxes, cryptic repetitive DNA identified in flanking regions; black lines, unique DNA sequences (that is, with no apparent similarity to any other region). In families (Fam) 6 and 7, striped boxes indicate flanking regions of cryptic repetitive DNA identified as TEs (for details see Table 4). For simplicity, some sequences have been omitted from the schematics of DNA families 6 and 7.

The presence of cryptic repetitive DNA in microsatellite flanking regions, and the resulting DNA families with members scattered throughout the genome, can hamper the development of PCR primer sets for single-locus amplification. To overcome this difficulty (that is, to amplify fragments representing a single locus), we redesigned PCR primer sets (for primer sets and PCR conditions see Table 1 and Supplementary Table S1) so that their 3′ ends targeted nucleotide differences unique to individual DNA family members (Experiment I). A total of 25 new PCR primer sets were designed on members of the 12 DNA families and tested for single-locus amplification. Of these, only six amplified single microsatellite loci with a clear di-allelic pattern: four in G. cineraria (Gcin1, Gcin2, Gcin3, Gcin4) and one each in L. saxatilis and L. littorea (Lsax12 and Llit52, respectively) (Table 1). In cross-species amplification tests, all redesigned microsatellite loci failed to amplify, except Lsax12 (see McInerney et al. (2009b) for further details).
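Primer redesign relied on nucleotide differences among DNA family members. A primer whose 3′-terminal base matches only the intended member should extend poorly from the other members, because Taq DNA polymerase lacks 3′→5′ proofreading activity (the principle behind allele-specific PCR). The sketch below assumes an existing multiple alignment of the family. It lists alignment columns where a chosen member carries a base found in no other member; these are candidate positions for the 3′ end of a forward primer. The text does not describe how such sites were located, so this is an illustration rather than the authors' procedure, and the sequences are invented.

```python
def diagnostic_sites(alignment, target):
    """Alignment columns where `target` carries a base no other family member shares.

    alignment: {name: aligned sequence}, all the same length, '-' for gaps.
    Returns 0-based (column, base) pairs. Columns where the target has a gap or
    an ambiguous N, or where any other member has an N, are skipped.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    target_seq = alignment[target].upper()
    others = [seq.upper() for name, seq in alignment.items() if name != target]
    sites = []
    for col, base in enumerate(target_seq):
        if base in "-N":
            continue
        # A gap in another member also counts as a difference at this column.
        if all(other[col] not in (base, "N") for other in others):
            sites.append((col, base))
    return sites


family = {  # toy alignment of three hypothetical family members
    "mcs1": "ACGTACGTGA",
    "mcs2": "ACGTTCGTGA",
    "mcs3": "ACCTTCGAGA",
}
print(diagnostic_sites(family, "mcs1"))  # [(4, 'A')]
print(diagnostic_sites(family, "mcs3"))  # [(2, 'C'), (7, 'A')]
```

For a reverse primer, a diagnostic column would instead mark the 5′-most top-strand position of its binding region, since that is where the reverse primer's 3′ end anneals.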

To facilitate future studies, the flanking-region sequences characterizing the distinct DNA families (that is, the cryptic repetitive DNA) that were not identified as TEs in the Repbase scans were annotated as gastropod core sequences (GCS 1, 2 and so on). These core sequences are provided in Supplementary Table S2. Homologies between the GCS and sequences in the GenBank database were surveyed in a BLASTn analysis, with hits limited to those with an E-value <0.025 and a BLAST score >40 (Van’t Hof et al., 2007). No matches were found for over half of the GCS, suggesting that they are novel. Five GCS (1, 4, 5, 10 and 11) matched MCS of other molluscs, albeit distantly related ones, namely bivalves (class Bivalvia, also called Pelecypoda), as well as sequences from other taxa (Supplementary Table S3).
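The screening thresholds above can be applied to BLAST tabular output. This sketch assumes the 12 default columns of BLAST+ `-outfmt 6` (query, subject, % identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, bit score). It interprets 'BLAST score' as the bit score, which the text does not specify. The rows are invented.

```python
COLUMNS = ("query", "subject", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_tabular(lines):
    """Yield hit records from BLAST tabular output with the 12 default columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        record["evalue"] = float(record["evalue"])
        record["bitscore"] = float(record["bitscore"])
        yield record


def significant_hits(lines, max_evalue=0.025, min_score=40.0):
    """Hits with an E-value below max_evalue and a score above min_score."""
    return [r for r in parse_tabular(lines)
            if r["evalue"] < max_evalue and r["bitscore"] > min_score]


def without_matches(queries, lines, **thresholds):
    """Queries (e.g. core sequences) with no significant database match."""
    matched = {r["query"] for r in significant_hits(lines, **thresholds)}
    return sorted(set(queries) - matched)


blast_lines = [  # invented example rows
    "core1\tseqA\t91.2\t68\t6\t0\t1\t68\t210\t277\t3e-12\t82.4",
    "core2\tseqB\t84.0\t31\t5\t0\t4\t34\t88\t118\t0.61\t26.3",
]
print(without_matches(["core1", "core2", "core3"], blast_lines))  # ['core2', 'core3']
```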

Inter-specific variation in the occurrence of TEs

Identified TEs were classified following the universal classification system of Kapitonov and Jurka (2008) implemented in Repbase. Information on the transposition mechanism of each TE (autonomous or non-autonomous) was obtained from Repbase and from published reports retrieved through Web of Science. TEs were found in 20.6% (N=37) of all MCS examined (Tables 2 and 4). Regions with high identity to a TE (mean ca. 78%) were on average 74 bp long. In most instances (89.2%), a single TE was identified in an MCS. In three instances, however, two different TEs were observed in the same MCS (Llit64, Llit66, Gcin26), and in one instance the same region of a TE occurred twice in the same MCS (Gcin20). Thus, the Repbase scans showed that the cryptic repetitive DNA of a DNA family was sometimes composed of more than one TE (Table 4; Figure 1, family 7). L. saxatilis had the lowest proportion (3.2%, N=2) of MCS associated with TEs (Table 2); these were identified as MuDR (MULE) and Mu-like DNA transposons. L. littorea had a larger proportion of MCS with TEs (19.7%) and the highest number of different TEs overall (N=11), most of which were found only once among the MCS of this species. These comprised a variety of TEs, divided almost equally between autonomous and non-autonomous elements: DNA transposons (En/Spm (CACTA), Mariner, hAT, Arnold, MuDR (MULE)), an LTR retrotransposon (Gypsy) and non-LTR retrotransposons (LINEs and SINEs). Although G. cineraria had the greatest proportion of MCS with TEs (45.8%), these represented only six different TEs, all non-autonomous, some of which occurred at high frequencies (Table 4). They included DNA transposons (a MITE and helitrons) and LTR retrotransposons (the endogenous retroviruses ERV1 and ERV3).
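Counting how many TEs each MCS carries is a small tallying step. The sketch below assumes filtered hits given as (MCS, TE name) pairs. It compares TE names only, so deciding that two hits cover the same region of one TE (as for Gcin20) would additionally require comparing TE coordinates. The names and the function are hypothetical.

```python
from collections import defaultdict


def te_composition(hits):
    """Label each MCS by the TEs retained for it after filtering.

    hits: (mcs, te_name) pairs, one per retained Repbase alignment.
    """
    by_mcs = defaultdict(list)
    for mcs, te in hits:
        by_mcs[mcs].append(te)
    labels = {}
    for mcs, tes in by_mcs.items():
        if len(set(tes)) > 1:
            labels[mcs] = "two or more different TEs"
        elif len(tes) > 1:
            labels[mcs] = "same TE hit more than once"
        else:
            labels[mcs] = "single TE"
    return labels


labels = te_composition([("x1", "teA"), ("x2", "teA"), ("x2", "teB"),
                         ("x3", "teC"), ("x3", "teC")])
print(labels)
```

With the totals in the text, 37 MCS carried TEs and four of them were exceptions (three with two different TEs and one with a repeated TE region). That leaves 33 of 37, or 89.2%, with a single TE.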

Table 4 Summary of TEs identified in the scan of gastropod MCS against Repbase

The most frequently identified TE in G. cineraria was the miniature inverted-repeat transposable element (MITE) CvA (Gaffney et al., 2003). CvA belongs to the pearl family of TEs, first described in the oyster Crassostrea virginica (Gaffney et al., 2003). A total of 19 copies of CvA were detected in 14.7 kb of G. cineraria MCS (1.29 copies per kb). CvA hits averaged 66 bp in alignment length (range, 24–70 bp), with varying degrees of identity (66–100%), and all fell between nucleotide positions 297 and 430 of the published 600 bp sequence. This region overlaps the conserved terminal sequence and contains a proto-microsatellite (GACA)n and an RNA polymerase III promoter BoxA. To assess the relative abundance of CvA in molluscan genomes, we designed a PCR primer set (Table 1) to amplify the conserved terminal sequence of CvA from host genomic DNA and from the genomic DNA of closely related species (Experiment II). Reliable amplification products were obtained from host genomic DNA (G. cineraria) and from the trochids G. umbilicalis, G. magus and O. lineatus. The multiple PCR fragments amplified suggest that CvA copies are abundant in these genomes and that the element is conserved among close relatives within the family Trochidae. No amplification was observed, however, from C. zizyphinum or from any of the littorinid species tested, suggesting that CvA is absent from these genomes or too divergent to amplify. The second most common TE identified in G. cineraria was the helitron DNAREP1_DYak (N=4). This TE is 793 bp long and was first identified in Drosophila yakuba (Kapitonov and Jurka, 2007). It is a deletion derivative of the autonomous Helitron-1_DYak, which is usually inserted into ttwTTT target sites without generating target site duplications (Kapitonov and Jurka, 2007).

Interestingly, although several different classes of TEs were identified from the MCS of six of the DNA families (families 3, 4, 5, 6, 7 and 12), in each case they were non-autonomous TEs. These comprised the DNA transposon CvA, the helitrons DNAREP1_DYak and Helitron-N1_SP, the LTR retrotransposons (endogenous retroviruses) MonoRep289C and ERV46_MD_I, and the non-LTR retrotransposon SINE2-1_SP. They were predominantly associated with the microsatellite repeat motif CCAT.

Discussion

Inter-specific variation in cryptic repetitive DNA and DNA family abundance

Genomic complexities such as cryptic repetitive DNA and abundant DNA families associated with microsatellites have been commonly reported from plants, insects and crustaceans (Tero et al., 2006; Meglécz et al., 2004, 2007; Van’t Hof et al., 2007; Bailie et al., 2010). This is the first study, however, to describe their occurrence and frequency in the largest class of molluscs, the gastropods. Previous reports of multiple-banding patterns during microsatellite development in other molluscs (Reece et al., 2004; Weetman et al., 2005) suggest that this genomic idiosyncrasy may be far more widespread, and the non-reporting of ‘negative’ results in the published literature has probably led to its underestimation. In distantly related molluscs, similar associations involving other classes of repetitive DNA have also been suggested in the scallop Pecten maximus, the oyster C. virginica and the clam Anadara trapezia (Gaffney et al., 2003; Biscotti et al., 2007).

In comparable studies that attempted to quantify the frequency of cryptic repetitive DNA associated with MCS, the highest proportion of MCS grouped into DNA families was 55% (Meglécz et al., 2007). Thus, to the best of our knowledge, G. cineraria has the greatest abundance of DNA families (74.6%) so far reported from invertebrates. Most of the cryptic repetitive DNA identified in this study that grouped into distinct DNA families was restricted to a single species, in agreement with reports on insects and crustaceans (Meglécz et al., 2004, 2007; Van’t Hof et al., 2007; Bailie et al., 2010). Cryptic repetitive DNA grouped into DNA families can present a significant problem for the development of microsatellite markers. In this study, we described a possible solution for developing markers from genomic regions harbouring DNA families. The approach exploits discrete nucleotide differences between DNA family members and could also be applied to other taxa (for example, molluscs and insects) in which marker development has otherwise been unsuccessful.

The GCS database of conserved mollusc cryptic repetitive DNA should prove useful for future studies. Similarities shared between GCS and MCS from bivalves suggest that the GCS are phylogenetically conserved among very distantly related molluscs. Interestingly, most of the cryptic repetitive DNA identified in this study was novel, suggesting that gastropod genomes harbour many as yet uncharacterized genomic elements.

TE abundance and DNA multiplication

TEs have previously been reported, also in association with tandemly repeated regions, in the distantly related mollusc class Bivalvia (Pelecypoda) (Gaffney et al., 2003; Biscotti et al., 2007). This, however, is the first study to report the widespread occurrence of TEs in the largest mollusc class, the Gastropoda. Our quantification of TEs is nonetheless most likely an underestimate, because the identification of TEs relies heavily on database entries, and characterized molluscan TEs are still scarce. Even so, we report a greater abundance of TEs in microsatellite flanking regions than previous comparable studies (Meglécz et al., 2004; Van’t Hof et al., 2007).

The comparative genomic analysis revealed that the three gastropods differed considerably in TE abundance. The larger number of TEs in G. cineraria could reflect a higher rate of genomic rearrangement in this species resulting from TE transpositional activity. Consistent with this hypothesis, G. cineraria had the most genomic complexities (cryptic repetitive DNA and DNA families) and the least stable microsatellite flanking regions. This suggests a link between DNA family frequency and TEs in gastropods, and provides supporting evidence from a separate phylum for the hypothesis of Meglécz et al. (2004) that the creation of DNA families is mediated by TEs. Overall TE frequency, and hence differential rates of DNA multiplication, may not, however, have been the only important factor explaining inter-specific variation in genome stability and marker development success.

Closer examination of the results revealed that the gastropods also differed markedly in their complement of TEs and in the proportion of TEs classified as autonomous vs non-autonomous. One possibility is that their different transposition mechanisms give the two TE classes different propensities for fixation in the genome (that is, for becoming immobile). Because molluscs have high substitution rates (Davison, 2002), mutations in the coding sequences required for transposition would make autonomous TEs more prone to fixation. Conversely, non-autonomous TEs, which ‘hijack’ the machinery of partner TEs to transpose (Feschotte et al., 2002a, 2002b), would probably be unaffected. Furthermore, as non-autonomous TEs continue to proliferate, their transposition would generate mutations at higher frequencies (Jiang et al., 2003; Nakazaki et al., 2003), further accelerating the fixation of autonomous TEs. As a combined result of these processes, non-autonomous TEs may be more highly conserved and more transpositionally active than autonomous TEs. Several lines of evidence from this study appear to support this new hypothesis.

First and foremost, a variety of TEs, exclusively non-autonomous, was identified from the flanking regions of the DNA families. Second, the high number of non-autonomous TEs with highly conserved regions (for example, the MITE CvA) provides corroborating evidence that non-autonomous TEs in the G. cineraria genome have undergone recent transpositional activity (Ray, 2007; Kass et al., 2009), which in turn indicates a low propensity for fixation of this TE class in that species. Non-autonomous TEs such as MITEs usually attain high copy numbers in the genomes of many taxa (Feschotte et al., 2002b; Ray et al., 2005). Nonetheless, it was interesting to note that G. cineraria MCS contained CvA at a copy density nearly sevenfold that reported for genomic DNA of C. virginica (1.29 vs 0.19 copies per kb; Gaffney et al., 2003). One hypothesis is that non-autonomous TEs such as CvA tend to accumulate in predominantly neutral genomic regions, such as microsatellite regions. Remarkably, the identified CvA regions, without exception, mapped to the region containing an RNA polymerase III promoter BoxA. This promoter is involved in the transcription of TEs before the recruitment of enzymes from other TEs for their mobilization (Ray, 2007). Our results indicate that non-autonomous TEs may play a more active role than autonomous TEs in the rearrangement processes occurring in mollusc (and possibly other) genomes. This mechanism may have been an important factor in determining the inter-specific variation observed among the three gastropods.
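The copy-density comparison can be reproduced from the figures given in the Results and in Gaffney et al. (2003). Note that the G. cineraria MCS came from microsatellite-enriched libraries, so this density describes those sequences rather than the genome as a whole.

```python
def copies_per_kb(copies, total_bp):
    """Copy density: number of element copies per 1,000 bp of sequence."""
    return copies / (total_bp / 1_000)


gcin_density = copies_per_kb(19, 14_700)  # 19 CvA copies in 14.7 kb of G. cineraria MCS
cvir_density = 0.19                       # C. virginica value from Gaffney et al. (2003)
fold = gcin_density / cvir_density
print(f"{gcin_density:.2f} copies/kb; {fold:.1f}-fold")  # 1.29 copies/kb; 6.8-fold
```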

Alternatively, autonomous and non-autonomous TEs may differ in their ability to evade TE silencing systems. These systems suppress the transcription and subsequent transposition of TEs through processes involving RNA interference and sometimes DNA methylation (see Weil and Martienssen (2008) for review). Interestingly, Weil and Martienssen (2008) suggest that abundant TEs, for instance non-autonomous MITEs, have successfully evaded these silencing systems. Although those authors did not propose a direct link with the transposition mechanism of non-autonomous TEs, we cannot rule out this hypothesis. The identification of active non-autonomous TE/transposase systems, such as the mPing/Pong system in rice (see Feschotte and Pritham (2007) for review), should allow these hypotheses to be tested experimentally.

TE abundance and DNA recombination

Differential rates of DNA recombination may also have been an important factor in explaining inter-specific variation in genome stability and marker development success rates.

Here we present additional evidence for this involving two recombination-related mechanisms: gene conversion and unequal crossing over. Gene conversion is the non-reciprocal transfer of genetic material between homologous DNA sequences and can be mediated by TE transposition (Meglécz et al., 2004). Specifically, non-autonomous helitrons can capture and move gene fragments and are responsible for gene duplication and conversion (Morgante et al., 2005; Hollister and Gaut, 2007); this process can even lead to the formation of novel genes (Lockton and Gaut, 2009). Although helitrons were not identified in the littorinids, the helitron DNAREP1_DYak was the second most abundant TE identified in G. cineraria. This TE is related to DNAREP1_DM, the most abundant TE in the Drosophila melanogaster genome (Kapitonov and Jurka, 2003). Recombination rates may therefore be higher in G. cineraria owing to the abundance of helitrons, and the genomic rearrangements associated with helitron transposition could also explain the genomic complexity and instability of microsatellite flanking regions in this species.

Unequal crossing over, the second mechanism, can occur when ‘a chiasma occurs at two imperfectly aligned microsatellites with shared repeat units, leaving two new upstream–downstream combinations’ (Van’t Hof et al., 2007). An asymmetrical arrangement of similar microsatellite flanking regions within a DNA family is usually indicative of unequal crossing over (Meglécz et al., 2004). Because this type of arrangement was found in all three species, unequal crossing over may not have been an important factor in differentiating the microsatellite flanking regions among species.

TE abundance and microsatellite proliferation

The genesis, behaviour and evolution of microsatellites within genomes remain the subject of ongoing debate. One suggestion is that TEs carrying microsatellite repeat regions may have acted as microsatellite-inducing elements in the host genomes of pea, barley and some insects (Ramsay et al., 1999; Wilder and Hollocher, 2001; Coates et al., 2009; Smýkal et al., 2009). Likewise, Gaffney et al. (2003) proposed that CvA may be an ancient TE that has acted as a source of satellite DNA in distantly related bivalves. This process would have involved the conversion of a proto-microsatellite (or imperfect microsatellite) contained within a TE into heterochromatic satellite DNA (Wilder and Hollocher, 2001). Here, we provide supporting evidence for the hypothesis that the non-autonomous TE CvA has acted as a microsatellite dispersal agent in gastropod genomes. The phylogenetic conservation of CvA across the family Trochidae and the distantly related Bivalvia (Gaffney et al., 2003) supports the view that CvA is an ancient molluscan TE. Furthermore, the close association observed between this MITE and the DNA families of G. cineraria supports the idea that CvA disperses microsatellites, which hitchhike within TEs during transposition (Coates et al., 2009; Smýkal et al., 2009). These processes could also explain the formation and abundance of multilocus DNA families in G. cineraria. Finally, Wilder and Hollocher (2001) determined that the conversion of a proto-microsatellite (or imperfect microsatellite) contained within a TE into heterochromatic satellite DNA often leads to the existence of exclusively two tetranucleotide microsatellite repeat arrays. Likewise, we identified an abundance of imperfect tetranucleotide repeat motifs associated with the DNA families.

Conclusion

The study of MCS and neighbouring genomic regions is important for increasing our knowledge of microsatellite genesis and evolution. It could also provide additional molecular tools for resolving phylogenetic relationships and identifying population genetic structure. In this study, the comparative genomic analysis revealed considerable inter-specific variation in the genomic complexity and stability of microsatellite flanking regions. We provide novel evidence regarding the differential importance of autonomous vs non-autonomous TEs in the DNA multiplication and recombination processes that may explain these genomic complexities. The discovery of many novel gastropod cryptic repetitive DNA sequences associated with DNA families should provide a basis for further research into the description of new TEs. Molluscs, therefore, may prove useful as model genomes for investigating the involvement of TEs in the behaviour and evolution of microsatellites and other genomic elements. Further research using whole-genome sequence data sets is required, however, to determine whether the trends observed here extend to other genomic regions.