Telomeric DNA sequences in beetle taxa vary with species richness

Telomeres are protective structures at the ends of eukaryotic chromosomes, and disruption of their nucleoprotein composition usually results in genome instability and cell death. Telomeric DNA sequences have generally been found to be exceptionally conserved in evolution, and the most common pattern of telomeric sequences across eukaryotes is (TxAyGz)n maintained by telomerase. However, telomerase-added DNA repeats in some insect taxa frequently vary, show unusual features, and can even be absent. The factors that might allow such frequent changes in telomere composition in Insecta have been a matter of speculation. Coleoptera (beetles) is the largest of all insect orders, and previously available data suggested that the telomeric sequence of beetles varies to a great extent. We performed an extensive mapping of the (TTAGG)n sequence, the ancestral telomeric sequence in insects, across the main branches of Coleoptera. Our study indicates that the (TTAGG)n sequence has been repeatedly or completely lost in more than half of the tested beetle superfamilies. Although the exact telomeric motif in most of the (TTAGG)n-negative beetles is unknown, we found that the (TTAGG)n sequence has been replaced by two alternative telomeric motifs, (TCAGG)n and (TTGGG)n, in at least three superfamilies of Coleoptera. The diversity of the telomeric motifs was positively related to the species richness of taxa, regardless of the age of the taxa. The presence/absence of the (TTAGG)n sequence varied greatly within the Curculionoidea, Chrysomeloidea, and Staphylinoidea, which are the three most diverse superfamilies within Metazoa. Our data support the hypothesis that telomere dysfunctions can initiate rapid genomic changes that lead to reproductive isolation and speciation.


Results
Distribution of telomeric (TTAGG)n and (TCAGG)n sequences. The first part of our study was an extensive survey of the distribution of (TTAGG)n and (TCAGG)n telomeric sequences across Coleoptera, for which we evaluated 44 families in 15 superfamilies (Table S1), covering the suborders Adephaga and Polyphaga. The mapping was performed using dot-blot hybridization (Fig. 1), Southern hybridization (Fig. 2), and by searching the National Center for Biotechnology Information (NCBI) databases for tandem repeats (Table S3). The obtained data (Table 1) were interpreted based on the recently determined phylogenetic relationships of Coleoptera (Fig. 3) and the species richness and age of taxa (Figs. 3, 4).
In Adephaga, we targeted representatives of Carabidae (Geadephaga), which is the family covering the majority of adephagan species 21,24-26. Using dot-blot hybridization, we tested representatives of four subfamilies (Carabinae, Pterostichinae, Licininae, and Platyninae), and (TTAGG)n was detected in only one of the two tested species of Pterostichinae. A further search of Carabidae was conducted using the NCBI database. In Trechinae, the data showed (TTAGG)n in 17 species and the absence of this sequence in six species, and (TTAGG)n was absent in Cicindelinae and Harpalinae species. The (TCAGG)n sequence was not found in any of the Carabidae species examined. Next, we detected (TTAGG)n in Rhantus sp. (Colymbetinae, Dytiscidae) using dot-blot hybridization and in Hydroporinae (Dytiscidae, Hydradephaga) in the NCBI database.
We tested one species of Scirtoidea (Scirtes sp., Scirtidae), which is a small superfamily of beetles with a basal position within Polyphaga, sharing archaic morphological features with Archostemata and Adephaga 21,27,28 . Scirtoidea species displayed no hybridization signal for either of the tested sequences.
All tested Elateriformia species representing Throscidae, Eucnemidae, Cantharidae, Elateridae, and Lampyridae showed the presence of the (TTAGG)n motif and the absence of the (TCAGG)n motif by dot-blot hybridization or the NCBI search.
Next, we investigated Staphylinoidea, which includes the majority of the Staphyliniformia species, and we tested subfamilies of Staphylinidae and Leiodidae, which are the two largest Staphylinoidea families 29. DNA hybridization experiments revealed the absence of the (TTAGG)n motif in Leiodinae (Leiodidae), and the NCBI search showed its absence in Oxytelinae and Tachyporinae (both Staphylinidae). In contrast, based on the NCBI search, 19 species of Aleocharinae were found to be (TTAGG)n-positive. The (TCAGG)n sequence was not found in any of the examined species. Together with the published data 20, we can conclude that Staphylinoidea varies in the presence/absence of the (TTAGG)n motif.
Scarabaeoidea is the only superfamily of Scarabaeiformia, herein represented by Geotrupidae, Lucanidae, and Scarabaeidae. None of the tested representatives displayed the presence of either (TTAGG)n or (TCAGG)n.
In both Curculionoidea and Chrysomeloidea, which are considered sister groups 32, the occurrence of the (TTAGG)n sequence varied considerably. In Curculionoidea, the sequence was detected in Anthribidae, whereas no signal was observed in Brentidae and Attelabidae, and variability in sequence occurrence was observed at the family and subfamily levels within Curculionidae. Consistent with previous findings, Curculionidae displayed the (TTAGG)n sequence in Ips typographus (Scolytinae; consistent with 33) and Phyllobius urticae (Entiminae, Phyllobiini; consistent with 17). However, based on hybridization data and the NCBI database, Scolytinae, Entiminae, and Curculioninae showed variability in (TTAGG)n occurrence, and the sequence was not detected in Dryophthorinae and Cossoninae. In Chrysomeloidea, instability in the presence of (TTAGG)n was observed at the family and subfamily levels within Chrysomelidae and Cerambycidae. Based on our data, the sequence was absent in four of the 22 tested species of Chrysomelidae, namely within Criocerinae (in one of two tested species), Galerucinae (in two of four tested species), and Clytrinae (in the one tested species), and in three of six Cerambycidae representatives, namely within Lepturinae (in one of three tested species) and Lamiinae (in both tested species). Interestingly, the (TCAGG)n motif was detected in two representatives of Phytophaga: Byctiscus populi (Curculionoidea, Attelabidae) and Leiopus nebulosus (Chrysomeloidea, Cerambycidae).
In the superfamily Bostrichoidea, which is a sister group of Cucujiformia 31, all tested families (Bostrichidae, Anobiidae, Dermestidae, Nosodendridae, and Ptinidae) showed the presence of the (TTAGG)n sequence and the absence of the (TCAGG)n sequence.
A more detailed characterization of the hybridization signals was conducted using Southern hybridization in selected representatives of the tested taxa. The signals formed long smears, mostly ranging from 2 kb to more than 21 kb, and mostly revealed a number of hybridization bands of different molecular weights (Fig. 2a,b).
Collectively, our study indicates that the (TTAGG)n sequence has been repeatedly or completely lost in Geadephaga and more than half of the tested polyphagan superfamilies (Fig. 3a,b), and the sequence was replaced with the (TCAGG)n motif in the sister groups Tenebrionoidea and Cleroidea.

Search for telomeric sequence variants: the (TTGGG)n sequence as a novel telomeric sequence in beetles.
A search of the NCBI databases for tandem repeats revealed that the (TTGGG)n and (TGAGG)n sequences are candidates for novel telomeric sequences in Coleoptera (Table S3). Using dot-blot hybridization, we examined the TTAGG- and TCAGG-negative species for the presence of these sequences. In addition, we examined these species for the presence of several other sequence variants that have been reported as telomeric motifs in different organisms, including (TTTGGG)n 34 and (TTGGGG)n 35 (from ciliate protozoans), (TTAGGG)n (from vertebrates) 36, and (TTTAGGG)n (from plants) 37. Except for (TTGGG)n, no positive signals were detected for any of the tested sequences. Consistent with the NCBI database showing the presence of (TTGGG)n in three scarabaeoid species (Onthophagus taurus, Pachysoma striatus, and Canthidium sp.), the sequence was detected in Anoplotrupes stercorosus, and its presence at chromosome termini was confirmed using Bal31 digestion and FISH (Figs. 2c, 3d,e). Surprisingly, (TTGGG)n was not confirmed in the other tested scarabaeoid species (not shown).

Repeated losses of telomeric motifs in certain taxa reflect the species richness but not the age of the taxa.
The presence/absence of the (TTAGG)n sequence varied within the Curculionoidea, Chrysomeloidea, and Staphylinoidea, which are the three most diverse metazoan superfamilies. In these taxa, variance in the presence of (TTAGG)n was observed even at the family or subfamily level, that is, within Staphylinidae (> 63,000 species 38), Chrysomelidae (> 32,000 species 21), Cerambycidae (> 32,000 species), and Curculionidae (> 51,000 species 21) (Fig. 3). Similarly, the occurrence of (TTAGG)n varied in the highly diverse Carabidae (> 34,000 species 21). In contrast, no variance was observed in, for instance, the less diverse, but highly sampled, Tenebrionoidea, Cleroidea, or Elateroidea.
To test the hypothesis that more speciose taxa have greater telomere diversity, we used the GLS method. We found a significant effect of species richness on the motif diversity per taxon (GLS, F(1,12) = 11.2, P = 0.006): the diversity increased exponentially with richness (Fig. 4). We also tested the effect of the age of the taxon, which was not significant (GLS, F(1,11) = 1.3, P = 0.28).
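As a rough illustration of the diversity measure behind this test, the sketch below computes a Shannon (entropy) index from per-taxon counts of telomeric motif states and then applies a logarithmic transformation. The taxon names, the counts, and the use of log(1 + H) to keep taxa with zero diversity defined are assumptions made for illustration only; they are not the study's data or necessarily its exact transformation, and the phylogenetic GLS fit itself is not reproduced here.

```python
import math


def shannon_index(counts):
    """Shannon entropy H = sum(p_i * ln(1 / p_i)) over non-empty categories."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


# Hypothetical counts of tested species per motif state
# (TTAGG-positive, TCAGG-positive, neither); NOT the study's data.
motif_counts = {
    "taxon_A": [12, 0, 0],  # a single motif state, so H = 0
    "taxon_B": [6, 0, 6],   # two equally frequent states
    "taxon_C": [5, 1, 4],   # three states
}

for taxon, counts in motif_counts.items():
    h = shannon_index(counts)
    # log1p is an assumption so that H = 0 remains defined after transformation.
    print(f"{taxon}: H = {h:.3f}, log(1 + H) = {math.log1p(h):.3f}")
```

A taxon in which every tested species shares one motif state gets H = 0, while a taxon split evenly between two states gets H = ln 2, so the index rises with both the number of states and the evenness of their frequencies.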

Discussion
There are two main hypotheses to explain the number of species in a clade 39-42. The clade-age hypothesis proposes that species richness increases with the age of the clade. According to the diversification-rate hypothesis, the number of species depends on net diversification, which reflects speciation minus extinction over time; thus, old clades with low richness have low net diversification rates, while young clades with a high number of species have high rates of diversification. In our study, we observed a positive relationship between species number and diversity in the telomere sequence in clades, regardless of the age of the clades. Therefore, based on our data, we hypothesize that the key reason for the frequently observed telomeric losses in certain coleopteran clades is the
The positive correlation between species richness and diversity in telomeric sequences seems to be consistent with the highly frequent (TTAGG)n losses within the Insecta, which comprise 1,020,007 species, representing about 66% of all animals. Variance in telomeric sequences is observed in the most diverse insect orders, which are, along with Coleoptera, Hemiptera (104,000 species), Hymenoptera (117,000 species), and Diptera (156,000 species) 16,51. In contrast, 57 tested representatives of Orthoptera (27,000 species 39) had the (TTAGG)n sequence 16. Certain similarities can be observed in other highly diverse groups of organisms, such as Magnoliophyta (flowering plants, 300,000 species 52). Loss of the plant ancestral telomere DNA sequence, (TTTAGGG)n, has been reported in numerous flowering plants, in which the sequence was replaced by alternative motifs 10,53-57, and high diversity in telomere repeats was observed in two species-rich orders: Asparagales, the largest order within the monocotyledons, consisting of around 30,000 species 58-61, and Lamiales, consisting of 23,000 species 55.
Moreover, numerous plant orders possess unknown telomere sequences 10. It is also debatable whether the positive correlation between species richness and telomere sequence diversity is due to the direct involvement of telomere biology in the process of speciation, or simply because diversity can be detected more easily in species-rich clades.
Telomeric DNA sequences participate in the formation of a telomere capping structure 62, and therefore, we can assume that a small deviation in the telomere sequence can affect whole-genome stability and have an enormous evolutionary impact. It is well known that telomere dysfunction results in chromosomal fusions, large-scale genomic rearrangements, and instability across the genome, which are all hallmarks of speciation. Chromosomal rearrangements are associated with high mortality and, due to meiotic anomalies, reduced fertility, which collectively result in the generation of isolated groups within the population; each of these groups could develop into a new species 63. Therefore, we can suppose that telomeres might provide a powerful mechanism for rapid genomic changes that can lead to reproductive isolation and speciation, and our data support this premise.
We propose that chromosomal rearrangements triggered by telomere dysfunction might result in species extinction or the formation of new species, which involves either stabilization of the existing telomeric system or development of a new one. Our study showed that numerous Coleoptera species lack the ancestral insect telomere sequence, but the exact telomeric motif in most of these species remains unknown. We can only speculate whether the sequence was replaced by another short telomeric sequence or by a completely different system independent of telomerase. The telomerase system can be very plastic in adopting new features, as documented by the remarkable divergence of telomere sequences in budding yeasts, which show extraordinary lengths, occasional degeneration, and a frequent absence of G/C-richness 64, and by numerous species with backup pathways for telomere lengthening when telomerase activity is compromised 65-68.
Various studies have identified transposable elements as modifiers of the adaptive response upon exposure to a stressful environment. It has been shown that transposable elements can be activated by diverse stressors such as DNA-damaging agents, thermal stress, and also telomere dysfunction 69-72. Transposable elements are known to induce genomic rearrangements 73, and we assume that the activation of transposable elements by telomere dysfunction not only contributes to speciation but also allows the development of a retroelement-based telomeric system during the speciation process. Although telomeric retroelements are a hallmark of Drosophila telomeres, they are also found in distantly related species, incorporated into the telomerase-added sequences at their chromosome termini. It should also be pointed out that telomeric retroelements are not a universal system of telomere elongation in the genus Drosophila, let alone in Diptera as a whole, as telomere maintenance in some species uses telomere-telomere recombination 74-79. Nevertheless, we can suppose that the retroelement and recombination systems work as backup pathways for telomere lengthening when telomerase activity is compromised 65-68.
Together, these findings reveal the plasticity of chromosome ends for incorporating new features to maintain telomere integrity and functionality, perhaps pointing to the mechanisms by which telomeres contribute to the speciation and adaptation process. We believe that telomere diversity in insects provides the right opportunity to research such underexplored aspects of telomere biology.
Hybridization probes. Hybridization probes were prepared using non-template PCR. The list of primers is provided in Table S2. The non-template PCR reaction contained 10 μM forward primer, 10 μM reverse primer, 10 mM dNTP mix, and Taq polymerase (5 U/μl). The PCR products were labeled by random primed labeling with digoxigenin-11-dUTP using a DIG DNA Labeling Kit (Roche Diagnostics) and with biotin-14-dATP using a Biotin-Nick Translation Mix (Roche Diagnostics).
Bal31 nuclease assay. To confirm the terminal positions of the tested sequences on chromosome ends, the genomic DNA was subjected to Bal31 exonuclease digestion. Bal31 nuclease degrades the 3′ and 5′ termini of duplex DNA without generating internal scissions in the intact double helix. The assay was performed as described previously 81. Briefly, the genomic DNA (15 μg) was incubated with 0.03 U Bal31 nuclease (New England Biolabs) in a total volume of 180 μl at 30 °C, and 60-μl aliquots were taken before the Bal31 addition and after 30, 60, 90, and 120 min of the treatment. The reaction was immediately stopped by incubation at 65 °C for 10 min in the presence of 20 mM EGTA. Samples were purified using phenol-chloroform-isoamyl alcohol extraction, and the DNA was recovered by standard ethanol precipitation with 1/10th volume of sodium acetate and 2 volumes of ethanol. Then, the DNA samples (1 µg) were digested with the restriction enzymes RsaI and HinfI (New England Biolabs) and subjected to Southern hybridization.
Chromosome preparations and FISH. Chromosome spreads were prepared from the gonads of tested adults. The gonads were dissected in Ringer's solution, incubated in a hypotonic solution (0.075 M KCl) for 10-20 min, and then fixed in freshly prepared Carnoy solution (ethanol:chloroform:acetic acid, 6:3:1). Using tungsten needles, the tissue was torn apart in a drop of 60% acetic acid and spread on a microscope slide placed on a heating plate (45 °C). Preparations were dehydrated in an ethanol series (70%, 80%, 90%, 30 s each) and stored at −20 °C until use.
Search of NCBI databases. The Sequence Read Archive at the National Center for Biotechnology Information (SRA, NCBI) was searched for datasets from Coleoptera, restricted to the WGS strategy, genomic source, and random selection. Where available, at most the first ten thousand spots of each dataset were downloaded and analyzed using Tandem Repeats Finder with the options set as described previously for Bal31-NGS 82. The presence/absence of telomeric motifs was checked manually in the Tandem Repeats Finder output.
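The manual check of the Tandem Repeats Finder output could, in principle, be complemented by a simple scripted screen of the reads. The following is a minimal sketch under stated assumptions, not the pipeline used in this study: it looks for tandem runs of a candidate motif on either strand of each read. The function names, the example reads, and the threshold of four consecutive copies are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Lexicographically smallest rotation of the motif or its reverse
    complement, so that TTAGG, TAGGT, and CCTAA map to the same key."""
    motif = motif.upper()
    candidates = []
    for s in (motif, reverse_complement(motif)):
        candidates.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(candidates)


def longest_tandem_run(read, motif):
    """Largest number of consecutive motif copies found on either strand."""
    read = read.upper()
    motif = motif.upper()
    best = 0
    for unit in (motif, reverse_complement(motif)):
        for match in re.finditer(f"(?:{unit})+", read):
            best = max(best, len(match.group(0)) // len(unit))
    return best


def has_motif(reads, motif, min_copies=4):
    """True if any read carries at least min_copies tandem copies.
    The default threshold is an arbitrary, hypothetical choice."""
    return any(longest_tandem_run(r, motif) >= min_copies for r in reads)


# Hypothetical reads, for illustration only.
reads = ["ACGT" + "TTAGG" * 6 + "AC", "GATTACAGATTACA"]
print(has_motif(reads, "TTAGG"), has_motif(reads, "TCAGG"))
```

Scanning both strands matters because a read from the C-rich strand shows (CCTAA)n rather than (TTAGG)n. A screen of this kind only flags candidate repeats; terminal localization would still need to be confirmed experimentally, as was done here with Bal31 digestion and FISH.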
Statistical analysis. The phylogenetic generalized least squares (GLS) from the ape package 83 was used to test the hypothesis that more speciose taxa have greater telomere diversity. GLS was used because the measurements were not independent (as they may have a common evolutionary history) 84 and the relationship was not linear (to be tested by correlation). We constructed a truncated phylogenetic tree using the most recent phylogenetic hypothesis of Coleoptera. As branch distances were not reported, these were computed using Grafen's method 85 . Shannon (entropy) index was used to estimate the motif diversity and this response variable was then logarithmically transformed to fit an exponential relationship. Species richness was estimated at the superfamily level. The linear predictor included the number of investigated species (per taxon) to correct for different intensi-