Introduction

B chromosomes are dispensable genomic elements reported in many plant, animal, and fungal species (Jones and Rees 1982; Camacho 2005; Houben et al. 2014; Jones 2017). B chromosomes were discovered more than a century ago (Wilson 1907) and, for many years, only repetitive DNA had been found on them (for review, see Camacho 2005). However, it is now known that B chromosomes also contain protein-coding genes (Martis et al. 2012; Valente et al. 2014; Navarro-Dominguez et al. 2017). A common characteristic of most B chromosomes is the accumulation of repetitive DNA, which accounts for their evolution and differentiation from A chromosomes (Camacho 2005; Houben et al. 2014). These accumulated repeats include microsatellites and satellite DNAs (satDNAs), multiple classes of transposable elements (TEs), and multigene families (see for example Nur et al. 1988; Ziegler et al. 2003; Coleman et al. 2009; Poletto et al. 2010; Peng and Cheng 2011; Bueno et al. 2013; Klemme et al. 2013; Milani and Cabral-de-Mello 2014; Silva et al. 2014; Coan and Martins 2018; Hanlon et al. 2018; Malimpensa et al. 2018; Marques et al. 2018; Ruiz-Ruano et al. 2018; Felicetti et al. 2021; Stornioli et al. 2021).

In grasshoppers, cytological and molecular analyses in multiple species revealed that most B chromosomes show C-banded heterochromatin and abundant DNA repeats (for instance, see Ruiz-Ruano et al. 2016a, 2018; Milani et al. 2017a, 2018). For example, most B chromosome variants found in Eyprepocnemis plorans are composed mainly of rDNA and a satDNA family (Cabrero et al. 1999, 2014; López-León et al. 2008) and they are enriched in R2 retrotransposons (Montiel et al. 2014). The most complete quantification of the repeatome in a grasshopper was recently performed in Locusta migratoria and showed that the B chromosome contains 94.9% repetitive DNA, with a single satDNA comprising 55% of the B chromosome (Ruiz-Ruano et al. 2018). In addition, this B chromosome showed a 17 kb region, including 29 different TEs, which was apparent as a FISH band on the B chromosome. Similarly, heterochromatic B chromosomes rich in a variety of repetitive DNAs have been reported in other grasshopper species, such as Eumigus monticola, Rhammatocerus brasiliensis, Xyleus (discoideus) angulatus, Schistocerca rubiginosa, Podisma sapporensis and Dichroplus pratensis (Bidau et al. 2004; Loreto et al. 2008; Oliveira et al. 2011; Ruiz-Ruano et al. 2016a; Jetybayev et al. 2018; Milani et al. 2018).

An exception to this general pattern is the South American grasshopper Abracris flavolineata (2n = 22 + X0♂/XX♀) where a submetacentric B chromosome failed to show heterochromatin defined by the C-banding technique, i.e., C-positive blocks (Cella and Ferreira 1991; Bueno et al. 2013). This B chromosome is mitotically stable, thus showing the same number in all cells from the same individual, and occurs in one or two copies in a natural population sampled at Rio Claro, São Paulo, Brazil (Milani et al. 2017b). Current evidence supports the origin of this B chromosome from the longest chromosome pair (L1), based on the U2 snDNA being visualized by FISH only on the L1 and B chromosomes (Bueno et al. 2013). In addition, we detected other repeats on this B chromosome, namely a satDNA family (Milani et al. 2017a), some microsatellite repeats (Milani and Cabral-de-Mello 2014), and two TEs (Palacios-Gimenez et al. 2014), which were shared with many A chromosomes. Intrigued by the absence of C-positive heterochromatin on the B chromosome of A. flavolineata, we decided to perform high-throughput complementary bioinformatic and cytogenetic analyses to characterize its repetitive DNA content. Repeatome analysis including 1744 TEs, 53 satDNAs, and 9 multigene families revealed that, consistent with its C-heterochromatin scarcity, this B chromosome is not enriched in high-copy repetitive DNAs, which makes it unusual among B chromosomes in general. Exceptionally, we found a few repetitive DNAs present on the euchromatic (C-negative) regions of the A chromosomes, which also decorate the interstitial regions of the B chromosome. In contrast, other repetitive DNAs that were enriched in heterochromatic regions (C-bands) of the A complement were mostly restricted to centromeric and distal regions of the B chromosome. In addition, satellitome analysis revealed that the B chromosome shared one satDNA family exclusively with the L1 autosome, thus supporting B ancestry from this A chromosome. Finally, we suggest that this B chromosome could be a young element at an initial stage of heterochromatinization.

Materials and methods

Biological materials, genomic DNA extraction, and chromosome preparations

For molecular and bioinformatic analysis, we used the same seven male individuals of Abracris flavolineata (three 0B, two 1B, and two 2B) previously studied by Bueno et al. (2013), Milani et al. (2017b), and Ahmad et al. (2020). The hind legs of these animals, previously stored in 100% ethanol at −20 °C, were used for genomic DNA (gDNA) extraction following the phenol/chloroform-based protocol (Sambrook and Russell 2001); the extracted gDNA was used for genomic sequencing and PCR assays (see below).

For chromosomal mapping, we collected five gravid females, which were maintained alive in cages at the laboratory until oviposition, allowing embryos to be obtained. Mitotic embryo chromosome spreads were prepared according to the protocol proposed by Webb et al. (1978). Chromosome spreads were performed by maceration and spreading of portions of embryos on a slide within a drop of 50% acetic acid, on a hot plate at 45 °C.

Genome sequencing and identification of repetitive DNA sequences being overabundant in B-carrying individuals

Genomic DNA sequencing was performed on the Illumina HiSeq 4000 platform using the Macrogen Inc. service (Seoul, Republic of Korea). Sequencing yielded 27–41 Gb of DNA per sample as 151 bp paired-end reads. The genomes from seven individuals are deposited in the Sequence Read Archive (SRA) under the accession numbers SRX7784770–SRX7784772. Repetitive sequences making up the repeatome of A. flavolineata were recovered and characterized using different approaches, including a thorough search for the satDNA families making up the satellitome, multigene families, and TEs (see details below).

To find and characterize the maximum number of different satDNA families, we applied the satMiner protocol (Ruiz-Ruano et al. 2016b). For this purpose, we randomly selected 2 × 5,000,000 reads from each individual using SeqTK (https://github.com/lh3/seqtk) and pooled those belonging to the same type of genome (0B, 1B, or 2B) by concatenating them. We then performed sequence preprocessing for each group of reads using the “rexp_prepare_normaltag.py” script (https://github.com/fjruizruano/ngs-protocols), which uses Trimmomatic (Bolger et al. 2014) to remove adapters and low-quality nucleotides (Q < 20), and finally selected only completely paired reads after trimming, i.e., those read pairs with 151 bp in both members. The script then interleaves forward and reverse reads and converts them to fasta format. We obtained 100,000 read pairs for each of the three libraries (0B, 1B, and 2B) and concatenated them into a single file. We then applied the satMiner protocol (Ruiz-Ruano et al. 2016b) consisting of several rounds of clustering with RepeatExplorer (RE) software (Novák et al. 2013) alternated with the DeconSeq filtering tool (Schmieder and Edwards 2011) to remove those satDNA sequences identified in previous RE rounds and added 100,000 of these cleaned read pairs from each pool sample (0B, 1B, and 2B) prior to each new RE round (again summing up 300,000 read pairs).
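The subsampling and pooling step described above can be illustrated with a minimal Python sketch. It is not the SeqTK or satMiner implementation: the function names (`sample_read_pairs`, `pool_by_genome_type`), the in-memory representation of read pairs as tuples, and the seeds are assumptions made only for illustration. The point it shows is that forward and reverse mates are sampled together, and that samples from individuals of the same genome type (0B, 1B, or 2B) are concatenated into one pool.

```python
import random
from typing import Dict, Iterable, List, Tuple

# A read pair is represented here as (forward_read, reverse_read); hypothetical layout.
ReadPair = Tuple[str, str]


def sample_read_pairs(pairs: Iterable[ReadPair], k: int, seed: int = 0) -> List[ReadPair]:
    """Reservoir-sample k read pairs so that both mates are always kept together."""
    rng = random.Random(seed)
    reservoir: List[ReadPair] = []
    for i, pair in enumerate(pairs):
        if i < k:
            reservoir.append(pair)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = pair
    return reservoir


def pool_by_genome_type(
    individuals: Dict[str, List[List[ReadPair]]], k: int
) -> Dict[str, List[ReadPair]]:
    """Sample k pairs per individual and concatenate them per genome type."""
    pooled: Dict[str, List[ReadPair]] = {}
    for genome_type, libraries in individuals.items():
        pooled[genome_type] = []
        for n, library in enumerate(libraries):
            pooled[genome_type].extend(sample_read_pairs(library, k, seed=n))
    return pooled


# Toy data mirroring the design of three 0B, two 1B, and two 2B individuals.
def toy_library(tag: str, n: int = 50) -> List[ReadPair]:
    return [(f"{tag}_{i}/1", f"{tag}_{i}/2") for i in range(n)]


toy = {
    "0B": [toy_library("0B_a"), toy_library("0B_b"), toy_library("0B_c")],
    "1B": [toy_library("1B_a"), toy_library("1B_b")],
    "2B": [toy_library("2B_a"), toy_library("2B_b")],
}
pools = pool_by_genome_type(toy, k=10)
```

With real data the pairs would be streamed from the forward and reverse FASTQ files in the same order; using one sampling decision per pair is what keeps the two files synchronized.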

RE clusters putatively containing satDNAs were selected by visual graph inspection to identify those showing spherical or ring shapes, which are characteristic of this type of DNA sequence. Then, we performed manual curation of the selected contigs by Geneious v4.8 software (Drummond et al. 2009), checked their tandem structure by dotplot graphic inspection, and recovered the consensus sequence for repeat units of each satDNA family or subfamily. To search for homology between different satDNA families we first compared their consensus sequences using multiple sequence alignments with Muscle (Edgar 2004) implemented in Geneious v4.8 software (Drummond et al. 2009), and second, we ran a homology test based on RepeatMasker (Smit et al. 2017) with “rm_homology.py” (https://github.com/fjruizruano/ngs-protocols). The results of these analyses were used to classify the satDNA collection into superfamilies, families or subfamilies according to the identity criterion proposed in Ruiz-Ruano et al. (2016b).

For TE identification, we randomly selected 100,000 read pairs from each pool of genomes (0B, 1B, and 2B), for a total of 600,000 reads, which were used as input for a single RE round followed by a reclustering-specific tool available in the Galaxy platform (https://repeatexplorer-elixir.cerit-sc.cz/galaxy/). This tool was used for merging clusters showing homology into larger contigs, which tends to improve TE assembly. Then, we analyzed all the cluster contigs for sequence extraction with Geneious v4.8 software (Drummond et al. 2009). Since this method allowed the recovery of fewer than 200 different TE families, we also used the dnaPipeTE pipeline (Goubert et al. 2015), which uses Trinity (Grabherr et al. 2011) as an assembler, followed by recurrent TE annotation and quantification in the raw reads compared with a custom database previously built by Ruiz-Ruano et al. (2018) from B-carrying genomes of Locusta migratoria. This analysis was performed using only forward reads and default parameters recommended for dnaPipeTE. Next, by means of a custom script (https://github.com/fjruizruano/ngs-protocols/blob/master/dnapipete_createdb.py) we used the dnaPipeTE assembly and annotation to generate a fasta file with annotated contigs in the RepeatMasker format (Smit et al. 2017) for further analysis.

Finally, the multigene families (H3 histone gene, 18S, 28S, 5.8S, and 5S rDNAs, U1, U2, and U6 snDNAs) and full mitochondrial DNA (mtDNA) were recovered using MITObim (Hahn et al. 2013) with the seed sequences used for Locusta migratoria in Ruiz-Ruano et al. (2018).

All the repeats obtained by these different methods were later concatenated, and redundancy was removed by CD-HIT-EST clustering (Li and Godzik 2006) using an 80% sequence identity level, implying that those repeats showing at least 80% identity were considered the same family.

Estimation of repetitive DNA sequence abundances and divergences in the A. flavolineata genome

Sequence abundance and divergence of each repetitive DNA family were determined in each of the seven genomes analyzed by means of RepeatMasker (Smit et al. 2017) using the Cross_match search engine on 5,000,000 read pairs from each library. SatDNA families were named in decreasing order of abundance in 0B genomes, following Ruiz-Ruano et al. (2016b). Sequence divergence was estimated by the Kimura 2-parameter (K2P) model using the calcDivergenceFromAlign.pl script within RepeatMasker software (Smit et al. 2017). Abundance for a given repetitive DNA family was calculated as a genome proportion, represented by the sum of all mapped nucleotides belonging to it (including all subfamilies) with respect to the total number of nucleotides in the selected reads from each Illumina library. Abundance and divergence for each family were separately estimated for each individual and later averaged for 0B (three individuals), 1B (two individuals), and 2B (two individuals) genomes. We then calculated two sequence abundance quotients, 1B/0B and 2B/0B, to search for repeats overabundant in the B-carrying genomes: repeats showing both quotients clearly higher than 1, with 2B/0B higher than 1B/0B, were considered overabundant in B-carrying individuals and thus enriched in the B chromosome. Conversely, repeats showing quotients lower than 1 were considered less abundant (or absent) in the B chromosome than in the average A chromosome. All satDNA families and some TEs showing overabundance in B-carrying genomes were selected for subsequent chromosomal mapping (see below).
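The abundance and quotient calculations described above can be sketched in a few lines of Python. This is an illustrative sketch, not the scripts used in the study: the function names and the numbers in the example are hypothetical. Because the text does not give a numerical threshold for "clearly higher than 1", the sketch applies the plain inequalities 1B/0B > 1 and 2B/0B > 1B/0B.

```python
from statistics import mean
from typing import Sequence, Tuple


def genome_proportion(masked_nucleotides: int, total_nucleotides: int) -> float:
    """Nucleotides assigned to one repeat family over all nucleotides in the selected reads."""
    return masked_nucleotides / total_nucleotides


def abundance_quotients(
    prop_0b: Sequence[float], prop_1b: Sequence[float], prop_2b: Sequence[float]
) -> Tuple[float, float]:
    """Average per-individual proportions per genome type and return (1B/0B, 2B/0B)."""
    a0, a1, a2 = mean(prop_0b), mean(prop_1b), mean(prop_2b)
    if a0 <= 0:
        raise ValueError("the repeat must be present in the 0B genomes to form a quotient")
    return a1 / a0, a2 / a0


def is_overabundant(q_1b: float, q_2b: float) -> bool:
    """Abundance grows with B number: 1B/0B > 1 and 2B/0B > 1B/0B."""
    return q_1b > 1 and q_2b > q_1b


# Hypothetical proportions for one repeat family in three 0B, two 1B, and two 2B males.
q1, q2 = abundance_quotients([0.0010, 0.0011, 0.0009], [0.0013, 0.0012], [0.0016, 0.0015])
enriched_on_b = is_overabundant(q1, q2)
```

A repeat whose quotients are both below 1 would instead follow the opposite trend, with its genomic proportion shrinking as B chromosomes are added.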

DNA amplification and chromosomal mapping of repetitive DNAs

We designed primers for PCR amplification either manually or else using Primer3 software (Untergasser et al. 2012) (Supplementary Table 1), and PCR conditions followed the same protocol described in Milani et al. (2018). For satDNA sequences, the monomeric bands were isolated and purified using the Zymoclean™ Gel DNA Recovery Kit (Zymo Research Corp., The Epigenetics Company, CA, USA) according to the manufacturer’s recommendations. The same method was applied for TE isolation, taking care of isolating fragments showing the size expected from computational annealing of primers. These products were used for reamplification using the same PCR conditions. All amplified sequences were sequenced by the Sanger method using Macrogen Inc. (Seoul, Republic of Korea) service to confirm the actual amplification of the target sequence.

We performed fluorescence in situ hybridization (FISH) on mitotic chromosome spreads from embryos using one or two probes simultaneously, according to Cabral-de-Mello and Marec (2021). Probes were labeled with digoxigenin-11-dUTP (Roche, Mannheim, Germany) or biotin-14-dATP (Invitrogen) and detected by antidigoxigenin-rhodamine (Roche) and streptavidin, Alexa Fluor 488-conjugated (Invitrogen), respectively. The chromosomes were counterstained using 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI) and slides were mounted in VECTASHIELD (Vector, Burlingame, CA, USA). The preparations were observed and images were captured using a BX61 Olympus microscope equipped with a fluorescence lamp and appropriate filters and a DP70 cooled digital camera. All images were processed and optimized using Adobe Photoshop CS6. According to the results observed, we classified the satDNA families into three types: (i) visible FISH bands covering the whole chromosome width (B-pattern), (ii) occurrence of dot-like scattered signals across the chromosome (D-pattern), and (iii) no FISH signal at all (NS-pattern).

Statistical methods

We compared repeat abundance between the 0B, 1B, and 2B genomic libraries by means of nonparametric Friedman ANOVA and the Wilcoxon matched pairs test.

Results

Comparative genomic abundance reveals little enrichment for high-copy repeats in the B chromosome

The overall mean repetitive DNA abundance in A. flavolineata genomes from the Rio Claro, São Paulo, Brazil population was 52.94% in 0B individuals, 52.59% in 1B individuals, and 52.00% in 2B individuals; this figure thus decreased with an increasing number of B chromosomes (Friedman ANOVA: χ2 = 8.08, N = 1806, df = 2, P < 0.018). This result suggests that this B chromosome shows lower repetitive DNA content than the A chromosomes, on average, so that when a given repetitive element is scarce in the B chromosome, its genomic proportion will decrease as the number of Bs grows. This “dilution effect” was significant for TEs (χ2 = 10.12, N = 1744, df = 2, P < 0.0064), marginally significant for satDNA (χ2 = 4.57, N = 53, df = 2, P > 0.10), and not significant for multigene families (χ2 = 1.56, N = 9, df = 2, P > 0.45) (Fig. 1). However, a few repeats showed the reverse pattern, i.e., their abundance increased with increasing numbers of B chromosomes. This pattern suggested the presence of these repeats in the B chromosome. For quantitative application of this criterion, we calculated the 1B/0B and 2B/0B quotients and selected those elements showing 1B/0B > 1 and 2B/0B > 1B/0B, as these two conditions together select repeats whose abundance increases with B number.
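The P values reported with the Friedman statistics above can be checked by hand, because the upper-tail probability of a chi-square distribution with two degrees of freedom has the closed form exp(−χ2/2). The short Python sketch below applies that formula to the four reported statistics; the function name `chi2_sf_df2` is an assumption for illustration, and the calculation is only the chi-square approximation of the test, not a reanalysis of the data.

```python
import math


def chi2_sf_df2(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# Friedman statistics as reported in the text (df = 2 in every case).
reported = {
    "all repeats": 8.08,
    "TEs": 10.12,
    "satDNA": 4.57,
    "multigene families": 1.56,
}
p_values = {name: chi2_sf_df2(stat) for name, stat in reported.items()}
for name, p in p_values.items():
    print(f"{name}: P = {p:.4f}")
```

The resulting values fall on the reported side of each bound (below 0.018 and 0.0064, above 0.10 and 0.45), with the satDNA value lying only just above 0.10.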

Fig. 1: Relative abundance of several types of repetitive DNA in A. flavolineata genomes carrying 1B and 2B, in comparison with the B-lacking genome, measured by the log2 transformed 1B/0B and 2B/0B quotients.

a Five examples of TEs showing overabundance in the B chromosome (solid lines), and five others showing the dilution effect (dotted lines) with negative values for both quotients. b Overabundant TEs in the B chromosome, indicating TE type for the five showing the highest quotients. c Only two satDNA families were overabundant in the B chromosome. Note the dilution effect for many other satDNAs. d Only the U2 snDNA showed clear overabundance in the B-carrying genomes, whereas the other families showed the dilution effect or quotients close to zero, suggesting their scarcity in the B chromosome.

We found 53 satDNA families in A. flavolineata, all of which were present in the three genome libraries analyzed (Supplementary Table 2), thus revealing the absence of B-specific satDNAs. The dilution effect was also apparent for satDNA, as its genomic content decreased in the presence of B chromosomes (4.52% in 0B, 4.03% in 1B, and 3.99% in 2B) (see Supplementary Table 2 and Fig. 1c) (Wilcoxon matched pairs test: 0B vs. 1B: z = 3.27, P = 0.001; 0B vs. 2B: z = 2.19, P = 0.028). However, we found no significant difference in satDNA content between the 1B and 2B libraries (z = 0.27, P = 0.79), perhaps due to some degree of B chromosome heterogeneity. Consistent with the general dilution effect for satDNA, abundance comparisons between libraries revealed that only two satDNA families (AflSat52-23 and AflSat53-17) were overabundant in the B-carrying genomes (see Supplementary Table 2 and Fig. 1c).
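The z statistics of the Wilcoxon matched pairs tests above translate into two-sided P values through the standard normal distribution. The sketch below assumes the reported P values are two-sided normal approximations (the text does not state this explicitly), and the function name `two_sided_p_from_z` is hypothetical.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# z values as reported for the three pairwise library comparisons.
comparisons = {"0B vs. 1B": 3.27, "0B vs. 2B": 2.19, "1B vs. 2B": 0.27}
wilcoxon_p = {name: two_sided_p_from_z(z) for name, z in comparisons.items()}
for name, p in wilcoxon_p.items():
    print(f"{name}: P = {p:.3f}")
```

Under that assumption the three values agree with the reported ones to within rounding.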

The analysis of coding tandem repeats (including rRNA, U snRNA, and H3 histone multigene families) revealed that only the U2 snRNA family showed overabundance in the B-carrying genomes (Fig. 1d). The presence of U2 snDNA on the A. flavolineata B chromosome was previously shown by FISH analyses (Bueno et al. 2013; Menezes-de-Carvalho et al. 2015; Milani et al. 2017b). The remaining gene families and mtDNA failed to show differences in relative abundance between B-carrying and B-lacking genomes, but some of them displayed the dilution effect (Fig. 1d).

In the case of TEs, we found 212 elements (out of the 1744 analyzed) meeting the 1B/0B > 1 and 2B/0B > 1B/0B criteria. These elements belonged to 28 families (Supplementary Table 3), the most abundant being LTR/Gypsy elements (Fig. 2). To test whether these results actually reflect overabundance in the B chromosome, we performed FISH for one element from each of three distinct superfamilies, LTR/Gypsy (Gypsy_17), DNA/Tc1 (Tc1_74), and LINE/Jockey (Jockey_72). This analysis revealed their concentration on certain B chromosome regions with the appearance of chromosome bands (Fig. 2). As these three families were among the seven most abundant, additional FISH work would reveal whether the observed pattern critically depends on abundance, a highly feasible possibility (see also Supplementary Figure 1).

Fig. 2: Comparative genomic proportion between the 0B, 1B, and 2B genomes for the most abundant TEs in Abracris flavolineata.

The asterisks indicate the superfamilies in which one representative was selected for FISH mapping on chromosomes (a–c). Note the spread distribution on long arms and the absence of signals on the pericentromeric C-heterochromatic region of A chromosomes. On the B chromosome (arrowheads), observe the differential distribution of TEs, i.e., signals on the first interstitial half of the long arm (a), spread signal along the entire extension of the B chromosome, except distal regions (b), and enrichment on interstitial areas of both arms with faint signals in the proximal half of the long arm (c). This last repeat was also absent in the terminal regions. Bar = 10 μm.

High-throughput analysis of the satellitome reveals that satDNA is scarce on the B chromosome

One of the 53 satDNA families found (named here as AflSat02-391) had previously been described as AflaSAT-1 (Milani et al. 2017a). The repeat unit length (RUL) of the 53 families ranged from 7 to 832 bp (mean = 224, SD = 167.6), and the total A + T content ranged from 30.43% to 76.50% (mean = 57.1%, SD = 8%). Homology tests between all satDNA families revealed the occurrence of only two superfamilies (SFs), with AflSat15-299, AflSat16-298, and AflSat26-296 comprising SF1, and AflSat20-233 and AflSat28-247 constituting SF2. As expected, the families belonging to each SF showed highly similar sequence properties (RUL and A + T content) (Supplementary Table 2).

A subtractive landscape (2B/0B) revealed a clear dilution effect for satDNA abundance, as the 2B genome showed a high deficit for most satDNA families, especially for the most abundant ones (Fig. 3). To analyze whether these genomic results are reflected at the cytogenetic level, we performed physical mapping by FISH on A and B chromosomes of A. flavolineata for all 53 satDNA families identified by bioinformatic analysis. Similar to other grasshopper species (for instance, see Ruiz-Ruano et al. 2016a, 2018), we observed three different patterns, with 44 families showing bands on chromosomes (B-pattern), three families showing many small dots scattered on chromosomes (D-pattern), and the remaining six showing no FISH signals (NS-pattern) (Table 1 and Supplementary Figure 2).

Fig. 3: Subtractive repetitive landscape (genome proportion versus sequence divergence based on Kimura substitution level) obtained from average counts for satDNAs in two males with 2B chromosomes and three with no B chromosome of Abracris flavolineata.

Abundance values show the difference between the 2B minus the 0B genomes. Thus, positive values indicate overabundance in the 2B genomes, and negative values indicate overabundance in the 0B genomes. Note the occurrence of mainly negative values indicating the low enrichment of satDNAs in 2B-carrying genomes.
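The subtraction behind such a landscape can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the plotting code used for Fig. 3: each genome is assumed to be summarized as a mapping from a Kimura divergence bin (in %) to the genome proportion of satDNA in that bin, and the function names and toy numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Divergence bin (Kimura substitution level, %) -> genome proportion; hypothetical layout.
Landscape = Dict[int, float]


def mean_landscape(landscapes: List[Landscape]) -> Landscape:
    """Average genome proportion per divergence bin across individuals."""
    totals: Dict[int, float] = defaultdict(float)
    for landscape in landscapes:
        for divergence_bin, proportion in landscape.items():
            totals[divergence_bin] += proportion
    return {b: total / len(landscapes) for b, total in totals.items()}


def subtractive_landscape(plus_b: List[Landscape], zero_b: List[Landscape]) -> Landscape:
    """Mean 2B landscape minus mean 0B landscape, bin by bin.

    Positive values mean overabundance in 2B genomes, negative values in 0B genomes.
    """
    mean_2b = mean_landscape(plus_b)
    mean_0b = mean_landscape(zero_b)
    bins = sorted(set(mean_2b) | set(mean_0b))
    return {b: mean_2b.get(b, 0.0) - mean_0b.get(b, 0.0) for b in bins}


# Toy input: two 2B males and three 0B males, two divergence bins each.
two_b = [{0: 0.010, 1: 0.020}, {0: 0.012, 1: 0.018}]
no_b = [{0: 0.020, 1: 0.020}, {0: 0.020, 1: 0.020}, {0: 0.020, 1: 0.020}]
difference = subtractive_landscape(two_b, no_b)
```

In the toy input both bins come out negative, which is the pattern the legend describes for most satDNA families.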

Table 1 Chromosome location of the 53 satDNA families found in Abracris flavolineata.

A summary of chromosome locations for the 53 satDNA families (Table 1) indicated that 47% of the 205 FISH bands found on A chromosomes were located on pericentromeric regions involving the centromere and the short chromosomal arm. The location of these satDNAs thus coincided with the heterochromatin location in this species, as revealed by C-banding (Bueno et al. 2013). However, the other half of the satDNA bands were found on euchromatic regions at proximal (5%), interstitial (30%), or distal (18%) locations of the long A chromosome arms (Table 1 and Supplementary Figure 2a-x). Notwithstanding, it is clear that the pericentric heterochromatic regions were enriched in satDNA as they contained the five most abundant families representing 81% of all satDNA content in the 0B genome (Supplementary Table 2) (i.e., 3.67% out of the total 4.52%) (Fig. 4, Supplementary Figure 2a,b, Table 1). Remarkably, of these five satDNAs, only the satDNA showing the highest abundance (AflSat01-179) was present on all A chromosomes (Fig. 4a, Table 1), thus most likely playing a centromeric function (Melters et al. 2013). However, the least abundant satDNA families tended to show FISH bands on a single chromosome pair (Fig. 4b, Supplementary Figure 2), as 15 of the 20 families with this condition showed abundance under the median value of all 53 families, and only 5 showed abundance above the median (Table 1). The X chromosome was the A chromosome showing the most exclusive satDNA FISH bands (three interstitially and three distally located), followed by S10 (4), M6 (3), L2 and M8 (2), and L3 and M7 (1). The X chromosome also harbored the highest number of satDNA families (25) and was the A chromosome showing the highest number of interstitial and distal satDNA bands (Table 1).

Fig. 4: Comparative genomic proportion and FISH mapping for eight satDNAs occurring in the B chromosomes with a banded pattern.

SatDNAs showing high (a) or low (b) abundance (expressed as genome proportion). Repeat names are indicated on the left. Some A chromosomes are indicated on each embryo mitotic metaphase plate, and the B chromosome is indicated by arrowheads. A differential satDNA distribution on the B chromosome was observed, with pericentromeric signals for AflSat01 and AflSat02, pericentromeric plus distal signals for AflSat03, AflSat07, AflSat25, AflSat46, and AflSat52, and pericentromeric plus interstitial (on the long arm) signals for AflSat40. In addition, note the exclusive presence of AflSat46 bands on the B chromosome and the L1 pair. Bar = 10 μm.

We noticed a clear-cut difference in chromosome location between the two superfamilies existing in the genome of A. flavolineata, as the three families belonging to SF1 always showed proximal locations on one (AflSat16-298 and AflSat26-296) or two (AflSat15-299) A chromosome pairs (Table 1, Supplementary Figure 2g,i,n), whereas the two SF2 family members showed either proximal (AflSat20-233) or interstitial (AflSat28-247) locations (Table 1, Supplementary Figure 2m,w).

Finally, there were nine other satDNA families where the location on A chromosomes was not in the form of FISH bands, three of which showed the dotted pattern (D) (Table 1, Supplementary Figure 3), and the six remaining showed no FISH signal at all (NS) (Table 1, Supplementary Figures 2y, 4).

Regarding the B chromosome, we observed that eight of the 44 satDNA families showing the B-pattern on A chromosomes (AflSat01-179, AflSat02-391, AflSat03-17, AflSat07-36, AflSat25-40, AflSat40-218, AflSat46-153, and AflSat52-23) were also present on the B chromosome, whereas the three families showing the D-pattern also showed multiple small dots on the B chromosome (Table 1 and Fig. 4, Supplementary Figure 3). Among the 13 satDNA bands observed on the B chromosome, eight were pericentromeric, one was interstitial and four were distal. Most of the eight satDNA families showing FISH bands on the B chromosome showed multichromosomal locations on A chromosomes, except two showing locations on only one (AflSat46-153 on L1) or two (AflSat40-218 on S10 and X) A chromosomes (Table 1 and Fig. 4b). Among all A chromosomes, L1 and X shared the highest number of satDNA families with the B chromosome (seven each; see Table 1). Bearing in mind that L1 also shares the U2 snDNA exclusively with the B chromosome (Milani et al. 2017b), we consider that, with the available data (repetitive DNA only), L1 is the best candidate to be the ancestor of this B chromosome.

The satDNA families with dotted patterns occupied virtually the entire extension of the B chromosome, but AflSat08-184 and AflSat42-75 were less abundant on pericentromeric and terminal regions (Supplementary Figure 3a,c) whereas AflSat13-177 was less evident on the proximal region of the short arm (Supplementary Figure 3b). They also showed FISH signals on the euchromatic (non-C-banded) regions of the long arm of all A chromosomes, but they were absent in their C-banded regions located on the pericentromeric region and the short arm (Supplementary Figure 3).

A comparative analysis of abundance for the eight satDNA families displaying the B FISH pattern on the B chromosome revealed why the global abundance of satDNA in the 0B, 1B, and 2B genomes showed a dilution effect. For this purpose, we separately represented the most and the least abundant families (Fig. 4), thus revealing that three families (AflSat01-179, AflSat07-36, and AflSat25-40) showed a clear decrease in abundance with an increasing number of B chromosomes, whereas only two (AflSat40-218 and AflSat52-23) showed the reverse pattern, due to B-enrichment, but these two satDNA families were among the least abundant in the genome.

Discussion

Genome low-pass sequencing combined with computational and chromosomal analysis provides a comprehensive understanding of the organization and evolution of DNA repeats on B chromosomes (Kumke et al. 2016; Ruiz-Ruano et al. 2018; Milani et al. 2018; Ebrahimzadegan et al. 2019; Serrano-Freitas et al. 2019). Through this approach, we found that the B chromosome of the grasshopper A. flavolineata is poorly enriched in repetitive DNA. Only three of the 53 satDNA families found in this species (AflSat40-218, AflSat52-23, and AflSat53-17), which are among the least abundant in the 0B genome, were overabundant in B-carrying genomes. Likewise, only 28 TE families, containing 212 elements, representing only 12% of the 1744 TEs found, showed overabundance in B-carrying genomes. This scenario contrasts with the general idea that B chromosomes are enriched in repetitive DNA (Camacho 2005; Houben et al. 2014; Marques et al. 2018). In line with that general idea, repeat-enriched B chromosomes have been reported in fish (Ziegler et al. 2003; Coan and Martins 2018; Stornioli et al. 2021), reptiles (Kichigin et al. 2019), plants (Martis et al. 2012; Kumke et al. 2016; Ebrahimzadegan et al. 2019), and insects (Hanlon et al. 2018; Ruiz-Ruano et al. 2018). Among the repeats found in B chromosomes, satDNA is the most frequent component (McAllister 1995; Langdon et al. 2000; Klemme et al. 2013; Hanlon et al. 2018; Ruiz-Ruano et al. 2018; Ebrahimzadegan et al. 2019; Stornioli et al. 2021).

This accumulation of repetitive DNAs on B chromosomes is commonly assumed to be due to their genetic isolation from A chromosomes, with which they do not recombine (Camacho 2005; Houben et al. 2014). In this way, the nonenrichment in repetitive DNA, the absence of C-heterochromatin blocks, and the absence of B-specific satDNA families would be consistent with the hypothesis that this B is a young element, resembling the composition of the A chromosome from which it derived (most likely L1, see below). The high similarity between B and A chromosomes is also supported by B chromosome microdissection of A. flavolineata followed by chromosome painting, as all C-negative A chromosome regions and the B chromosome were similarly labeled (Menezes-de-Carvalho et al. 2015).

Based on FISH mapping of the U2 snDNA, the B chromosome in A. flavolineata was suggested to have derived from the L1 autosome, as only these two chromosomes harbor this sequence (Bueno et al. 2013). Here, chromosomal mapping of the full satellitome of this species has provided additional clues about B chromosome ancestry and evolution. We observed that the L1 autosome and the X chromosome both share the highest number of satDNA families with the B chromosome, i.e., seven families. However, the absence of U2 on the X chromosome and the fact that the L1 autosome is the only A chromosome sharing AflSat46-153 with the B chromosome reinforce the conclusion that L1 is the most likely B ancestor. Although our data support the possible derivation of the B chromosome from the L1 autosome, with possible subsequent restructuring of the B chromosome, additional research is necessary to obtain accurate information on the possible synteny of the repeats shared by these chromosomes, as it would help to unveil the precise origin of the B chromosome. In addition, some repeats present on the L1 autosome were not found on the B chromosome, indicating some additional degree of B differentiation attributed to the intense dynamism of repetitive DNAs. Among grasshoppers, the origin of B chromosomes from large A chromosomes, as the current results suggest in A. flavolineata, appears to be uncommon, as the few cases where B chromosome ancestry was claimed involved medium (M) or small (S) A chromosomes, such as S11 in E. plorans (Teruel et al. 2014), S8 in E. monticola (Ruiz-Ruano et al. 2016a), M8 and S9 in L. migratoria (Ruiz-Ruano et al. 2018), S9 in S. rubiginosa, S11 in R. brasiliensis, and S10 in X. d. angulatus (Milani et al. 2018). These medium- or small-sized A chromosomes are enriched in repetitive DNAs because their pericentromeric C-banded regions are the same size as the pericentromeric C-banded regions in long A chromosomes, but their non-C-banded regions are much smaller. Therefore, M and S chromosomes are more prone to be involved in chromosome rearrangements, which might be an initial step for B chromosome origin (Hewitt 1974; Perfectti and Werren 2001; Camacho 2005; Raskina et al. 2008; Houben et al. 2014; Ruiz-Ruano et al. 2016a; Milani et al. 2018). In A. flavolineata, the low amount of repeats on the B chromosome would be consistent with the low proportion of the C-banded region in L1 (see Fig. 5) and the loss of most of the C-banded chromatin in the B. In contrast, B derivation from medium or small A chromosomes with a lower proportion of non-C-banded chromatin should most likely render heterochromatic Bs, as in cases with B chromosome ancestry related to highly heterochromatic chromosomes, such as sex chromosomes (Sharbel et al. 1998; Pansonato-Alves et al. 2014; Ventura et al. 2015; Serrano-Freitas et al. 2019).

Fig. 5: Comparative C-banding and FISH for repeat location between the L1 and B chromosomes of A. flavolineata.

The L1 autosome showed, like the remaining A chromosomes, a large C-band spanning the pericentromeric region and the short arm. The FISH analysis for seven satDNA families and the U2 snDNA repeat showed pericentromeric and telomeric locations on the B chromosome, whereas on L1 they were located on the pericentromeric region and the short arm (AflSat01, AflSat02, AflSat03), the pericentromeric region (AflSat46), the interstitial region (AflSat07), or the interstitial region and the short arm (AflSat25 and AflSat52). The satDNA locations thus suggest that the B might originate from the proximal third of L1, including the interstitial region containing several satDNAs. However, the U2 snDNA is located on an L1 region outside this proximal third, so the presence of U2 on the B is not explained by a single rearrangement event.

Remarkably, satDNAs displaying FISH bands on the A. flavolineata B chromosome frequently showed a symmetrical pattern, with bands on the pericentromeric and distal regions of both B chromosome arms, as did the U2 snDNA (Fig. 5, Table 1). This suggests the isochromosome nature of this B chromosome and the involvement of centromeric misdivision in its origin. The small size of the FISH bands observed on the B chromosome for most satDNA families (e.g., AflSat01-179, AflSat02-391, and AflSat03-17) would be consistent with the loss of the L1 short arm (which contains the largest amount of C-heterochromatin and satDNA families) during the B-forming misdivision. Isochromosomes arising from misdivision have been reported in grasshoppers such as Eyprepocnemis plorans (López-León et al. 1993), Omocestus burri (Del Cerro et al. 1994) and Metaleptea brevicornis adspersa (Grieco and Bidau 2000), in plants such as Zea mays (Carlson and Phillips 1986), Crepis capillaris (Leach et al. 2005) and S. cereale (Marques et al. 2012), in the fish Astyanax scabripinnis (Mestriner et al. 2000), and in Drosophila melanogaster (Hanlon et al. 2018). One argument against the isochromosome hypothesis in A. flavolineata is that the B chromosome is not perfectly metacentric, which would require additional inversions, or differential duplications or deletions, between the B arms. More intense amplification of DNA repeats on one of the B chromosome arms has been noticed for TEs such as Gypsy_17, Tc1_74, and Afmar2 (Palacios-Gimenez et al. 2014), and for two satDNAs analyzed here (AflaSat07-36 and AflaSat40-218). Events of this kind could have contributed to the emergence of the submetacentric B chromosome, which is currently prevalent in A. flavolineata. Nevertheless, the evidence for L1 derivation of the B chromosome is still preliminary, as research is still in the initial steps of disentangling the conundrum of B chromosome origin.

Altogether, our results indicate that the B chromosome in A. flavolineata is unusually poor in repetitive DNAs, presumably because it arose from the longest A chromosome, which has a low proportion of C-heterochromatin, most of which was lost during the misdivision that yielded the B chromosome from the L1 autosome. The B chromosome is enriched in only a few repetitive elements, and only to a low extent, and the absence of B-specific satDNAs suggests that this B chromosome is young. This might be helpful in testing the L1-derivation hypothesis, as a putatively young B chromosome could still retain high similarity in gene content with its ancestor chromosome.

Data archiving

Genomes have been deposited at the Sequence Read Archive (SRA) under accession numbers SRX7784770–SRX7784772.