Introduction

Skipper butterflies (Lepidoptera: Hesperiidae) include approximately 4,000 species in 567 genera worldwide1 and account for a fifth of the world’s butterfly fauna2. Despite considerable efforts in recent years3,4,5, the higher-level phylogenetic relationships within the family Hesperiidae are still unsatisfactorily resolved. The taxonomic affinities of many genera are not conclusive, even at the subfamily level6, including Apostictopterus and Barca.

The taxonomic positions of the two monotypic genera Apostictopterus and Barca have been controversial. In Evans’s classification7, they were assigned to the Heteropterus group of the subfamily Hesperiinae, close to the Astictopterus group, whereas Chou8 assigned Apostictopterus to the tribe Astictopterini and Barca to the tribe Heteropterini. Since Higgins9, the Heteropterus group of Evans has been widely regarded as the subfamily Heteropterinae. Previous studies1,10 treated both genera as members of Heteropterinae. However, on the basis of morphological evidence, Warren et al.6 considered them more likely to belong to Hesperiinae.

Morphology-based phylogenetic systematics has proven difficult, whereas molecular phylogenetics has contributed to the development of a more stable classification. Because mitochondrial genomes are characterised by maternal inheritance, a rapid evolutionary rate, and little or no genetic recombination, they have been extensively used in genetics and evolutionary biology11,12,13,14. Insect mitochondrial genomes (mitogenomes) are typically compact circular molecules of 15–18 kb containing 37 genes: 13 protein-coding genes (PCGs), 22 transfer RNAs (tRNAs), and two ribosomal RNAs (rRNAs)15,16. In addition, the mitogenome usually contains a control region (also called the AT-rich region because of its high A + T content) that is longer than the other noncoding regions and contains essential regulatory elements for transcription and replication16,17,18,19,20. However, this region is difficult to sequence well with high-throughput sequencing techniques, because the depth of coverage is strongly positively correlated with GC content21.

Mitogenomes are a data-rich and relatively accessible source of information. Condamine21 obtained promising results on the genus-level relationships of swallowtail butterflies using mitogenomes. Thus far, 30 complete or nearly complete mitogenomes of skippers have been sequenced. In this study, we sequenced two additional complete mitogenomes, those of A. fuliginosus and B. bicolor, and characterised their composition. We then inferred the phylogenetic relationships within the Hesperiidae using 27 of the available mitogenomes4,5,22,23,24,25,26. Three mitogenomes were excluded: those of Polytremis jigongi and Polytremis nascens showed very low homology to the other species, and, because the two available mitogenomes of Daimio tethys are largely consistent with each other, we retained only one of them (the one from Korea, selected at random) to reduce computation.

Results and discussion

Genome structure and organization

The complete mitogenomes of A. fuliginosus and B. bicolor are 15,417 bp and 15,574 bp long, respectively (Fig. 1), which is similar in size to other hesperiid mitogenomes. The organisation of the two mitogenomes is shown in Table 1. Like most typical insect mitogenomes, both harbour 13 protein-coding genes (ATP6, ATP8, Cytb, COI-COIII, ND1-ND6, and ND4L), 22 transfer RNAs (tRNAs), two ribosomal RNAs (rRNAs: lrRNA and srRNA), and an AT-rich region. This gene content is identical to that of the other skippers, and the orientation of the PCG ORFs is the same as in most skippers. Both mitogenomes have 15 intergenic regions. The longest intergenic region, located between trnQ and ND2 in both species, is 73 bp in A. fuliginosus and 72 bp in B. bicolor. Fourteen genes (four PCGs, eight tRNAs, and two rRNAs) are encoded on the N strand, and the remaining 23 genes (nine PCGs and 14 tRNAs) are encoded on the J strand. The nucleotide composition of A. fuliginosus is A (40.1%), T (40.6%), C (11.8%), and G (7.4%), giving an AT content of 80.7%. In B. bicolor, the composition is A (40.0%), T (39.4%), C (12.9%), and G (7.7%), giving an AT content of 79.4%. The GC skew of both mitogenomes and the AT skew of A. fuliginosus are negative, whereas the AT skew of B. bicolor is slightly positive (Supplementary Material S1).

Figure 1

Circular map of the mitogenomes of A. fuliginosus and B. bicolor.

Table 1 Organization of the A. fuliginosus and B. bicolor mitogenomes.

Protein-coding genes (PCGs)

The PCGs of the two mitogenomes encode a total of 3,730 (A. fuliginosus) and 3,731 (B. bicolor) amino acids and account for 72.6% and 71.9% of the total mitogenome length, respectively. All PCGs in both mitogenomes start with typical ATN codons, except for COI, which is initiated by CGA, as is common in Lepidoptera. Stop codons in the PCGs are of two types: complete TAA or incomplete T. Although incomplete stop codons are common in lepidopteran mitogenomic PCGs, they are not expected to affect translation, because they are completed to TAA by the addition of A residues during mRNA processing27. We calculated the relative synonymous codon usage (RSCU) of the PCGs in the two mitogenomes (Table 2). According to the RSCU analyses, TTT (F), ATT (I), TTA (L) and ATA (M) were the four most frequently used codons. In both species, leucine, isoleucine, phenylalanine and serine are the most frequent amino acids in the PCGs (Fig. 2).
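As an illustration of the RSCU statistic used above, the following minimal Python sketch computes RSCU under its usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the MEGA procedure used in this study. The function name `rscu` and the toy sequence are hypothetical, and the codon table is assumed to be NCBI translation table 5 (invertebrate mitochondrial).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Assumed: NCBI translation table 5 (invertebrate mitochondrial),
# with amino acids listed for codons in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    Stop codons, codons with ambiguous bases, and a trailing
    incomplete codon (e.g. a single T stop) are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    result = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)  # equal use of all synonymous codons
        for c in family:
            result[c] = counts[c] / expected
    return result


# Hypothetical toy sequence, not data from this study.
toy = "TTT TTT TTT TTC ATT ATA ATA".replace(" ", "")
toy_rscu = rscu(toy)
print(toy_rscu["TTT"], toy_rscu["TTC"], toy_rscu["ATA"])
```

An RSCU above 1 indicates a codon used more often than expected under equal synonymous usage. In the toy sequence, TTT accounts for three of the four phenylalanine codons and therefore has an RSCU of 1.5.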

Table 2 Codon number and RSCU in the A. fuliginosus and B. bicolor mitochondrial PCGs.
Figure 2

The relative synonymous codon usage (RSCU) in the mt-genomes of A. fuliginosus and B. bicolor.

Ribosomal RNA and Transfer RNA genes

The two rRNA genes (lrRNA and srRNA), encoding the large and small ribosomal subunit RNAs, are located between trnL(CUN) and trnV and between trnV and the AT-rich region, respectively. The lrRNA and srRNA are 1,415 and 774 bp long, respectively, in A. fuliginosus, and 1,419 and 773 bp long in B. bicolor.

Both A. fuliginosus and B. bicolor have 22 tRNAs, ranging in size from 62 to 75 bp, which are interspersed among the PCGs, the rRNAs and the AT-rich region. The total length of the 22 tRNAs is 1,475 bp in both A. fuliginosus and B. bicolor. Among the 22 tRNAs, 14 are encoded on the J strand and the remaining eight on the N strand, in accordance with other lepidopteran mitogenomes28. Most tRNA genes were predicted by MITOS to fold into a cloverleaf secondary structure, except for trnS(AGN), which lacks the DHU arm in both A. fuliginosus and B. bicolor (Supplementary Material S2). The absence of the DHU stem in trnS(AGN) has been shown to be an ancestral state in many insects29. In addition, the number of bases in the dihydrouridine loop is not uniform, ranging from 4 to 8, because the DHU stem is highly variable30.

Overlapping sequences, intergenic spacers and the control region

There are nine gene overlaps in A. fuliginosus and eight in B. bicolor, with sizes ranging from 1 to 8 bp. In both mitogenomes, the longest overlap is located between trnW and trnC (Table 1). The overlap between ATP6 and ATP8, which is widespread in hesperiid mitogenomes18,31,32, is 7 bp in both A. fuliginosus and B. bicolor.

The intergenic spacers of these two skippers are distributed among 15 regions, with total lengths of 178 bp in A. fuliginosus and 135 bp in B. bicolor. Most of the intergenic spacers are no longer than 20 bp. In both mitogenomes, the longest spacer lies between trnQ and ND2; its sequence is not conserved, but its position is similar to that in other hesperiid mitogenomes. This is consistent with the suggestion that the spacer arose during gene rearrangement23.

The control region is also called the AT-rich region because it is typically characterised by a high AT content, which reaches 94.6% in A. fuliginosus and 92% in B. bicolor. The control region, the longest noncoding region, is located between the srRNA and trnM and is 407 bp long in A. fuliginosus and 614 bp long in B. bicolor. We found one dinucleotide repeat (TA)55 in A. fuliginosus and two dinucleotide repeats (TA)36 and (AT)54 in B. bicolor. Furthermore, we found a 30-bp tandem repeat (AAATAAAAAATTAAAATAATTATTTTAATT) in A. fuliginosus and an 18-bp tandem repeat (TAAAAAAATAATTATTTT) in B. bicolor. In both species, the AT-rich region also contains a poly-T stretch close to the srRNA. Several microsatellite-like A/T sequences following the motif ATTTA were found in the control regions of A. fuliginosus and B. bicolor, as have also been reported in other skipper mitogenomes33. Moreover, our predictions indicate two stem-loop structures in A. fuliginosus and three in B. bicolor (Fig. 3). Many studies have shown that the motif ATAGA close to the 5ʹ-end of the srRNA is highly conserved23,34; this motif is also present in A. fuliginosus and B. bicolor.
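The kinds of features described above (AT content, dinucleotide repeat runs, and short motifs such as ATAGA or ATTTA) can be located with simple pattern matching. The sketch below is only an illustration using Python regular expressions; it is not the Tandem Repeats Finder or DNAMAN analysis used in this study. The function names, the minimum copy number, and the toy sequence are hypothetical.

```python
import re


def at_content(seq):
    """Fraction of A and T bases in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def dinucleotide_runs(seq, unit="TA", min_copies=5):
    """Return (start, copies) for each run of `unit` with at least min_copies copies."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]


def motif_positions(seq, motif="ATAGA"):
    """Return 0-based start positions of a motif, allowing overlapping hits."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer("(?=%s)" % re.escape(motif), seq.upper())]


# Hypothetical toy sequence, not the control region of either species:
# ATAGA motif, a poly-T stretch, a (TA)6 run, and an ATTTA motif.
toy = "ATAGA" + "T" * 12 + "TA" * 6 + "GC" + "ATTTA"

print(round(at_content(toy), 3))
print(dinucleotide_runs(toy, unit="TA", min_copies=5))
print(motif_positions(toy, "ATAGA"), motif_positions(toy, "ATTTA"))
```

Exact matching of this kind finds only perfect repeats. Dedicated tools such as Tandem Repeats Finder also score imperfect copies, so results from the two approaches need not agree.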

Figure 3

Predicted structural elements in the control region of A. fuliginosus and B. bicolor.

Phylogenetic analyses

Our datasets included 29 skippers, with 14,715 nucleotides remaining after the removal of ambiguous regions. The different strategies yielded almost the same results (see below); here, we present the results based on the PRT dataset as a basis for subsequent analyses. Sixteen best-fitting partitioning schemes (Supplementary Material S3) were determined by PartitionFinder from an initial set of 63 possible partitions based on the PRT dataset.

Similar topologies were inferred from phylogenetic analyses with MrBayes and IQ-TREE (Fig. 4). Six major clades were recovered: Coeliadinae, Euschemoninae, Eudaminae, Pyrginae, Heteropterinae, and Hesperiinae (including subclade A, comprising A. fuliginosus and B. bicolor), most of which agree with previous studies1,3,6,10. Coeliadinae is sister to the remaining subfamilies; the systematic positions of Euschemoninae and Eudaminae are confirmed, and Euschemoninae is sister to all other skippers except Coeliadinae. Pyrginae, containing only four tribes (Erynnini, Pyrgini, Celaenorrhinini and Tagiadini), is recovered as monophyletic with weak support. Hesperiinae is recovered as monophyletic.

Figure 4

Phylogenetic tree inferred using the BI method based on the PRT dataset. Numbers at nodes indicate posterior probabilities (PP) from the BI analysis and bootstrap values (BS) from the ML analysis. A dot on a node indicates PP/BS = 1/100 for that branch.

In the phylogenetic tree, A. fuliginosus and B. bicolor formed a strongly supported subclade (Clade A); this subclade branches after Heteropterinae and is followed by Hesperiinae with high support. Our results do not agree with placing them in the subfamily Heteropterinae1,10. We thus tentatively assign these two genera to the subfamily Hesperiinae. Previous studies have inferred a close relationship among Heteropterinae, Trapezitinae and Hesperiinae, but the sister relationships were uncertain3,6, and none of these studies sampled Apostictopterus and Barca. In this study, we were unable to include Trapezitinae to test its relationship with Hesperiinae, Apostictopterus and Barca, as no trapezitine mitogenome is yet available. Hence, sampling of Trapezitinae is needed to confirm this hypothesis and clarify the systematic positions of these taxa.

The phylogenetic analyses based on four datasets (PRT, PCGC, PCGD and PCGR) using two methods revealed very similar topologies except for the phylogenetic positions of Eudaminae and Pyrginae. Across the BI and ML analyses of the different datasets, the topologies were largely congruent, with minor discrepancies in three strategies. As many studies have concluded, the mitogenome can provide robust and stable phylogenetic analyses. In the BI analysis of the PCGR dataset, Eudaminae branched after Euschemoninae. In the ML analyses, however, the topologies based on the PCGC and PCGD datasets placed Eudaminae within Pyrginae (Supplementary Material S4), suggesting that Pyrginae is polyphyletic. Overall, the monophyly of Pyrginae and Eudaminae remains unresolved in our analyses, and more evidence is needed to address this issue.

Materials and Methods

Sample collection and DNA extraction

The adult specimen of A. fuliginosus was collected in Linzhi, Tibet Autonomous Region, China. The adult B. bicolor specimen was obtained in Weixi Lisu Autonomous County, Yunnan Province, China. Two or three legs from a single specimen were used to extract the genomic DNA using the HiPure Insect DNA Kit (Magen, China) following the manufacturer’s instructions.

Primers, PCR, and cloning

For amplification, the complete mitogenomes were divided into 27 overlapping fragments. The primers were mainly taken from Kim et al.23, except for SF2, SF10, SF18, SF22 and SF27, which were newly designed (Supplementary Material S5). Because of the instability of the AT-rich region, we cloned this fragment after amplification and then sequenced it. Cloning followed Fan et al.35.

We amplified all regions of the mitogenome except the AT-rich region using SuperMix (Transgene, China) with the following protocol: initial denaturation for 2 min at 94 °C; 35 cycles of denaturation for 30 s at 94 °C, annealing for 45 s at 40–50 °C, and extension for 1 min at 72 °C; and a final extension for 10 min at 72 °C. For the AT-rich region, we used KOD high-fidelity thermostable DNA polymerase (Takara, Japan) to improve the accuracy of the amplification, with the following PCR conditions: initial denaturation for 2 min at 94 °C; 35 cycles of denaturation for 10 s at 98 °C, annealing for 45 s at 42 °C, and extension for 1 min at 68 °C; and a final extension for 10 min at 72 °C.

Sequence analysis and annotation

We assembled and proofread the sequences using Geneious v7.1.4 (ref. 36). PCGs were identified by finding the ORFs on the NCBI website (https://www.ncbi.nlm.nih.gov/orffinder/) with the invertebrate mitochondrial genetic code. The tRNAs and rRNAs were identified using the MITOS Web Server (http://mitos.bioinf.uni-leipzig.de/index.py)37. Moreover, to confirm the boundaries of the genes, the 37 genes were aligned using ClustalW in MEGA v7.0.2 (ref. 38) and inspected manually. The nucleotide composition statistics and relative synonymous codon usage (RSCU) were calculated using MEGA v7.0.2. The AT skew and GC skew values (ref. 39), which measure bias in base composition, were calculated with the following formulas: AT skew = (A − T)/(A + T); GC skew = (G − C)/(G + C). The circular maps were drawn with the CGView Server (http://stothard.afns.ualberta.ca/cgview_server/)40. The tandem repeats of the control region were identified with the Tandem Repeats Finder online server (http://tandem.bu.edu/trf/trf.html)15. Stem-loop structures of the AT-rich region were predicted with DNAMAN. The two complete mitogenomes were deposited in GenBank under accession numbers MH985707 and MH985708.
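The skew formulas given above can be illustrated with a short Python sketch. The function name `skews` is hypothetical, and the inputs are the rounded whole-genome base percentages reported in the Results, so the outputs are approximate and may differ slightly from values computed on the full sequences.

```python
def skews(a, t, g, c):
    """Return (AT skew, GC skew) from base counts or percentages.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    """
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


# Rounded base percentages reported in the text for each mitogenome.
af_at, af_gc = skews(a=40.1, t=40.6, g=7.4, c=11.8)   # A. fuliginosus
bb_at, bb_gc = skews(a=40.0, t=39.4, g=7.7, c=12.9)   # B. bicolor

print("A. fuliginosus: AT skew %.3f, GC skew %.3f" % (af_at, af_gc))
print("B. bicolor:     AT skew %.3f, GC skew %.3f" % (bb_at, bb_gc))
```

With these inputs, the AT skew is slightly negative for A. fuliginosus and slightly positive for B. bicolor, and the GC skew is negative for both, which matches the signs described in the Results.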

Phylogenetic analysis

We downloaded 33 available lepidopteran mitogenomes from GenBank, including 27 Hesperiidae, three Papilionidae and three Geometridae. The species used in this study are listed in Table 3. Each of the 13 PCGs was aligned individually using MAFFT V7.313 (ref. 41) with the G-INS-i strategy. Each of the two rRNAs was aligned separately using the Q-INS-i strategy through the MAFFT V7 online alignment server (https://mafft.cbrc.jp/alignment/server/)42. We removed gaps and ambiguous sites from the 13 PCGs using the Gblocks V0.91 (ref. 43) online server (http://molevol.cmima.csic.es/castresana/Gblocks_server.html) with default settings.

Table 3 List of butterfly species analyzed in this paper with their respective GenBank accession numbers.

To compare the phylogenetic signal of different dataset combinations, four datasets were used: 1) PCGD: the 13 PCGs with the 3rd codon positions removed; 2) PCGC: the 13 complete PCGs; 3) PRT: the 13 complete PCGs, two rRNAs and 22 tRNAs; and 4) PCGR: two rRNAs and the 13 PCGs with the 3rd codon positions removed. We employed PartitionFinder V2.1.1 (ref. 44) to identify the best partitioning strategies under the Bayesian information criterion (BIC). Maximum likelihood (ML) analyses were performed on the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at/)45 with 1000 ultrafast bootstraps (UFBS) to estimate branch support. The best-fit models were selected by ModelFinder (ref. 46) as implemented in IQ-TREE. The Bayesian inference (BI) analyses were performed using MrBayes V3.2.6 on the CIPRES Science Gateway 3.3 (ref. 47). In the BI analysis, we used reversible-jump MCMC to sample across all substitution rate models rather than specifying the single substitution model suggested by PartitionFinder. Four Markov chains (one cold and three heated) were run simultaneously for 1 × 10^7 generations with sampling every 1,000 generations. We examined the average standard deviation of the split frequencies in Tracer V1.7 (ref. 48) to confirm that it had fallen below 0.01. We discarded the first 25% of the sampled trees as burn-in. The remaining trees were then used to calculate the posterior probabilities (PP) under the majority-rule consensus.
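The construction of the four datasets and the burn-in step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the toy sequences are not real alignments, each PCG alignment is assumed to start in frame, and the concatenation order is arbitrary. In practice each operation would be applied to every taxon in the alignment.

```python
def drop_third_positions(aligned_cds):
    """Keep codon positions 1 and 2 of an in-frame aligned coding sequence."""
    return "".join(base for i, base in enumerate(aligned_cds) if i % 3 != 2)


def build_datasets(pcgs, rrnas, trnas):
    """Concatenate one taxon's gene alignments into the four datasets."""
    pcg_all = "".join(pcgs)
    pcg_12 = "".join(drop_third_positions(gene) for gene in pcgs)
    rrna = "".join(rrnas)
    trna = "".join(trnas)
    return {
        "PCGD": pcg_12,                 # PCGs, 3rd codon positions removed
        "PCGC": pcg_all,                # complete PCGs
        "PRT": pcg_all + rrna + trna,   # PCGs + rRNAs + tRNAs
        "PCGR": pcg_12 + rrna,          # PCGs without 3rd positions + rRNAs
    }


def post_burnin(samples, burnin_fraction=0.25):
    """Discard the first fraction of sampled trees as burn-in."""
    n_discard = int(len(samples) * burnin_fraction)
    return samples[n_discard:]


# Hypothetical toy sequences for a single taxon.
toy = build_datasets(pcgs=["ATGAAATTT", "ATTGGA"], rrnas=["AATT"], trnas=["GTA"])
print(toy["PCGD"], toy["PCGR"])

# 1 x 10^7 generations sampled every 1,000 generations gives about
# 10,000 samples per run (ignoring any sample taken at generation 0).
n_samples = 10_000_000 // 1_000
kept = post_burnin(list(range(n_samples)))
print(n_samples, len(kept))
```

Removing third codon positions is a common way to reduce the influence of substitution saturation, which is why datasets with and without those positions are compared.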