The complete mitochondrial genomes of two skipper genera (Lepidoptera: Hesperiidae) and their associated phylogenetic analysis

The systematic positions of two hesperiid genera, Apostictopterus and Barca (Lepidoptera: Hesperiidae), remain ambiguous. We sequenced and annotated the two mitogenomes of Apostictopterus fuliginosus and Barca bicolor and inferred the phylogenetic positions of the two genera within the Hesperiidae based on the available mitogenomes. The lengths of the two circular mitogenomes of A. fuliginosus and B. bicolor are 15,417 and 15,574 base pairs (bp), respectively. These two mitogenomes show similar AT skew, GC skew, codon usage and nucleotide bias of AT: the GC skew of the two species is negative, and the AT skew of A. fuliginosus is negative, while the AT skew of B. bicolor is slightly positive. The largest intergenic spacer is located at the same position between trnQ and ND2 in A. fuliginosus (73 bp) and B. bicolor (72 bp). Thirteen protein-coding genes (PCGs) start with ATN codons except for COI, which starts with CGA. The control regions of both mitogenomes possess a long tandem repeat, which is 30 bp long in A. fuliginosus, and 18 bp in B. bicolor. Bayesian inference and maximum likelihood methods were employed to infer the phylogenetic relationships, which suggested that A. fuliginosus and B. bicolor belong in the subfamily Hesperiinae.

region cannot be well sequenced by high-throughput sequencing techniques, as the depth of coverage is strongly positively correlated with the GC content 21 .
Mitogenomes are a data-rich and relatively accessible source of information. Condamine 21 obtained promising results on the genus-level relationships of swallowtail butterflies using mitogenomes. Thus far, 30 complete or nearly complete mitogenomes of skippers have been sequenced. In this study, we sequenced two additional complete mitogenomes, those of A. fuliginosus and B. bicolor, and characterised their composition. Finally, we inferred the phylogenetic relationships within the Hesperiidae from 27 of the available mitogenomes 4,5,22-26 . Three mitogenomes were excluded. Those of Polytremis jigongi and Polytremis nascens showed very low homology to the other species. The two available mitogenomes of Daimio tethys are largely concordant, so we kept only one, randomly selecting the sequence from Korea to reduce computation.

Results and discussion
Genome structure and organization. The complete mitogenomes of A. fuliginosus and B. bicolor are 15,417 bp and 15,574 bp long, respectively (Fig. 1), which is similar to other hesperiid mitogenomes (Table 1). The organisation of both mitogenomes is shown in Table 1. As in most typical insect mitogenomes, both species harbour 13 protein-coding genes (ATP6, ATP8, Cytb, COI-COIII, ND1-ND6, and ND4L), 22 transfer RNAs (tRNAs), two ribosomal RNAs (rRNAs: lrRNA and srRNA), and an AT-rich region. This gene content is identical to that of the other skippers, and the orientation of the PCG open reading frames is the same as in most skippers. Both mitogenomes have 15 intergenic regions. The longest spacer in both A. fuliginosus and B. bicolor lies between trnQ and ND2 and is 73 bp and 72 bp long, respectively. Fourteen genes (four PCGs, eight tRNAs, and two rRNAs) are encoded on the N strand, and the remaining 23 genes (nine PCGs and 14 tRNAs) are encoded on the J strand. The nucleotide composition of A. fuliginosus is A (40.1%), T (40.6%), C (11.8%), and G (7.4%), giving an AT content of 80.7%. In B. bicolor, the composition is A (40.0%), T (39.4%), C (12.9%), and G (7.7%), giving an AT content of 79.4%. The GC skew is negative in both mitogenomes; the AT skew is negative in A. fuliginosus but slightly positive in B. bicolor (Supplementary Material S1).
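The signs of the skew values follow directly from the base composition quoted above. The following is a minimal sketch, assuming the commonly used definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the text does not state which formulas were used, and the function name `skews` is hypothetical.

```python
def skews(a, t, g, c):
    """Return (AT skew, GC skew) from base percentages or counts.

    Assumes the common definitions (A - T)/(A + T) and (G - C)/(G + C).
    """
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


# Base percentages as quoted in the text.
composition = {
    "A. fuliginosus": dict(a=40.1, t=40.6, g=7.4, c=11.8),
    "B. bicolor": dict(a=40.0, t=39.4, g=7.7, c=12.9),
}

for species, bases in composition.items():
    at_skew, gc_skew = skews(**bases)
    print(f"{species}: AT skew = {at_skew:+.3f}, GC skew = {gc_skew:+.3f}")
```

Under these assumed definitions, the quoted percentages give a negative GC skew for both species, a slightly negative AT skew for A. fuliginosus and a slightly positive AT skew for B. bicolor, in line with the description above.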

Protein-coding genes (PCGs).
The PCGs of the two mitogenomes encode a total of 3,730 (A. fuliginosus) and 3,731 (B. bicolor) amino acids, which correspond to 72.6% and 71.9% of the total length of the A. fuliginosus and B. bicolor mitogenomes, respectively. All PCGs in both mitogenomes start with typical ATN codons, except for COI, which is initiated by CGA, as is common in Lepidoptera. Two types of stop codon occur in the PCGs: the complete TAA and the incomplete T. Incomplete stop codons regularly appear in lepidopteran mitogenomic PCGs and do not affect translation, because they are completed to TAA by the addition of A residues to the transcript 27 . We calculated the relative synonymous codon usage (RSCU) of the PCGs in the two mitogenomes (Table 2). According to the RSCU analyses, TTT (F), ATT (I), TTA (L) and ATA (M) were the four most frequently used codons. In both species, leucine, isoleucine, phenylalanine and serine are the most frequent amino acids in the PCGs (Fig. 2).
Most tRNA genes were predicted by MITOS to fold into a cloverleaf secondary structure, except for trnS(AGN), which lacks the DHU arm in both A. fuliginosus and B. bicolor (Supplementary Material S2). The lack of the DHU stem in trnS(AGN) has been shown to be an ancestral state in many insects 29 . In addition, the dihydrouridine loop ranges from 4 to 8 bases in length; this variation reflects the high variability of the DHU stem 30 .
Overlapping sequences, intergenic spacers and the control region. There are nine gene overlaps in A. fuliginosus and eight in B. bicolor, with sizes ranging from 1 to 8 bp. The longest overlap in both mitogenomes is located between trnW and trnC (Table 1). The length of the common overlap between ATP6 and ATP8, which is widespread in hesperiid mitogenomes 18 [...] two mitogenomes, the longest spacer, which is not conserved in sequence but whose position is similar to that in other hesperiid mitogenomes, is located between trnQ and ND2. This is consistent with the spacer having arisen during gene rearrangement 23 .
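The RSCU values reported for the PCGs can be illustrated with a short calculation: the RSCU of a codon is its observed count divided by the mean count of all codons that encode the same amino acid. The following is a minimal sketch, assuming NCBI translation table 5 (invertebrate mitochondrial) and pooling all synonymous codons of an amino acid into one family; published analyses often split leucine and serine into two families each, and the text does not say which software or settings were used. The function name `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSSSS"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    Assumption: all codons for one amino acid form a single family.
    Families that never occur in the sequence are omitted.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*":
            continue  # stop codons are not scored
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy example: two TTT codons and one TTC codon (both phenylalanine).
print(rscu("TTTTTTTTC"))
```

An RSCU above 1 means a codon is used more often than expected under equal usage of its synonyms, which is how AT-rich codons such as TTT, ATT, TTA and ATA stand out in AT-biased mitogenomes.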

The control region is also called the AT-rich region because it is typically characterised by a high AT content, which reaches 94.6% in A. fuliginosus and 92% in B. bicolor. The control region, the longest noncoding region, is located between the srRNA and trnM and is 407 bp long in A. fuliginosus and 614 bp long in B. bicolor. We found one dinucleotide repeat, (TA)55, in A. fuliginosus and two dinucleotide repeats, (TA)36 and (AT)54, in B. bicolor. Furthermore, we found a long tandem repeat of 30 bp (AAATAAAAAATTAAAATAATTATTTTAATT) in A. fuliginosus and a tandem repeat of 18 bp (TAAAAAAATAATTATTTT) in B. bicolor. In both species, the AT-rich region also contains a poly-T stretch close to the srRNA. Several microsatellite-like A/T sequences following the motif ATTTA were found in the control regions of A. fuliginosus and B. bicolor, as in other skipper mitogenomes 33 . Moreover, our predictions indicate two stem-loop structures in A. fuliginosus and three stem-loop structures in B. bicolor (Fig. 3). Many studies have shown that the motif ATAGA close to the 5ʹ-end of the srRNA is highly conserved 23,34 ; this motif is also present in A. fuliginosus and B. bicolor.
Phylogenetic analyses. Our datasets included 29 skippers and 14,715 nucleotide positions after ambiguous regions were removed. The different strategies yielded almost the same results (see below); here, we present the results from the PRT dataset as the basis for subsequent analyses. PartitionFinder identified 16 best-fitting partitions (Supplementary Material S3) from an initial set of 63 possible partitions of the PRT dataset.
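The control-region repeats described above can be checked with a few lines of code. The following is a minimal sketch, assuming the two quoted sequences are the repeat units; the helper names `at_content` and `max_tandem_copies` are hypothetical, and the toy region is invented for illustration and is not the real control region of either species.

```python
import re


def at_content(seq):
    """Fraction of A and T bases in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def max_tandem_copies(seq, unit):
    """Largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.finditer(f"(?:{re.escape(unit)})+", seq)
    return max((len(m.group()) // len(unit) for m in runs), default=0)


# Repeat sequences as quoted in the text.
UNIT_AF = "AAATAAAAAATTAAAATAATTATTTTAATT"  # A. fuliginosus
UNIT_BB = "TAAAAAAATAATTATTTT"              # B. bicolor

# Hypothetical toy region: flanking bases plus three copies of the unit.
toy_region = "GCGC" + UNIT_BB * 3 + "ATAGA"

print(len(UNIT_AF), len(UNIT_BB))              # unit lengths in bp
print(at_content(UNIT_AF))                     # AT fraction of the unit
print(max_tandem_copies(toy_region, UNIT_BB))  # copies in the toy region
```

Both quoted units consist entirely of A and T, which is consistent with the very high AT content reported for the control regions.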
Similar topologies were inferred from the phylogenetic analyses with MrBayes and IQ-TREE (Fig. 4). Euschemoninae is the sister to all other skippers except Coeliadinae. Pyrginae, containing only four tribes (Erynnini, Pyrgini, Celaenorrhinini and Tagiadini), is recovered as monophyletic with weak support. Hesperiinae is recovered as monophyletic.
In the phylogenetic tree, A. fuliginosus and B. bicolor form a strongly supported subclade (Clade A); this subclade branches off after Heteropterinae and is followed by Hesperiinae, with high support. Our results do not support placing these genera in the subfamily Heteropterinae 1,10 . We thus tentatively assign them to the subfamily Hesperiinae. Previous studies have inferred a close relationship among Heteropterinae, Trapezitinae and Hesperiinae, but the sister relationships were uncertain 3,6 , and none of these studies sampled Apostictopterus and Barca. In this study, we were unable to include Trapezitinae to test for close relationships with Hesperiinae.
The phylogenetic analyses based on the four datasets (PRT, PCGC, PCGD and PCGR) using two methods revealed very similar topologies, except for the phylogenetic positions of Eudaminae and Pyrginae. Across the BI and ML analyses of the different datasets, the topologies were largely congruent, with minor discrepancies in only three strategies. As many studies have concluded, mitogenomes can provide robust and stable phylogenetic inference. In the BI analyses, the PCGR dataset placed Eudaminae as branching after Euschemoninae. In the ML analyses, however, the topologies based on the PCGC and PCGD datasets showed Eudaminae nested within Pyrginae (Supplementary Material S4), suggesting that Pyrginae is not monophyletic. Overall, the monophyly of Pyrginae and Eudaminae remains unresolved in our analyses, and more evidence is needed to address this issue.

Materials and Methods
Sample collection and DNA extraction. The adult specimen of A. fuliginosus was collected in Linzhi, Tibet Autonomous Region, China. The adult B. bicolor specimen was obtained in Weixi Lisu Autonomous County, Yunnan Province, China. Two or three legs from a single specimen were used to extract the genomic DNA using the HiPure Insect DNA Kit (Magen, China) following the manufacturer's instructions.
Primers, PCR, and cloning. For amplification, the complete mitogenomes were divided into 27 overlapping fragments. The primers were mainly taken from Kim et al. 23 , except for SF2, SF10, SF18, SF22 and SF27, which were newly designed (Supplementary Material S5). Because the AT-rich region is unstable, we cloned this fragment after amplification and then sequenced it. For cloning, we followed Fan et al. 35 .
We amplified the whole mitogenome except the AT-rich region using SuperMix (Transgene, China) with the following protocol: initial denaturation for 2 min at 94 °C; 35 cycles of denaturation for 30 s at 94 °C, annealing for 45 s at 40-50 °C, and extension for 1 min at 72 °C; and a final extension for 10 min at 72 °C. For the AT-rich region, we used KOD high-fidelity thermostable DNA polymerase (Takara, Japan) to improve the accuracy of the amplification, with the following PCR conditions: initial denaturation for 2 min at 94 °C; 35 cycles of denaturation for 10 s at 98 °C, annealing for 45 s at 42 °C, and extension for 1 min at 68 °C; and a final extension for 10 min at 72 °C.
[...] the 13 complete PCGs, two rRNAs and 22 tRNAs; and 4) PCGR: two rRNAs and 13 PCGs with the third codon position removed. We employed PartitionFinder V2.1.1 44 to identify the best partitioning strategies under the Bayesian information criterion (BIC). Maximum likelihood (ML) analyses were performed on the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at/) 45 with 1,000 ultrafast bootstrap (UFBS) replicates to estimate branch support. The best-fit models were selected by ModelFinder 46 as implemented in IQ-TREE. The Bayesian inference (BI) analyses were performed using MrBayes V3.2.6 on the CIPRES Science Gateway 3.3 47 . In the BI analyses, we used reversible-jump MCMC to sample across all substitution rate models instead of specifying the single substitution model suggested by PartitionFinder. Four Markov chains (one cold and three heated chains) were run simultaneously for 1 × 10^7 generations with sampling every 1,000 generations. We examined the average standard deviation of the split frequencies in Tracer V1.7 48 to confirm that it fell below 0.01. We discarded the first 25% of the sampled trees as burn-in. The remaining trees were then used to calculate the posterior probabilities (PP) under the majority-rule consensus.
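The burn-in and consensus steps can be illustrated with a small example. With 1 × 10^7 generations sampled every 1,000 generations, a chain yields roughly 10,000 sampled trees, of which about 2,500 fall in the 25% burn-in. The following is a minimal sketch of how clade posterior probabilities arise from the remaining trees; it is not the MrBayes implementation. Each tree is reduced to a set of clades (a simplification of real Newick output), and the function names and taxon labels are hypothetical.

```python
from collections import Counter


def clade_posteriors(sampled_trees, burnin_fraction=0.25):
    """Posterior probability of each clade after discarding the burn-in.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors, threshold=0.5):
    """Keep only clades present in more than `threshold` of the kept trees."""
    return {c: pp for c, pp in posteriors.items() if pp > threshold}


# Hypothetical taxa and samples, for illustration only.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
abc = frozenset({"A", "B", "C"})

# Eight sampled trees: the first two (25%) are treated as burn-in.
samples = [{ac, abc}] * 2 + [{ab, abc}] * 5 + [{ac, abc}]

posteriors = clade_posteriors(samples)
consensus = majority_rule(posteriors)
for clade, pp in sorted(consensus.items(), key=lambda item: -item[1]):
    print(sorted(clade), round(pp, 3))
```

In this toy example the clade (A, B) appears in five of the six retained trees and is kept in the consensus, whereas (A, C) appears in only one and is dropped.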