Novel gene rearrangement in the mitochondrial genome of Anastatus fulloi (Hymenoptera Chalcidoidea) and phylogenetic implications for Chalcidoidea

The genus Anastatus comprises a large group of parasitoids, including several biological control agents in agricultural and forest systems. The taxonomy and phylogeny of these species remain controversial. In this study, the mitogenome of A. fulloi Sheng and Wang was sequenced and characterized. The nearly full-length mitogenome of A. fulloi was 15,692 bp, compromising 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes and a control region (CR). The total A + T contents were 83.83%, 82.18%, 87.58%, 87.27%, and 82.13% in the whole mitogenome, 13 PCGs, 22 tRNA genes, 2 rRNA genes, and CR, respectively. The mitogenome presented negative AT skews and positive GC skews, except for the CR. Most PCGs were encoded on the heavy strand, started with ATN codons, and ended with TAA codons. Among the 3736 amino acid-encoding codons, TTA (Leu1), CGA (Arg), TCA (Ser2), and TCT (Ser2) were predominant. Most tRNAs had cloverleaf secondary structures, except trnS1, with the absence of a dihydrouridine (DHU) arm. Compared with mitogenomes of the ancestral insect and another parasitoid within Eupelmidae, large-scale rearrangements were found in the mitogenome of A. fulloi, especially inversions and inverse transpositions of tRNA genes. The gene arrangements of parasitoid mitogenomes within Chalcidoidea were variable. A novel gene arrangement was presented in the mitogenome of A. fulloi. Phylogenetic analyses based on the 13 protein-coding genes of 20 parasitoids indicated that the phylogenetic relationship of 6 superfamilies could be presented as Mymaridae + (Eupelmidae + (Encyrtidae + (Trichogrammatidae + (Pteromalidae + Eulophidae)))). This study presents the first mitogenome of the Anastatus genus and offers insights into the identification, taxonomy, and phylogeny of these parasitoids.

Most Anastatus (Hymenoptera: Chalcidoidea) species are parasitoids of numerous insect species, and several species of Anastatus have been evaluated as biological control agents for various pests in agricultural and forest systems 21,22 . These insects represent a large family with approximately 150 recognized species but remain largely unstudied 23 . The current morphology-based taxonomy of Anastatus is problematic due to early taxonomic confusion, low degrees of morphological differentiation, and morphological variation related to host and geographical origin [23][24][25] . Parasitoids within Chalcidoidea are some of the most diverse hymenopterous insects, exhibiting the characteristics of frequent gene rearrangement, high substitution rates, and a strong base composition bias in mitogenomes 26,27 . Although the mitogenomes of other hymenopterous insects have been applied for identification, evolution, and phylogeny, this is not the case for Anastatus parasitoids. To date, mitogenomes of Anastatus parasitoids are still not available in the NCBI database.
Next-generation sequencing (NGS) is an increasingly applied technology used to obtain large-scale genetic information. It provides the full-length sequences of insect mitogenomes practically and effectively. In the present study, we applied NGS to obtain the complete mitogenome of A. fulloi Sheng and Wang, which represented the first sequenced mitochondrial genome in the Anastatus genus. The mitogenome was characterized, including the structure of the mitogenome, gene organization, nucleotides, amino acid composition, codon usage, and tRNA secondary structure. Gene rearrangement was discussed between several mitogenomes of parasitoids and the ancestral insect. In addition, phylogenetic analyses were performed based on the 13 PCGs of 20 parasitoids within Chalcidoidea. This study provides information about the taxonomy and phylogeny of this special insect and unveils the phylogenetic position of Anastatus within Chalcidoidea.

Methods
Insect collection and DNA extraction. Adult specimens of A. fulloi were obtained from the Plant Protection Research Institute, Guangdong Academy of Agricultural Sciences, People's Republic of China. All specimens were preserved in 100% alcohol and stored at − 20 °C. Total genomic DNA was extracted by using a DNeasy tissue kit (Qiagen, Hilden, Germany) according to the manufacturer's protocols. Subsequently, total DNA was quantified and tested by a Qubit fluorometer with a dsDNA high-sensitivity kit (ThermoFisher, Foster City, CA, USA) and agarose gel (1%) electrophoresis.
Mitochondrial genome sequencing and assembly. At least 1 μg of DNA was used to construct the sequencing library using the TruSeq DNA Library Preparation kit according to standard protocols. The library was then sequenced by the Illumina HiSeq™4000 (Illumina, USA) platform with paired-end reads of 2 × 150 bases. A total of 4.79 Gb of clean data was obtained with a Q30 of 92.16%. SPAdes v3.10.1 (http:// bioinf. spbau. ru/ spades) 28 and A5-miseq v20150522 29 were used to assemble the clean data into contigs and scaffolding. The sequences with high sequencing coverage were extracted to identify the mitochondrial sequences by the NCBI NT library using BLASTN (BLAST v2.2.31+). MUMmer v3.1 30 was used to determine the contig positions and fill the gaps between contigs. Through Pilon v1.18 31 , the results were error-corrected to obtain the final mitochondrial sequence.
Genome annotation and analysis. The mitogenome of A. fulloi was annotated on MITOS (http:// mitos. bioinf. uni-leipz ig. de/ index. py). The secondary structure of tRNAs was also predicted by MITOS. Annotation results were confirmed by comparison with homologous sequences in the NCBI database, and then submitted to NCBI. A mitogenome map of A. fulloi was generated by mtviz (http:// pacosy. infor matik. uni-leipz ig. de/ mtviz/ mtviz).
The base composition and relative synonymous codon usage (RSCU) values of PCGs were calculated by MEGA 6.0. The AT and GC skews were counted using the following formulas: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). The gene orders of parasitoid mitogenomes within Chalcidoidea were compared by CREx (http:// pacosy. infor matik. uni-leipz ig. de/ crex/ form# INFO) 32 .
All mitogenome information from 21 species was downloaded, including 19 parasitoids within Chalcidoidea and 2 outgroups. Together with the mitogenome of A. fulloi, they were used to extract the 13 PCGs through PhyloSuite v1.2.2 33 . These PCGs were aligned by MAFFT 7.149 34 and MACSE v. 2.03 35 . Conserved blocks were obtained using Gblocks v0.91b 36 . MrBayes on XSEDE and RAxML on XSEDE (CIPRES portal) were employed to construct the bayesian inference (BI) and maximum likelihood (ML) phylogenetic trees, respectively. Mod-elFinder was used to evaluate the best evolutionary model 37 . GTR + F + I + G4 was selected by the Bayesian information criterion (BIC). The MrBayes analyses ran as 4 independent Markov chains for 3 million generations, sampled every 1000 generations. A burn-in of 25% was used to generate the consensus tree. The RAxML analysis was performed with 1000 replicates of ultrafast likelihood bootstrap. FigTree v1.4.2 was employed to edit two phylogenetic trees.

Results and discussion
General features of the mitogenome. The mitochondrial sequence of A. fulloi has been submitted to GenBank with the accession number OK545741. The nearly complete mitogenome was 15,692 bp in length, within the range of other hymenopterous mitogenomes: from 15,137 (Idris sp.) to 20,370 bp (Trachelus iudaicus) 38,39 . We failed to obtain the complete CR using next-generation and Sanger sequencing, which may be attributed to the comparatively high A + T content and presence of repeat units in CR 19,40,41 . Therefore, the A. fulloi mitogenome contained 37 genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and a partial CR (Table 1, Fig. 1). This is a common gene set for most hymenopterous mitogenomes 17 18,20,44 . Generally, the total length of overlapping regions is smaller in size than intergenic regions in most hymenopterous insects 19,45,46 . In the mitogenome of A. fulloi, a total of 10 overlapping regions were found, ranging in size from 1 to 18 bp ( Table 1). The longest overlapping region was found between trnA and rrnL. The A. fulloi mitogenome also contained 24 intergenic spacers with a total length of 531 bp. These intergenic spacers range from 1 to 129 bp ( Table 1). The longest gene spacer was located between trnC and trnN. Notably, an overlap between atp6 and atp8 was a common feature of metazoan mitogenomes 47 and it was also found in A. fulloi. Moreover, an overlap between nad4 and nad4l widely occurred among mitogenomes of A. fulloi and other hymenopteran parasitoids 17,18,20 , which may be translated as a bicstron 48 .
Significant bias in nucleotide composition is typical in insect mitogenomes 19 and the nucleotide compositional bias of mitogenomes is usually assessed by non-strand specific (A + T content, G + C content) and strandspecific, namely strand asymmetry (ATskew, GC-skew) 49 . The nucleotides of the A. fulloi mitogenome comprise A (39.06%), T (44.77%), C (6.02%), and G (10.15%) ( Table 2). The A. fulloi mitogenome was highly biased towards A and T nucleotides with an A + T content of 83.83%. In addition, the mitogenome of A. fulloi had a negative AT skew (− 0.069) and a positive GC skew (0.255) ( Table 2). This indicates that the mitogenome of A. fulloi contains more T than A and more G than C, as reported in other mitogenomes of hymenopterous species 20,50 . www.nature.com/scientificreports/ This common pattern of base composition in mitogenome may be attributed to the highly asymmetric effects of transcription on mutagenesis, including unequal exposure of the strands to DNA damage and the differential chance for repair 51,52 . PCGs. The total length of 13 PCGs was 11,245 bp, accounting for 71.66% of the whole mitogenome. This set of PCGs is conserved in animal mitogenomes, with the exception of nematodes and a bivalve that lack atp8 15 . These PCGs range in size from 162 bp (atp8) to 1678 bp (nad5). The A + T content of all PCGs was 82.18% (Table 2). A remarkably high A + T content (94.05%) was found at the third codon sites of these PCGs, which may partly reflect the high bias towards A and T nucleotides in the mitogenome 19,53,54 . Meanwhile, the AT skew and GC skew of the PCGs were − 0.145 and 0.102, respectively. Of the 13 PCGs, 10 were encoded on the heavy strand, whereas cytb, nad2, and nad6 were encoded on the light strand. In insect mitogenomes, PCGs usually start with ATN codons (ATA, ATT, ATC, and ATG) and terminate with TAA or TAG [55][56][57] . However, unusual start-and termination codons were simultaneously and exclusively found, such as the start codons of TTG, CGA, GTG, and the incomplete stop codon of T [58][59][60] In the A. fulloi mitogenome, most PCGs started with ATN, including ATA (nad1-3, nad5, and cox1), ATT (atp8, cox2, cox3, and nad4l) and ATG (atp6 and nad4). However, nad6 used an atypical starting codon of AAC, which has also been found in Cheirotonus jansoni and Prosopocoilus gracilis 61,62 . All PCGs terminated with TAA except nad3 (TAG) and nad5 (incomplete codon T). Previous studies have inferred that the incomplete termination codon could be completed by posttranscriptional polyadenylation 15,63,64 . The codon usage of PCGs was assessed by the relative synonymous codon usage (RSCU)  www.nature.com/scientificreports/ value (Fig. 2). There is a clear preference for A or T in the third codon. Of the 3736 amino acid-encoding codons, TTA (Leu1), CGA (Arg), TCA (Ser2), and TCT (Ser2) were predominant. Codons such as CGC, AGC, and CTG were not presented.
tRNAs, rRNAs and the control region. The total length of 22 tRNAs was 1481 bp, accounting for 9.44% of the whole A. fulloi mitogenome ( Table 2). These tRNAs range from 60 to 71 bp, within the size range observed in hymenopteran parasitoids 17,27,42 (Table 1). Among these tRNAs, 12 tRNAs were coded in the heavy strand, and the remaining 10 tRNAs were identified in the light strand. These tRNAs had a high A + T content of 87.58%, a slightly negative AT skew value (− 0.016), and a positive GC skew value (0.261). Except for the absence of a dihydrouridine (DHU) arm in trnS1, most tRNAs had a cloverleaf secondary structure (Fig. 3). The loss of the DHU arm in trnS1 is normal in insects 5,65 . Two rRNAs, i.e., the small ribosomal RNA (rrnS) and large ribosomal RNA (rrnL), were 770 bp and 1366 bp in length, respectively. These lengths are similar to those of most reported hymenopterous insects 27,66 . These rRNAs were located on the heavy strand and were separated by trnA. They consisted of A (43.59%), T (43.68%), C (4.03%), and G (8.71%), with an A + T content of 87.27%. The AT skew and GC skew were − 0.001 and 0.367, respectively.
In insect mitogenomes, the control region, i.e., the A + T-rich region, is associated with replication and transcription 67,68 . This region is variable not only in size but also in base composition 69 . The CR was 3308 bp with an A + T content of 85.9% in Pteromalus puparum 70 but 578 bp with an A + T content of 93.6% in Spathius agrili 18 . In addition, repeat structures are usually detected in CR, which is also diverse in both type and size among insect mitogenomes 69 . The high A + T content and presence of repeat regions may result in the failure of obtaining CR sequence due to the inhibition of DNA polymerase 40,71 . Currently, the long and complex control region still presents challenges to obtaining complete mitogenomes in certain groups of parasitoids 19,20,27,41,72,73 . In this study, we failed to obtain the complete segment of the CR by next-generation sequencing or Sanger sequencing. The partial control region obtained for the A. fulloi mitogenome was 347 bp with an A + T content of 82.13% and was located between trnM and trnV.
Gene rearrangement. Mitochondrial gene rearrangements occur frequently in hymenopterous insects 74 , which are important clues to evolution and are regarded as valuable phylogenetic characters for these insects 26 . In this study, a comparison of gene-order data was conducted to examine the gene rearrangements. The gene orders of parasitoid mitogenomes assigned to 6 families are shown in Fig. 4. Species from different genera possessed unique gene orders. Compared with the gene order in the ancestral insect mitogenome, parasitoid mitogenomes exhibit large-scale rearrangement events for tRNA genes and PCGs, as reported in other studies of parasitoid mitogenomes 17,27,75 . In particular, all chalcidoid wasps exhibited an inversion gene block from nad5 to rrnL except Gonatocerus sp. Phylogenetically Gonatocerus sp. comes closer to the ancestor and agrees with www.nature.com/scientificreports/ the genetic order as well. Despite another relatively conserved gene block from cox1 to cox3, the remaining genes are highly rearranged. Although the gene orders of these mitogenomes from different families are variable, infrequent rearrangements were found between closely related species. Within Eupelmidae, mitogenomes of A. fulloi and Eupelmus sp. also exhibit different gene orders that are attributed to the rearrangement of a few tRNAs. Species from the same genus, such as species of Trichogramma and or Encyrtus, showed the same gene order in mitogenomes. In this study, mitochondrial gene rearrangements are diverse in chalcidoid wasps but occur infrequently in closely related taxa, which might be useful for phylogenetic analysis. CREx analysis demonstrated that the gene order of the mitogenome of A. fulloi is novel. Compared with the gene order in the ancestral insect mitogenome, the segment between cox1 and nad3 was relatively conserved except for the inversion of trnK, which is the same as Metaphycus eriococci, Encyrtus sasakii, and Chouioia cunea 76 (Fig. 5). It has been reported that the segment "trnE -trnF -nad5 -trnH -nad4 -nad4l trnT -trnP nad6 cytb" is conserved between Megraphragma and Philotrypesis 44,77 . However, in the mitogenome of A. fulloi, the segment "trnE -trnF -nad5 -trnH -nad4 -nad4l trnT -trnP nad6 cytb trnS2 -nad1 -trnL1 -rrnL" was inversed, and trnT -trnP have exchanged their positions. The segment "-trnV -rrnS CR trnI -trnQ trnM nad2 trnW" was highly rearranged, including the inversions of trnV, rrnS, trnM, nad2 and trnW, position exchanges of trnV and rrnS, trnM and trnI, and the transposition of trnQ (Fig. 5). In addition, trnC and trnY were transported to the segment "trnR trnN trnS1". The gene order of these tRNA genes was changed to "-trnS1 trnY trnN -trnC -trnR". Phylogenetic analyses. The phylogenetic relationships of 20 parasitoids within Chalcidoidea were analysed and are displayed in Fig. 6. Maximum likelihood (ML) and Bayesian inference (BI) phylogenetic trees were constructed based on nucleotide sequences of 13 PCGs of these mitogenomes in CIPRES 78 . Although some clades exhibited low support values, the same topological structures were found in two phylogenetic trees. Two species within Eupelmidae, A. fulloi and Eupelmus sp., were clustered together. Other species within the same families were grouped and separated from other parasitoids in different families. In Chalcidoidea, Eupelmidae is considered to be closely related to Encyrtidae, and to share several of the same features, such as an expanded acropleuron 79,80 . However, this family is not monophyletic, representing a grade rather than a clade 24,81,82 . Some species may show a close relationship to Pteromalidae 79,82 . In the present study, the phylogenetic relationship of parasitoids within Chalcidoidea can be presented as follows: Mymaridae + (Eupelmidae + (Encyrtidae + (Trichogrammatidae + (Pteromalidae + Eulophidae)))). This result is consistent with other reports 27,43 .