Extensive gene rearrangements in the mitochondrial genomes of two egg parasitoids, Trichogramma japonicum and Trichogramma ostriniae (Hymenoptera: Chalcidoidea: Trichogrammatidae)

Animal mitochondrial genomes usually exhibit conserved gene arrangement across major lineages, while those in the Hymenoptera are known to possess frequent rearrangements, as are those of several other orders of insects. Here, we sequenced two complete mitochondrial genomes of Trichogramma japonicum and Trichogramma ostriniae (Hymenoptera: Chalcidoidea: Trichogrammatidae). In total, 37 mitochondrial genes were identified in both species. The same gene arrangement pattern was found in the two species, with extensive gene rearrangement compared with the ancestral insect mitochondrial genome. Most tRNA genes and all protein-coding genes were encoded on the minority strand. In total, 15 tRNA genes and seven protein-coding genes were rearranged. The rearrangements of cox1 and nad2 as well as most tRNA genes were novel. Phylogenetic analysis based on nucleotide sequences of protein-coding genes and on gene arrangement patterns produced identical topologies that support the relationship of (Agaonidae + Pteromalidae) + Trichogrammatidae in Chalcidoidea. CREx analysis revealed eight rearrangement operations occurred from presumed ancestral gene order of Chalcidoidea to form the derived gene order of Trichogramma. Our study shows that gene rearrangement information in Chalcidoidea can potentially contribute to the phylogeny of Chalcidoidea when more mitochondrial genome sequences are available.

Comparative studies may also contribute to understanding the forces that drive gene rearrangement. Gene rearrangements have been hypothesized to be correlated with parasitic life histories in the Hymenoptera 20,21 and to some characteristics, such as body size and developmental time 22 . Frequent gene rearrangements have been observed in apocritan Hymenoptera based on broad examinations of whole or partial mitochondrial genome sequences 13,23 . Moreover, it has been reported that gene rearrangement was accelerated in the mitochondrial genomes of Apocrita 7,24,25 . However, no tight association was found between an increased rate of mitochondrial gene arrangement and the evolution of parasitism in an analysis of the characterization of 67 mitochondrial tRNA gene rearrangements in the Hymenoptera 16 .
Gene rearrangement patterns in the Hymenoptera are usually complicated and variable compared with those in most other insect orders 24,25 . Rearrangement of mitochondrial gene can be described by transposition, inversion, inverse transposition and TDRL (tandem duplication random loss) ( Fig. 1) 3,26 . Bernt et al. 27 introduced a movement of TDRL to describe the duplication of multiple contiguous genes and the successive random loss of one of the two copies 27 . In the Hymenoptera, rearrangements of tRNA genes usually occurred around the five gene clusters (Fig. 1). Inversion of trnH from the minority strand (heavy strand in mammal mitochondrial genomes) to the majority strand (light strand in mammal mitochondrial genomes, on which more genes are encoded) has also occurred multiple times in the Braconidae (Hymenoptera) 26 . The rate of rearrangement of protein-coding genes is lower than that of tRNAs in the Hymenoptera. Rearrangement of protein-coding gene has been reported in limited species of Agaonidae (Chalcidoidea) 14 , Aulacidae (Evanioidea) 28 , Trigonalyidae (Trigonalyoidea) 29 , Pteromalidae (Chalcidoidea) 30 , Ichneumonidae (Ichneumonoidea) 16 , Braconidae (Ichneumonoidea) 15 and Bethylidae (Chrysidoidea) 13 .
In the Chalcidoidea (Hymenoptera: Apocrita), unusually high rates of gene rearrangement, including not only tRNA genes but also protein-coding genes, have been found 14,30 . Combined with the diverse lifestyles among species of this group, this high rearrangement rate provides suitable materials for examining the evolution of gene rearrangement. Presently, only a few complete or partial mitochondrial genomes are known from Chalcidoidea, including those from Megaphragma 31 , Nasonia 30 , and Philotrypesis 14 . The Trichogrammatidae (Chalcidoidea) are small egg parasitoids with a short developmental duration and are one of the most important groups used in the biological control of insect pests. The parasitoids in this family can parasitize the eggs of about 10 orders. However, no complete mitochondrial genome has previously been sequenced from members of this family except for the mitochondrial genome sequence from Megaphragma 31 . Increasing knowledge of the mitochondrial genomes of egg parasitoids will provide further insight into their higher-level phylogeny and the evolution of their life histories.
In the study, we sequenced two mitochondrial genomes from Trichogramma ostriniae and Trichogramma japonicum. We found novel and extensive gene rearrangements in both species compared with the ancestral insect mitochondrial genome. Phylogenetic relationships within the Chalcidoidea were reconstructed using mitochondrial genome sequences and gene arrangement patterns.

Results and Discussion
Genome structure. The complete mitochondrial genome of T. japonicum (GenBank accession number: KU577436) and T. ostriniae (GenBank accession number: KU577437) were determined, with lengths of 15,962 bp and 16,472 bp, respectively. The sizes were well within the range found in other completely sequenced hymenopteran insects (from 15,137 bp in Idris sp. to 19,339 bp in Cephus cinctus) 32,33 (Table S1). All typical animal mitochondrial genes and control regions were identified in both circular genomes (Table 1).
In the mitochondrial genome of T. japonicum, a total of 547 bp of intergenic nucleotides ranging from 1 to 81 bp were found in 17 locations. The longest intergenic spacer (81 bp) was found between atp8 and trnD. In the mitochondrial genome of T. ostriniae, there were 870 bp intergenic spacer sequence distributed among 19 locations with lengths from 1 to 171 bp. The longest intergenic spacer (171 bp) was located between trnF and trnI. Long intergenic spaces have been identified in other insect mitochondrial genomes 13,34 and were considered as possible results of gene rearrangement 29 . Overlapping genes are very common in arthropod mitochondrial genomes [34][35][36] . In the mitochondrial genome of T. japonicum, a total of 19 bp of overlapping nucleotides were detected with a length from 1 to 7 bp, while in that of T. ostriniae there were 18 bp shared nucleotides in total, also ranging from 1 to 7 bp. In both species, the overlapping regions were found in the same five locations, i.e., atp6-atp8, nad4-nad4l, trnT-trnP, trnS2-nad1 and trnV-trnR. The other 10 pairs of genes in T. japonicum and 13 pairs of genes in T. ostriniae were directly adjacent, without overlapping or intergenic nucleotides.
The sequences of both mitochondrial genomes are biased in nucleotide composition [(A + T)% > (G + C)%] (Table S2), which is common in mitochondrial genomes of suborder Apocrita (Hymenoptera) 19,37,38 . The parameters of AT skew and GC skew are frequently used to reveal the nucleotide-compositional behavior of mitochondrial genomes [39][40][41] . In both species, the AT skews of the majority strand were positive, while GC skews were negative, which indicated that the two mitochondrial genomes contained more A than T and more C than G nucleotides (Table S2), as reported for most hymenopteran insects 42 (Table S1). Transfer RNA and ribosomal RNA genes. In total, 22 tRNA genes were interspersed throughout the Trichogramma mitochondrial genomes, of which four were coded on the majority strand while 18 were coded on the minority strand. The tRNA genes ranged from 59 bp (trnS1 in T. japonicum) to 70 bp (trnK in T. japonicum), well within the range observed in other insects (Table 1). All tRNA sequences can be folded into the canonical cloverleaf secondary structure, except for trnS1 which lacked the dihydrouridine (DHU) arm. A lack of the DHU arm in trnS1 was found in the mitochondrial genomes of most insects 1,43 and other metazoans 44 . Variations in the lengths of the variable loop, DHU and TΨC arms result in the different sizes observed in the tRNA sequences 45 . In total four mismatches (U-U in trnY, trnW, trnG and trnC) were found in T. japonicum and five (U-U in trnY, trnW, trnG, trnC and trnN) in T. ostriniae. Mismatches were located mostly in the DHU and anticodon stems ( Figure S1). As with other insect mitochondrial genome sequences, the large and small ribosomal RNA genes (rrnL and rrnS) were encoded by the minority strand in the same location (between trnL1-trnA and trnG-trnV). In T. japonicum, the length of the rrnS gene was 790 bp with an A + T content of 87.72%, while the rrnL gene was 1400 bp with an A + T content of 88.36%. In T. ostriniae the length of the rrnS gene was 775 bp with an A + T content of 88.52%, while the rrnL gene was 1367 bp with an A + T content of 88.00%.

Gene Strand
Protein-coding genes and codon usage patterns. In both the T. japonicum and T. ostriniae mitochondrial genomes, 11 of 13 protein-coding genes were encoded by the minority strand, while two (nad6 and cob) were encoded by the majority strand. All homologous protein-coding genes from the two species had the same length, except for nad3, nad6 and nad5 (Table 1).
In the mitochondrial genome of T. japonicum, the total length of the protein-coding genes was 11,202 bp, accounting for 70.18% of the entire genome. The average A + T content of the 13 protein-coding genes was 83.08%, ranging from 76.04% (cox1) to 90.80% (nad2) for individual genes. In the mitochondrial genome of T. ostriniae, the total length of protein-coding genes was 11,190 bp, accounting for 67.93% of the entire genome. The average A + T content of the 13 protein-coding genes was 83.25%, ranging from 76.37% (cox1) to 91.30% (nad2) for individual genes (Table S2).
The predicted initiation codons are ATN, as in most other insect mitochondrial genomes 37,46 . In some cases, a given gene may have different start codons in different species. There were seven genes (nad2, nad3, nad1, nad4L, nad5, cox2 and atp8) starting with ATG and five genes (cox1, cob, nad4, atp6 and cox3) starting with ATA in both genomes. In T. ostriniae, nad6 started with ATG, but in T. japonicum it started with ATA. All protein-coding genes terminated at the most common stop codon, TAA, in both genomes, except for nad4l in T. ostriniae, which stopped with TAG, and nad2, which stopped with T and TA in T. ostriniae and T. japonicum, respectively.
Codons with high A/T content were preferred in these two species, as in most insect mitochondrial genomes 47 . In both species of this study, Ala, Gly, Leu, Pro, Arg, Ser, Thr and Val were the most frequently used amino acids, and UUA (Leu) had the highest relative synonymous codon usage (RSCU) ( Table S3). All remaining codons with RSCU > 2.00 preferred A/T in the third codon position.

Control region.
Complete control regions were found in both species. The length of the control region was 593 bp in T. japonicum and 830 bp in T. ostriniae, which was well within the range reported in insects 21,48 . The control region in both species was flanked by trnW and trnM with high A + T content (90.99% in T. japonicum and 89.03% in T. ostriniae).
The control region is believed to function in the initiation of replication and control of transcription of the mitochondrial genome 49 . This region is usually characterized by five conserved elements 8,50 as reported in some insect mitochondrial genomes 15 . All of those elements could be identified in the mitochondrial genomes of Trichogramma, such as (1) a polyT stretch at the 5′ end of the control region; (2) a [TA(A)]n-like stretch following the polyT stretch; (3) a stem and loop structure ( Figure S2); (4) a TATA motif and a G(A)nT motif flanking the stem and loop structure; and (5) a G + A-rich sequence downstream of the stem and loop structure. However, they were not in the typical orders and positions, as reported in some insect species 34 .
Concerted evolution is common in the insect control region, most obviously in species with repeat units in their control regions such as termites 51 but also in species with non-tandemly repeated control regions such as thrips 52 . Repeat sequences were found in both species of Trichogramma, as have been reported in some other insects 15,40 . In T. japonicum, three 21-bp tandem repeats of "AGCCTCAAAAATCGGGGTTTT" and two 41-bp tandem repeats of "ATTATTATATAAATTATTTATATTTATATAAATATTTAATA" were found in the control region. In the three 21-bp tandem repeats, three mutations ("GCC" to "CTT") in the first repeat region were present. The control region of T. ostriniae contained nine 21-bp tandem repeats with several mutations among repeat units ( Figure S3). There was an 80-bp perfect repeat of TA in control region of T. ostriniae. The presence of repeat regions may inhibit DNA polymerase and could lead to the failure of sequencing in those regions 53,54 . Gene arrangement. In previously studied insect mitochondrial genomes, most rearranged genes were tRNA genes 16,25 . In the Hymenoptera, numerous rearrangements of protein-coding genes have been identified in several groups [13][14][15] . Compared with the putative ancestral pattern of the insect mitochondrial genome, dramatic gene rearrangements, not only in tRNA genes but also in protein-coding genes, were found in Trichogramma mitochondrial genomes. In total, 15 of 22 tRNA genes and seven of 13 protein-coding genes were rearranged in Trichogramma compared with the ancestral arrangement (Fig. 2). All genes in the mitochondrial genomes of the two Trichogramma species were encoded by the minority strand, rather than the majority strand, except for two protein-coding genes (cob and nad6) and four tRNA genes (trnE, trnK, trnT and trnS2).
Compared with the other sequenced mitochondrial genomes of Chalcidoidea, cox1 was inverted separately in Trichogramma, not together with the gene block of cox1-trnL2. The protein-coding gene nad2 did not change its relative position but changed direction compared with the ancestral type. The gene clusters between cox2-atp8, nad3-nad5, nad2-cox1 and control region-nad2 have been identified as the most frequently rearranged regions in mitochondrial genomes of Hymenoptera 13,19 , which also applied to Trichogramma. A novel tRNA gene cluster trnE-trnF-trnI-trnS1-trnN-trnC formed between cox1 and cox3. The tRNA cluster trnA-trnG formed between two ribosomal RNA genes; this is also novel in the Hymenoptera, in which the trnV gene is typically located between them 30 ; Although the conservation and inversion of large-scale gene blocks in Trichogramma was similar to other sequenced mitochondria genomes of Chalcidoidea, the rearrangement of protein-coding genes nad2 and cox1 as well as most tRNA genes are novel. The results showed that the species Ceratosolen solmsi from Agaonidae was not clustered with two other species of this family, even when the CAT model was used to avoid among-site rate heterogeneities (Fig. 3A). This species had a long branch compared to other species of ingroup, as shown in the original study of the mitochondrial genome of this species 14 . We predict that the inferred polyphyly of Agaonidae might be caused by long-branch attraction in C. solmsi. The Hymenoptera has been shown to be a group with both rapidly and slowly evolving mitochondrial genomes 32 . Long branches have been identified in Cephalonomia gallicola (Chrysidoidea: Bethylidae), Wallacidia oculata (Vespoidea: Mutillidae) 13,55 and Primeuchroeus spp. (Aculeata: Chrysididae) 55 . Identification of other species with long branches may help to improve phylogenetic inference of relationships within Hymenoptera.
By removing the outlier species C. solmsi from the analysis, a well-supported phylogenetic relationship among three families of Chalcidoidea was generated (Fig. 3B). The Agaonidae and Pteromalidae formed one lineage, which was then sister to Trichogrammatidae. This is congruent with the currently supported phylogeny of Chalcidoidea 56 .
We also used gene arrangement to reconstruct phylogenetic relationships among the three families. The inferred topology is identical to the one generated from gene sequences (Appendix S1). Our initial work indicates that gene rearrangements of the mitochondrial genome may provide valuable information for recovering phylogenetic relationships within Chalcidoidea. More representative mitochondrial genomes from different groups are needed to validate our assumption.

Rearrangement pathway of Trichogramma. We inferred the evolution of gene arrangement in
Trichogramma using CREx by comparing the common intervals between ChalcidoidGO and Trichogramma gene order (TrichogrammaGO) (Fig. 4). Four operations were considered in CREx, i.e., transpositions, inversions, inverse transpositions, and TDRL. The CREx identified eight operations in the evolution of gene arrangement in Trichogramma, including one transposition (operation 1 in Fig. 4, referring to rrnS), two inverse transpositions (operations 2 and 3 in Fig. 4, referring M and W, respectively), two inversions (operations 4-5 in Fig. 4) and three TDRLs (operations 6-8 in Fig. 4).
There are two sets of alternative scenarios in operations 4-8. The first set of scenarios refers to inversions of trnI and tRNA cluster from trnA to trnF, followed by three TDRLs (operations 4-8 in Fig. 4), while the other set refers to inversion of two large gene segments including both tRNA and protein-coding genes, followed by three TDRLs (operations 4′-8′ in Fig. 4). The presence of intergenic spacers located in the position involved in rearrangements and presence of remnant of the genes that changed positions within intergenetic spacers will provide evidence to choose the plausible rearrangement pathways 23,57 . However, we did not find obvious evidence to support one set of scenarios. Mapping the gene rearrangement patterns on the inferred phylogenetic tree may help to validate the scenarios using MLGO 58 or TreeRex 59 . However, the missing of genes in most species limited the usage of this approach. Rearrangement of tRNA genes are believed to be more frequent than that of large segment 3,16 . Thus, we presumed that scenarios 4-8 are more plausible than 4′-8′ in rearrangement of Trichogramma mitochondrial genomes. However, we could not exclude other pathways due to the computational complexity in gene order reconstruction and the algorithms implemented in CREx 27,57 . Raw sequencing data were generated by Illumina base calling software CASAVAv1.8.2 (http://support.illumina.com/sequencing/sequencing_software/casava.ilmn), and sequences containing adaptors or primers were identified by SeqPrep (https://github.com/jstjohn/SeqPrep). Sickle (https://github.com/najoshi/sickle) 60 was applied to conduct reads trimming with default parameters to obtain clean data for this study. In addition, SOAPdenovo (http://soap.genomics.org.cn/, v2.05) 61 was used to perform genome assembly with multiple Kmer parameters and assess the assembly results. GapCloser software 61 (downloaded from SOAPdenovo website) was subsequently applied to fill the remaining local inner gaps and correct single-base polymorphisms for the final assembly results.
Genome feature analysis. Base composition, codon usage, Relative Synonymous Codon Usage (RSCU) values and nucleotide substitution were analyzed using MEGA6 63 . GC and AT asymmetries were calculated according to the formulas used in a previous study 40 . AT-and GC-skews were measured for the majority strand using the formulas AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). The tandem repeats in the control region were predicted using the Tandem Repeats Finder available online (http://tandem.bu.edu/trf/trf.html) 64 .
Phylogenetic analysis. Phylogenetic relationships among three families of Chalcidoidea with sequenced mitochondrial genomes were reconstructed. A phylogenetic tree was reconstructed based on the nucleotide sequences and amino acid sequences of the 13 protein-coding genes. Nucleotide sequences were aligned by codon using MAFFT version 7.205 65 . Phylogenetic relationships were reconstructed with MrBayes version 3.2.5 66 , IQ-TREE web server 67 and PhyloBayes-MPI 68 . In MrBayes analyses, matrices were partitioned by codon position. Then, we used PartitionFinder version 1.1.1 69 to determine the best partition and substitution models. Four independent Markov chains were run for 10 million metropolis-coupled generations, with tree sampling every 1000 generations and a burn-in of 25%. In IQ-TREE analyses, the number of bootstrap replicates was set to 1000 with CAT model (C20 + 4 G) 70  Pelecinidae) 55 and Vanhornia eucnemidarumI (Proctotrupoidea: Vanhorniidae) 71 were chosen as outgroups according to previously inferred phylogenetic relationships among major lineages of Apocrita 13,55 .
We also inferred phylogenetic relationships among the three families of Chalcidoidea based on gene arrangement patterns. Phylogenetic relationships were inferred using a Maximum Likelihood (ML) method based on gene-order data implemented in the MLGO web server 58 . We excluded C. solmsi from the analysis due to missing genes and the inclusion of other representatives from the family Agaonidae.
Gene organization analysis. The evolutionary pathways of gene arrangement in Trichogramma were estimated by CREx (Common Interval Rearrangement Explorer) 27 . The heuristic method based on common interval was used to determine genome rearrangement scenarios between presumed ancestral gene order of Chalcidoidea and that of Trichogramma. Gene rearrangement pattern were mapped to the phylogenetic tree using MLGO web server 58 . The ChalcidoidGO was used as outgroup.
Data availability statement. The data were deposited into GenBank under accession numbers: KU577436 and KU577437.

Conclusion
The two mitochondrial genomes sequenced in our study from Trichogramma add to our knowledge of the mitochondrial genomes of Hymenoptera. Compared with those of other related insects, the mitochondrial genomes of Trichogramma were significantly rearranged, not only in tRNA genes but also in many protein-coding genes. Congruent tree topologies were recovered using gene sequences, including nucleotides and amino acids. The specific gene order in mitochondrial genomes of these species indicated that gene rearrangement information may represent potentially valuable data for phylogenetic analyses of Chalcidoidea. The availability of the complete mitochondrial genomes of Trichogramma provides information for development of genetic markers to study community ecology, population biology and evolution in this species complex.