Mitochondrial genomes of the hoverflies Episyrphus balteatus and Eupeodes corollae (Diptera: Syrphidae), with a phylogenetic analysis of Muscomorpha

The hoverflies Episyrphus balteatus and Eupeodes corollae (Diptera: Muscomorpha: Syrphidae) are important natural aphid predators. We obtained mitochondrial genome sequences from these two species by PCR amplification and sequencing. The complete Episyrphus mitochondrial genome is 16,175 bp long, while the incomplete Eupeodes genome is 15,326 bp long. All 37 typical mitochondrial genes are present in both species and arranged in the ancestral positions and directions. Both mitochondrial genomes showed biased usage of A/T over G/C. Among the protein-coding genes, cox1, cox2, cox3, cob and nad1 showed relatively low levels of nucleotide diversity, while among the tRNA genes trnM was the most conserved, with no nucleotide variation in stem regions within Muscomorpha. Phylogenetic relationships among the major lineages of Muscomorpha were reconstructed using a complete set of mitochondrial genes. Bayesian and maximum likelihood analyses generated congruent topologies. Our results supported the monophyly of the five species of Syrphidae (Syrphoidea). Platypezoidea was sister to all other species of Muscomorpha in our phylogeny. Our study demonstrates the power of the complete mitochondrial gene set for phylogenetic analysis in Muscomorpha.


Results and Discussion
General features of the newly sequenced mitochondrial genomes. The complete Episyrphus mitochondrial genome sequence is 16,175 bp long (GenBank accession KU351241) (Table 2). The partial Eupeodes mitochondrial genome sequence is 15,326 bp long (GenBank accession KU379658) (Table 3). In particular, the A + T-rich region failed to yield reliable sequence data in both species.
No gene rearrangement was observed in either genome relative to the putative ancestral insect arrangement 29 , which is also the arrangement reported for all sequenced dipteran species 11 : 23 genes are encoded on the majority strand (J-strand) and 14 genes on the minority strand (N-strand).
Each of the 37 typical mitochondrial genes is present in both species. The mitochondrial genome of Episyrphus has 255 bp of intergenic nucleotides in 22 different locations, with intergenic spacer lengths ranging from 1 to 60 bp. Seven pairs of genes overlap, with overlap lengths ranging from 1 to 7 bp. Eight pairs of genes are directly adjacent to one another, including rrnL-trnV and trnV-rrnS. The mitochondrial genome of Eupeodes has 230 bp of intergenic nucleotides in 19 locations, with intergenic spacer lengths ranging from 2 to 47 bp. Nine pairs of genes overlap, with overlap lengths ranging from 1 to 7 bp. Nine pairs of genes are directly adjacent to one another, including rrnL-trnV and trnV-rrnS. In both species, the longest intergenic spacer is located between trnK and trnD, followed by the one between trnE and trnF. The longest overlapping

Protein-coding genes, codon usage and nucleotide diversity. Nine of the 13 mitochondrial PCGs in the Episyrphus and Eupeodes mitochondrial genomes are located on the J-strand; the other four PCGs are located on the N-strand (Tables 1 and 2). Total PCG length in Episyrphus is 11,220 bp, while Eupeodes has 11,211 bp of PCG. All Episyrphus and Eupeodes mitochondrial genome PCGs start with ATN codons. One, six, and six of the PCGs start with ATA, ATG, and ATT, respectively. Orthologs from the two species have the same start codons. Most PCG stop codons are the canonical TAA, except for nad5 in Eupeodes, which uses an incomplete TA.
Mitochondrial genome codon usage in Episyrphus and Eupeodes shows a significant bias towards A and T (Fig. 1), as in other species of Muscomorpha (Figure S1). In the Episyrphus and Eupeodes mitochondrial genomes, Leu, Ile, Phe, and Met are the most frequently encoded amino acids; accordingly, TTA (Leu), ATT (Ile), TTT (Phe), and ATA (Met) are the most frequent codons, as is typical of other insect mitochondrial genomes 30,33,34 . These frequently used codons consist exclusively of A and T, which contributes to the high A + T content seen in most fly mitochondrial genomes (Figure S1). This codon preference is strongly reflected at third codon positions, where A/T frequencies are much higher than G/C frequencies.
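The codon bias described above is usually summarised as relative synonymous codon usage (RSCU), which the Methods compute with CodonW. The following is a minimal Python sketch of the standard RSCU definition (observed codon count divided by the count expected if all synonymous codons of that amino acid were used equally), not the CodonW implementation. It assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), treats all codons of an amino acid as one family (so Leu and Ser are each a single family), and uses a made-up toy sequence; the names `CODE`, `rscu` and `toy` are illustrative only.

```python
from collections import Counter
from itertools import product

# Invertebrate mitochondrial code (NCBI translation table 5), codons ordered
# with bases T, C, A, G at each position. Differences from the standard code:
# ATA = Met, TGA = Trp, AGA/AGG = Ser.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    RSCU = observed count * (number of synonymous codons) / (total count
    of all codons for that amino acid). Stop codons and codons containing
    ambiguous bases are ignored. Amino acids absent from the sequence are
    left out of the result.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for members in families.values():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue
        for c in members:
            values[c] = counts[c] * len(members) / total
    return values


# Toy example (hypothetical sequence): Ile-Ile-Ile-Leu-Leu-Leu
toy = "ATTATTATCTTATTATTG"
values = rscu(toy)
for codon in ("ATT", "ATC", "TTA", "TTG"):
    print(codon, CODE[codon], round(values[codon], 3))
```

An RSCU above 1 means a codon is used more often than expected under equal usage of synonymous codons. Applied to concatenated PCGs, A/T-ending codons such as TTA and ATT would be expected to score well above 1 in genomes with the bias reported here.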
The evolutionary rate of protein-coding genes was estimated from the nucleotide diversity and the Jukes-Cantor-corrected nucleotide diversity within Muscomorpha. Among the 13 protein-coding genes, five (cox1, cox2, cox3, cob and nad1) showed relatively low levels, five (atp6, nad6, nad4, nad4l and nad5) showed intermediate levels, and three (atp8, nad2 and nad6) showed the highest levels of nucleotide diversity (Fig. 2). The relative evolutionary rates of the 13 protein-coding genes in Muscomorpha were similar to those found in previous studies of insect mitochondrial genomes 35,36 .
Table 3. Annotation of the Eupeodes corollae mitochondrial genome. Symbols are as in Table 2.
Secondary structure models of the tRNA genes in the two newly sequenced mitochondrial genomes were predicted using the Mitos WebServer 37 (Fig. 3). In Episyrphus and Eupeodes, all tRNA genes fold into the canonical clover-leaf structure. The dihydrouridine (DHU) arm of all the tRNAs is a large loop, instead of a conserved stem-and-loop structure; however, this is typical of metazoan mitochondrial genomes 38 . The amino acid acceptor (AA) stem and the anticodon (AC) loop are conserved at 7 bp in all of our tRNA genes. The sizes of the variable loop and the D-loop often determine overall tRNA length 39 . The DHU arms in our tRNAs are 2 to 4 bp long, the AC arms are 4 to 5 bp long, and the TΨC arms vary in length from 3 to 5 bp. The variable loops are less consistent, ranging from 4 to 8 bp.
We also compared variation in the stem regions of tRNA genes among 15 species of Muscomorpha. Among the 22 tRNA genes, trnM was the most conserved, with no nucleotide variation in stem regions, followed by trnV and trnE with three variable sites each. The trnC gene showed the highest number of variable sites in stem regions (17 sites), followed by trnH (16 sites) (Fig. 2).
Base pairs other than the canonical A-U and C-G pairs are occasionally used in our tRNAs, based on the predicted tRNA secondary structures. We found six and five mismatched base pairs in the tRNAs from Episyrphus and Eupeodes, respectively. Among the six mismatched base pairs in Episyrphus, five are U-U pairs, located in the AA and TΨC stems; the other is an A-A pair, located in the A-A stem. Eupeodes has four U-U pairs, located in the AA and TΨC stems, and an A-A pair, located in the A-A stem.
In Episyrphus, the two mitochondrial ribosomal RNA genes, rrnL and rrnS, are 1,338 bp long with an A + T content of 84.61% and 804 bp long with an A + T content of 83.96%, respectively. In Eupeodes, rrnL is 1,334 bp long with an A + T content of 84.78%, and rrnS is 795 bp long with an A + T content of 83.14% (Table 4).
Phylogenetic relationships. We reconstructed phylogenies within the Muscomorpha using the nucleotide sequences of the 37 mitochondrial genes. Bayesian and maximum likelihood (ML) methods estimated congruent topologies (Fig. 4). Our analyses supported the monophyly of all superfamilies used in the study. The Aschiza (lower Cyclorrhapha) has been found to be a paraphyletic group 40 . We included two superfamilies from Aschiza, Platypezoidea and Syrphoidea. Platypezoidea was sister to all other species of Muscomorpha, which is congruent with previous studies 11,41 . The five genera of Syrphidae (Syrphoidea) clustered as ((unknown Syrphidae sp.) + (Ocyptamus + (Eupeodes + (Episyrphus + Simosyrphus)))). Syrphoidea and Opomyzoidea formed a lineage that was sister to the other species of Schizophora. Opomyzoidea has traditionally been considered a superfamily of Schizophora, which has been shown to be a monophyletic group 40 . Our study showed that Schizophora was interrupted by Opomyzoidea, which might be caused by the long branch of Opomyzoidea. For Opomyzoidea, we used one species from the family Fergusoninidae, all species of which form galls in association with Fergusobia (Tylenchida: Neotylenchidae) nematodes. The novel life history of species of Fergusoninidae might affect the evolutionary pattern of their mitochondrial genomes 2 . The long branch of Opomyzoidea was also found in a previous study based on mitochondrial genome sequences 41 . Phylogenetic relationships of the other groups included in our analyses, i.e. ((Sciomyzoidea + Tephritoidea) + (Ephydroidea + (Muscoidea + Oestroidea))), were in accord with previous studies 9,13–16 .

Methods
Sampling and DNA extraction. The specimens were collected from Sichuan Province, China. Specimens were initially preserved in 100% ethanol in the field when collected, and then stored at −80 °C prior to DNA extraction. Whole genomic DNA was extracted from the legs and thorax of the specimens using a DNeasy tissue kit (Qiagen, Hilden, Germany), following the manufacturer's protocols.
Scientific Reports | 7:44300 | DOI: 10.1038/srep44300
PCR amplification and sequencing. Initially, we used a previously designed set of universal primers for insect mitochondrial genomes 1,42 to amplify and sequence partial gene segments. We then designed specific primers based on the sequenced segments to amplify regions that bridged the gaps between segments (Table S1). PCR cycling consisted of an initial denaturation step at 96 °C for 3 min, followed by 40 cycles of denaturation at 95 °C for 30 s, annealing at 42-53 °C for 30 s, and elongation at 60 °C at a rate of 1.5 kb/min (with the duration depending on the size of the target amplicon), and a final elongation step at 60 °C for 10 min. PCR components were added following the Takara LA Taq protocols. PCR products were evaluated by agarose gel electrophoresis. A primer-walking strategy was used to sequence all amplification products from both strands (Table S1).

Mitochondrial genome annotation. Mitochondrial DNA sequences were assembled using Lasergene software (DNAStar, Inc., USA, New York). First, the tRNA genes were identified using the Mitos WebServer (http://mitos.bioinf.uni-leipzig.de/index.py) 37 , with the genetic code set to "Invertebrate Mito". Those tRNAs that could not be found using this approach were identified by sequence alignment with their homologs from related species. Secondly, protein-coding genes were identified by BLAST searches in GenBank, using other published mitochondrial genomes from Syrphidae 9,16,28 . Finally, the rRNA genes and control regions were identified from the boundaries of the flanking tRNA genes, and by comparison with other insect mitochondrial genomes.
Comparative analysis of the mitochondrial genomes from Muscomorpha. We compared the mitochondrial genomes of 16 species from the Muscomorpha, including our two newly sequenced genomes. Gene arrangement, base composition, and PCG codon usage features were analyzed. Because several tRNA genes were not available for some species, we analyzed base composition using only the PCGs. Furthermore, the unknown Syrphidae sp. sequence lacked nad2 data; therefore, we excluded this species from these analyses.
We calculated base composition using MEGA6 43 . The AT-skew and GC-skew were calculated according to Hassanin et al. 31 : AT-skew = (A% − T%)/(A% + T%) and GC-skew = (G% − C%)/(G% + C%). The intergenic spacers and overlapping regions between genes were counted manually. The relative synonymous codon usage (RSCU) of all protein-coding genes was calculated using CodonW (written by John Peden, University of Nottingham, UK). Nucleotide diversity and Jukes-Cantor-corrected nucleotide diversity were calculated for species of Muscomorpha using DnaSP v5 44 .
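As an illustration of the two kinds of statistics named above, the following Python sketch computes AT- and GC-skew from base counts and a Jukes-Cantor-corrected nucleotide diversity from a small alignment. It is a sketch under stated assumptions, not a reimplementation of MEGA6 or DnaSP: the skews use raw counts (equivalent to percentages, since the total cancels), nucleotide diversity is taken as the mean pairwise proportion of differing sites, and the correction d = −(3/4) ln(1 − 4p/3) is applied to each pairwise comparison before averaging. Sites with gaps or ambiguous bases are skipped per pair, which may differ from how DnaSP handles them. All function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def skews(seq):
    """Return (AT-skew, GC-skew) = ((A-T)/(A+T), (G-C)/(G+C)) for one strand."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else float("nan")
    gc_skew = (g - c) / (g + c) if g + c else float("nan")
    return at_skew, gc_skew


def p_distance(s1, s2):
    """Proportion of differing sites among unambiguous aligned positions."""
    sites = [(x, y) for x, y in zip(s1.upper(), s2.upper())
             if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(p):
    """Jukes-Cantor correction of a p-distance (valid for p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nucleotide_diversity(alignment, corrected=False):
    """Mean pairwise distance over all sequence pairs in the alignment."""
    distances = []
    for s1, s2 in combinations(alignment, 2):
        p = p_distance(s1, s2)
        distances.append(jukes_cantor(p) if corrected else p)
    return sum(distances) / len(distances)


# Toy data (hypothetical): one sequence for skew, three aligned sequences for pi
at_skew, gc_skew = skews("AAATGGC")
alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAT"]
pi = nucleotide_diversity(alignment)
pi_jc = nucleotide_diversity(alignment, corrected=True)
print(f"AT-skew={at_skew:.3f} GC-skew={gc_skew:.3f} pi={pi:.4f} pi(JC)={pi_jc:.4f}")
```

The corrected value is always at least as large as the uncorrected one, because the Jukes-Cantor model accounts for multiple substitutions at the same site.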
Phylogenetic analysis. We used 14 Muscomorpha species with published mitochondrial genomes, together with our two newly sequenced mitochondrial genomes, for phylogenetic analyses (Table 1). The 16 species belong to two sections, Aschiza and Schizophora. We selected Aschiza sequences belonging to two superfamilies, Platypezoidea and Syrphoidea. Schizophora is divided into two subsections; we selected sequences from both subsections, representing six superfamilies. Cydistomyia duplonotata and Trichophthalma punctata (Tabanomorpha: Tabanoidea: Tabanidae) were used as outgroups because of the close relationship between Tabanomorpha and Muscomorpha 11 . MAFFT version 7.205, which implements consistency-based algorithms, was used for the alignment of protein-coding and RNA genes 45 . We used the G-INS-i and Q-INS-i algorithms in MAFFT 46 for protein-coding and RNA alignment, respectively. The alignment of the nucleotide sequences was guided by the amino acid sequence alignment using the Perl script TranslatorX version 1.1 47 .
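To make the idea of an amino-acid-guided nucleotide alignment concrete, the sketch below shows the back-translation step in Python: each residue of an aligned protein sequence is replaced by its original codon, and each gap by three gap characters. This is a conceptual illustration of the general technique, not the TranslatorX code. The function name `backtranslate` and the toy sequences are hypothetical, and the sketch assumes the coding sequence is in frame with its stop codon already removed.

```python
def backtranslate(aligned_protein, cds, gap="-"):
    """Thread an unaligned coding sequence onto its aligned protein.

    aligned_protein: amino acid sequence with alignment gaps.
    cds: the corresponding in-frame nucleotide sequence, without gaps
         and without a terminal stop codon.
    Returns the codon alignment row for this sequence.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths do not match")

    codon_iter = iter(codons)
    # A gap in the protein alignment becomes a three-column gap, so the
    # reading frame is preserved across the whole nucleotide alignment.
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


# Toy example (hypothetical): protein M-FL aligned with one gap
row = backtranslate("M-FL", "ATGTTTTTA")
print(row)
```

Keeping gaps in multiples of three is what allows codon positions to be partitioned separately in downstream analyses.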
Data partitioning, and the ability to apply specific models to different partitions, is ideal for analyzing mitochondrial genomes 2 . We used PartitionFinder version 1.1.1 48 to simultaneously select partition schemes and choose substitution models for the matrix. The set of candidate substitution models was set to "mrbayes". The greedy algorithm was used, with estimated, linked branch lengths, to search for the best-fit partitioning scheme.
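The settings described above correspond to entries in a PartitionFinder configuration file. The Python sketch below generates such a file for illustration. Only `models = mrbayes`, `branchlengths = linked` and `search = greedy` come from the text; the alignment file name, the gene lengths, the choice of BIC for `model_selection`, and the decision to split protein-coding genes by codon position are assumptions made for the example and are not stated in the Methods.

```python
def partitionfinder_cfg(alignment, genes, models="mrbayes",
                        branchlengths="linked", model_selection="bic",
                        search="greedy"):
    """Build the text of a PartitionFinder 1.x configuration file.

    genes: list of (name, aligned_length, is_protein_coding) in the order
           the genes appear in the concatenated alignment.
    Protein-coding genes are split into three codon-position data blocks
    (an assumption of this sketch); RNA genes form one block each.
    """
    lines = [
        f"alignment = {alignment};",
        f"branchlengths = {branchlengths};",
        f"models = {models};",
        f"model_selection = {model_selection};",  # assumed, not from the text
        "",
        "[data_blocks]",
    ]
    start = 1
    for name, length, coding in genes:
        end = start + length - 1
        if coding:
            for pos in range(3):
                # "start-end\3" selects every third site from "start"
                lines.append(f"{name}_pos{pos + 1} = {start + pos}-{end}\\3;")
        else:
            lines.append(f"{name} = {start}-{end};")
        start = end + 1
    lines += ["", "[schemes]", f"search = {search};"]
    return "\n".join(lines) + "\n"


# Toy example with hypothetical gene lengths
cfg = partitionfinder_cfg("matrix.phy", [("cox1", 9, True), ("rrnS", 5, False)])
print(cfg)
```

Starting from many small data blocks lets the greedy search merge blocks that are best described by the same model, rather than fixing the partitions in advance.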
We constructed phylogenies of the Muscomorpha with the Bayesian inference (BI) method using MrBayes version 3.2.5 49 , and with the ML method using RAxML version 8.0.0 50 . For BI, the GTR + I + G, GTR + G, HKY + I + G, and HKY + G models were applied to the corresponding partitions (Table S2). Four simultaneous Markov chains were run for 10 million generations, with trees sampled every 1,000 generations and the first 25% of trees discarded as burn-in. We used the GTR + G model for each ML analysis. We conducted 200 ML runs to find the highest-likelihood tree, then analyzed 1,000 bootstrap replicates.
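The sampling scheme above implies a specific number of trees in the posterior sample. The short Python calculation below works this out per run. It assumes that the starting tree at generation 0 is also written to the tree file and that burn-in is applied as a fraction of the saved samples, rounded down; the number of independent runs is not stated in the text and is not modelled here.

```python
generations = 10_000_000   # MCMC length from the text
sample_every = 1_000       # sampling interval from the text
burnin_fraction = 0.25     # burn-in from the text

# Assumption: the tree at generation 0 is saved in addition to one tree
# per sampling interval.
saved_trees = generations // sample_every + 1
discarded = int(saved_trees * burnin_fraction)
retained = saved_trees - discarded

print(f"saved={saved_trees} discarded={discarded} retained={retained}")
```

Under these assumptions roughly 7,500 trees per run contribute to the posterior probabilities on the consensus tree.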