The mitochondrial genome of the wolfberry fruit fly, Neoceratitis asiatica (Becker) (Diptera: Tephritidae) and the phylogeny of Neoceratitis Hendel genus

Neoceratitis asiatica (Becker) (Diptera: Tephritidae) is one of the most important fruit pestsof wolfberry which is a traditional Chinese medicinal herb. We characterized the complete mitochondrial genome of N. asiatica and described its organization in this study. This mitogenome had a total length of 15,481 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The overall base composition of N. asiatica in descending order was 40.6% A, 8.5% G, 38.4% T and 12.6% C. The phylogenetic relationships shows that Ceratitis capitata and N. asiatica may be sister taxa. This is the first report of the complete mitochondrial genome of a member of the Neoceratitis Genus and the complete mitochondrial genome sequence may provide useful information for phylogenetic analysis and studies between the genera Ceratitis and Neoceratitis.


Results
Mitochondrial genome sequencing and assembly. An Illumina library of N. asiatica was sequenced on a run of Hiseq 2500. After excluding the low quality value reads (lower than Q20), 466,428 read-pairs were generated finally. Through "map to reference" strategy to map all cleaned NGS reads to part of cox1 gene by Geneious R10.0., 58,875 reads were assembled to get the target sequence. After generating all assembled reads, a consensus sequence length 16,074 bp was generated. Then we manually examined for repeats at the beginning and end of the sequence to form a circle to gain the complete mitochondrial genome sequence of N. asiatica which was 15,481 bp.
Spacing sequences in 19 regions ranged from 2 to 54 bp, the longest located between tRNA Cys and tRNA Tyr . The overlapping sequences ranged from 1 to 32 bp in 10 regions, the longest was between tRNA Leu(CUN) and lrRNA.
Contrary to other insect mitogenomes 11 , the nucleotide composition of N. asiatica was negative AT skews in the control region, while the rest was all AT biased and positive AT skews and negative GC skews in the whole mitochondrial genome, PCGs, rRNAs, tRNAs and the control region ( Table 2). The A + T content of the non-coding control region was 88.2%.
The size of 22 tRNAs ranged from 64 bp (tRNA Arg and tRNA Thr ) to 72 bp (tRNA Pro ). Most tRNAs could be folded into the cloverleaf structure except for tRNA Ser(AGN) , which lacked the D-loop (Fig. 2). The number of base pairs in the DHU-stem ranged from 3 to 4 (Fig. 2). Most of the TΨ C-stems had 5 base pairs while 7 tRNAs (tRNA Ile , tRNA Lys , tRNA Arg , tRNA Ser(AGN) , tRNA Thr , tRNA Cys , tR NA His ) had 4 bp in the TΨ C-stems. The number of bases in the D-loop and TΨ C-loop was variable.
The two genes encoding the small and the large ribosomal subunits were located between tRNA Leu(CUN) and tRNA Val , and between tRNA Val and the control region. The lrRNA was 1,359 bp long with an A + T content of 82.6%, and the srRNA was 790 bp long with an A + T content of 79.5%.
The control region (397 bp) was flanked by srRNA and tRNA Ile and was highly enriched in AT (88.2%).   Table 2. Nucleotide composition of the mitochondrial genome of Neoceratitis asiatica (Becker).
with 9,117 nucleotides. 6) PCG12 + rRNA + tRNA: 13 protein-coding genes, 2 rRNA genes and 22 tRNA genes with 10,473 nucleotides. Based on the datasets, the topology structures conducted from Bayesian and ML analyses were very similar (Fig. 3). From our results, the genera Ceratitis and Neoceratitis are sister groups in the trees with high posterior probabilities (1.0) and ML bootstraps (100).

Discussion
In this study, we are reporting the first complete mitochondrial genome of Neoceratitis species -N. asiatica (Becker) in Tephritidae. The mitochondrial genome of N. asiatica is a closed circular molecule of 15,481 bp, which is the shortest one among the other 22 tephritid mitogenomes available with the size ranging from 15,687 bp in B. tau to 16,253 bp in D. longicornis. The control region of N. asiatica mitogenome is 397 bp in length, which is also the shortest one in the other published tephritid mitogenomes with the size ranging from 801 bp in B. tau to 1,343 bp in D. longicornis (Supplementary Table S2). The A + T contents of the whole mitogenome, PCGs, tRNAs, rRNAs and CR in N. asiatica are 79.0%, 77.8%, 77.9%, 81.5% and 88.2%, well in the range of amongst all reported tephritid mitogenomes, which range from 67.28% (B. minax) to 80.83% (P. utilis) in the whole mitogenome, from 64.30% (B. minax) to 78.90% (P. utilis) in PCGs, from 72.31% (B. minax) to 80.61% (P. utilis) in tRNAs, from 73.71% (B. minax) to 85.69% (P. utilis) in rRNAs and from 77.65% (B. minax) to 91.14% (C. capitata) in CR (Supplementary Table S2).
The AT skews and GC skews of N. asiatica in the whole mitogenome, PCGs, tRNAs, rRNAs and CR are0.  Table S2).
Seven PCGs in all Tephritidae species have the same start codons (ATG in ATP6, COII, CYTB, ND4 and ND4L, ATT in ND2, TCG in COI), and five PCGs (ATP6, ATP8, COIII, ND4L and ND6) have the same stop TAA codons (Table 3). In ND5, the TAT stop codon of N. asiatica is different from all the other Tephritidae species with TAA or T stop codon.
Phylogenetic relationship of Tephritid fruit flies based on molecular data has been reported by several researchers and there exist some arguments for a long period.
The relationship between subgenus Zegodacus and other subgenus of Bactrocera is questionable. White suggested that subgenera Zeugodacus should split from Bactrocera to combine with Dacus genus to form a new genus-Zeugodacus from morphological evidence 8 ) ATG TAA ATT  TAA TCG TAA ATG TAA ATG TAA ATG TAG ATA T   A. fraterculus  ATG TAA ATT  TAA TCG TAA ATG TAA ATG TAA ATG TAG ACA TAA   B. arecae  ATG TAA GTG  TAA TCG TA  ATG TAA ATG TAA ATG TAA ATA T   B. carambolae  ATG TAA GTG  TAA TCG TA  ATG TAA ATG TAA ATG T  ATA T   B. correcta  ATG TAA GTG  TAA TCG TA  ATG TAA ATG TAA ATG TAG 11 . In this study, we confirmed that subgenera Zeugodacus are closer to genus Dacus but distinct from other subgenera (Bactrocera, Daculus and Tetradacus) of Bactrocera genus from mitochondrial genome data level. Han and Ro reconstructed the phylogeny of the family Tephritidae by mitochondrial 12 S, 16 S, and COII gene fragments using 79 tephritid species. Phylogenetic trees suggested that Dacini and Ceratitidini are sister group which both of them belong to Dacinae and have distance to Anastrepha which belong to Toxotrypanini 12 . While Krosch et al. found Anastrepha ludens which belongs to Trypetinae subfamily was closer to Dacini (Dacinae subfamily) than to C. capitata based on 16S rRNA, COI, COII and white eye genes 10 . Fernández et al. constructed the phylogenetic tree using the neighbour-joining method based on COII gene representing six genera (Ceratitis, Rhagoletis, Dacus, Bactrocera, Anastrepha and Toxotrypan) of the family. The result also showed that Anastrepha and Bactrocera cluster in one branch while Ceratitis formed another branch individually 13 . Nakahara and Murajiuse used a 1.3 kb portion of mitochondrial DNA containing the tRNA Leu and flanking COI and COII regions for phylogenetic analyses. The result also shows that Dacini seems more closely related to Anastrepha than to the Ceratitidini 14 . Our research also drew the same conclusion that Anastrepha fraterculus is closer to Dacini rather than to C. capitata using the published mitochondrial genome data (5 of 6 datasets posterior probabilities are 1.00 and ML bootstraps are 100 for Bayesian and ML analyses separately) which implicates that we should reconsider the phylogenetic relationships between Dacinae and Trypetinae according to the molecular evidence.
There is also an argument about the phylogenetic status of the genus Neoceratitis, most of which are sequenced by four mitochondrial and one nuclear gene fragment (COI, 16 S, tRNA Pro , ND6, period). Barr 11 . So far, various studies, all of which expounding with the sample Neoceratitis cyanescens, have shown the close relationship between the two genera, Ceratitis and Neoceratitis 9 . Based on the previous studies mentioned above, the phylogenetic position between the genera Ceratitis and Neoceratitis was not well resolved. Thus we expected that the complete mitochondrial genome sequence of N. asiatica could make some contributions towards the phylogeny reconstruction of subtribe Ceratitidina.
In this study, the Bayesian and ML reconstructions place the two genera Ceratitis (C. capitata) and Neoceratitis (N. asiatica) together, which means they may be sister taxa. Limited to the data of complete mitochondrial genome in different Tephritidae species, exploring the relationship between the two genera Ceratitis and Neoceratitis still needs more researches.

Materials and Methods
Sample collection and DNA extraction. The N. asiatica samples were collected in Ningxia province, China and preserved in 100% ethanol. They were identified based on morphological characteristics. Genomic DNA was extracted from individual N. asiatica adult using the DNeasy DNA Extraction kit (QIAGEN).
Mitogenome sequencing and annotation. Genomic DNA library preparation and sequencing were carried out by Berry Genomics sequencing company (Beijing, China). Genomic DNA was fragmented with Bioruptor to an average insert size of 250 bp and sequenced on Illumina Hiseq 2500. Part of cox1 gene was sequenced as the "anchor" to reconstruct the mitochondrial genome of N. asiatica using a general insect primer pairLCO1490/ HCO2198 16 . We picked up the mitochondrial genome sequence with "map to reference" strategy and mapped all cleaned NGS reads to the "anchor" by Geneious R10.0 17 . The parameters we set for assembly were: 1) minimum overlap identity 95%, 2) minimum overlap 50 bp, 3) maximum 5% gaps per read, and 4) maximum gap size 20 bp.
Thirteen protein-coding genes and two rRNA genes were identified by BLAST searches in NCBI (http://www. ncbi.nlm.nih.gov/) and then confirmed by alignment with homologous genes from other 22 Tephritid species available in GenBank. The tRNA genes were identified using the tRNAscan-SE 18 and MITOS WebServer 19 . The circular map of N. asiatica complete mitochondrial genome was generated and annotated using Geneious. The start/stop codon usages were analysed by DNAMAN 8.0. The composition of skew was calculated manually based on the formula: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) 20 . The sequin file was edited and submitted to NCBI (NCBI GenBank accession number MF434829).
MrBayes v.3.2.5 21 and a PHYML 22 online web server were used to analyze the six datasets under GTR + I + G model. The model was selected using Jmodeltest 2.1.7 23 . In Bayesian analysis, two simultaneous runs of 1,000,000 generations were conducted for the matrix. Each one was sampled every 200 generations with a burn-in of 25%. Trees inferred prior to stationarity were discarded as burn-in, and the remaining were used to construct a 50% majority rule consensus tree. The ML analysis was conducted with 1,000 bootstraps. Phylogenetic trees were viewed and edited by FigTree v.1.4.3 24 . Sequences were aligned using ClustalW with the default parameters implemented in MEGA 5.0 25 . The ambiguous positions in the genes alignment were filtered with Gblocks v0.91b 26 . The aligned sequences of each gene were concatenated using SequenceMatrix v1.7 27 .