Characterization of the complete mitochondrial genomes of Maiestas dorsalis and Japananus hyalinus (Hemiptera: Cicadellidae) and comparison with other Membracoidea

Only six mitochondrial genomes (mitogenomes) have been previously published for Cicadellidae, the largest family of Hemiptera. This study provides complete, annotated mitogenomes of two additional cicadellid, species Maiestas dorsalis and Japananus hyalinus, and the first comparative mitogenome analysis across the superfamily Membracoidea. The mitogenomes of both sequenced species are similar to those of other studied hemipteran mitogenomes in organization and the lengths are 15,352 and 15,364 bp with an A + T content of 78.7% and 76.6%, respectively. In M. dorsalis, all sequenced genes are arranged in the putative ancestral insect gene arrangement, while the tRNA cluster trnW-trnC-trnY is rearranged to trnY-trnW-trnC in J. hyalinus, the first reported gene rearrangement in Membracoidea. Phylogenetic analyses of the 11 available membracoid mitogenomes and outgroups representing the other two cicadomorphan superfamilies supported the monophyly of Membracoidea, and indicated that treehoppers are a derived lineage of leafhoppers. ML and BI analyses yielded topologies that were congruent except for relationships among included representatives of subfamily Deltocephalinae. Exclusion of third codon positions of PCGs improved some node support values in ML analyses.

Insect mitochondrial genomes (mitogenomes) are typically double-stranded circular molecules with 37 genes, including 13 protein-coding genes (PCGs), two ribosomal RNAs (rRNAs), and 22 transfer RNAs (tRNAs) 1 . Complete mitochondrial genome sequences are not only more phylogenetically informative than shorter sequences of individual genes, but also provide sets of genome-level characters, such as the relative positions of different genes, RNA secondary structures and modes of control of replication and transcription. Because of the abundance of mitochondria in cells, maternal inheritance, absence of introns, and high evolutionary rates, insect mitochondrial genome sequences are the most extensively used genomic marker(s) in insects and are becoming increasingly important for studies of insect molecular evolution, phylogeny and phylogeography [2][3][4] . Following the development of next generation sequencing technology, large numbers of mitochondrial genomes are becoming available 2,5 . Mitochondrial genomes representing each of the major subordinal lineages from each of the 28 recognized insect orders are now available and representation at the family level is steadily improving.
The hemipterous superfamily Membracoidea (leafhoppers and treehoppers) is of interest because it is the most diverse and successful lineage of sap-sucking phytophagous insects, and because of the great variety of behavioral and life-history strategies found within this group.
The phylogenetic relationships of the leafhopper family Cicadellidae to the treehopper families Membracidae, Melizoderidae, and Aetalionidae, all comprising superfamily Membracoidea, have been controversial for decades Genome annotation and bioinformatic analyses. The two mitochondrial genomes were annotated with GENEIOUS R8 (Biomatters Ltd., Auckland, New Zealand). All 13 protein-coding genes and 2 rRNA genes were determined by comparison with the homologous sequences of other leafhoppers from GenBank. The 22 tRNA genes were identified using both of the tRNAScan-SE server v 1.21 25 and MITOS WebSever 26 , the secondary structure was also predicted by the MITOS WebSever.
The base composition and relative synonymous codon usage (RSCU) values of each protein coding gene (PCG) were calculated with MEGA 6.06 27 . Strand asymmetry was calculated using the formulas AT skew = [A−T]/ [A + T] and GC skew = [G−C]/[G + C] 28 . The number of nonsynonymous substitutions per nonsynonymous site (Ka) for the two species were calculated with DnaSP 5.0 29 , using Magicicada tredecim (Riley) from Cicadoidea and Callitettix braconoides (Walker) 30 from Cercopoidea as references. The tandem repeats of the A + T-rich region were identified by the tandem repeats finder online server (http://tandem.bu.edu/trf/trf.html) 31 .

Sequence alignment and substitution saturation test.
Sequences of all 13 PCGs and two rRNA genes were used in our analyses. After excluding stop codons, each PCG was aligned individually with codon-based multiple alignments using the MAFFT algorithm in the TranslatorX online server (http://translatorx.co.uk/) 32 , with gaps and ambiguous sites removed from the protein alignment before back-translating to nucleotides using GBlocks under default settings. The sequences of two rRNA genes were aligned separately using the MUSCLE algorithm implemented in MEGA 6.06. We found that MUSCLE and MAFFT have been shown to perform similarly on our rDNA sequence data but because MUSCLE is implemented in MEGA 6.06, which facilitates manual removal of poorly aligned positions, we used MUSCLE for these genes.
Saturation tests for different codon positions of PCGs and the two rRNA genes were performed, with the uncorrected p-distances plotted against the GTR distances. All distances were generated using PAUP⁄4.0 b10 33 . The slope, correlation coefficient and average GTR distance were used as measures of substitution saturation; the lower the slope, the greater the level of saturation [34][35][36] .
Optimal nucleotide substitution models and partition strategies were chosen by PartitionFinder v1.1.1 38 . Under the "greedy" search algorithm, we chose "unlinked" to estimate branch lengths and used the Bayesian information criterion (BIC) as the metric for the partitioning scheme. Details of the best-fit schemes for ML and BI analysis are shown in Supplementary Table S3.
Phylogenetic inference. ML analyses were conducted using raxmlGUI 1.5 39 under the GTRGAMMAI model, and the node reliability was assessed by performing 1000 rapid bootstrap replicates (BS). Bayesian analysis was performed using MrBayes 3.2.6 40 . Two simultaneous runs with eight independent chains were run for five million generations and trees were sampled every 1000 generations. After the average standard deviation of split frequencies fell below 0.01, the first 25% of the total samples were discarded as burn-in and the remaining trees were used to generate a consensus tree and calculate the posterior probabilities (PP).

Results and Discussion
General features of the two newly sequenced mitochondrial genomes. The complete mitochondrial genomes of M. dorsalis (GenBank: KX786285) and J. hyalinus (GenBank: KY129954) are double-stranded circular molecules of length of 15352 bp and 15364 bp, respectively ( Fig. 1). Both sizes were comparable to other sequenced Membracoidea mitochondrial genomes ranging from 14805 bp in Nephotettix cincticeps to 16007 bp in Leptobelus gazella (Table 2). Each mitochondrial genome included the 37 typical mitochondrial genes (13 PCGs, 22 tRNAs and two rRNAs) and a control region (A + T-rich region) (Supplementary Tables S1, S2).
Base composition. All the newly obtained mitochondrial genomes exhibited heavy AT nucleotide bias, with A + T% of the whole sequences 78.8% in M. dorsalis and 76.6% in J. hyalinus, similar to those found in other sequenced Membracoidea (75.6-78.8%)( Table 2). CR sequences in most species showed the strongest A + T% biases, except for Entylia carinata, Drabescoides nuchalis and J. hyalinus. All species had higher A + T% in rrnL  Fig. S1).
Gene rearrangement in Membracoidea. Among the 11 sequenced mitochondrial genomes, the gene order of most species was highly conserved and identical to the putative ancestral insect (Drosophila yakuba) mitochondrial genome arrangement 1 . The only exception was J. hyalinus, the first known member of Membracoidea with a tRNA gene rearrangement, in which the tRNA cluster trnW-trnC-trnY was rearranged to trnY-trnW-trnC (Fig. 2). Although such gene rearrangements were previously unknown in the superfamily Membracoidea, they have often been found in other insect groups 41,42 .
Protein-coding genes, codon usage and substitution rates.   hyalinus, all PCGs terminated with TAA except for cox2, which ended with an incomplete T codon. Overall, in Membracoidea species, more TAA than TAG were used, and at least one incomplete T codon was present. For the relative synonymous codon usage (RSCU) of M. dorsalis and J. hyalinus, the four most frequently utilized amino acids were Leucine (Leu), Isoleucine (Ile), Methionine (Met) and Serine (Ser). Among the 62 available codons (excluding TAA and TAG), Leu (CUG), Ser (AGG, UCG) and Pro (CCG) were missing in M. dorsalis (Fig. 3).
The nonsynonymous substitution rate (Ka) for each taxon was measured in comparison with Callitettix braconoides and Magicicada tredecim (Fig. 4). Ka was relatively higher in treehoppers than leafhoppers when compared with M. tredecim, and the treehopper D. hardwickii (Aetalionidae) always showed the highest values, but Ka values were similar between these species (0.331-0.398 and 0.374-0.433) (Fig. 4).
Transfer RNA and ribosomal RNA genes. All    Similar to other insect mitochondrial genomes, two rrn genes were encoded on the N-strand, and located between trnL1 and trnV, and between trnV and the A + T-rich region, respectively. The lengths of the rrnL and rrnS in M. dorsalis were 1217 bp and 745 bp, with the A + T content 81.4% and 79.7%, respectively, while in J.  hyalinus are 1208 bp and 752 bp, with the A + T content 79.8% and 79.5%, respectively, which were consistent with the data available for other Membracoidea species (Table 2).
Gene overlaps and non-coding regions. The whole M. dorsalis mitochondrial genome had a total of 35 bp in overlaps between 11 gene junctions, while J. hyalinus had 27 bp overlaps between eight gene junctions. The longest overlap (8 bp) occurs between trnW and trnC in both two species. The two common pairs of gene overlaps: atp8-atp6 (7 bp) and nad4-nad4l (7 bp), also were found in two other species (Supplementary Tables S1,  S2).
There were 13 intergenic spacers totaling 92 bp non-coding bases in the M. dorsalis mitochondrial genome, ranging in size from 1 bp to 16 bp, and the longest two intergenic spacers (16 bp) were between nad2-trnW and cox1-trnL2 respectively. For J. hyalinus, 15 intergenic spacers occupied 178 bp non-coding bases, the longest being 73 bp between trnY and trnW, caused by the tRNA gene rearrangement (Supplementary Tables S1, S2).
The putative control region, located between rrnS and trnI, was the longest intergenic spacer in the mitochondrial genome. The lengths of this region in M. dorsalis and J. hyalinus were 908 bp and 848 bp respectively, well within the range of other sequenced Membracoidea (399 bp in N. cincticeps to 1750 bp in L. gazella) ( Table 1).
The control region sequence of M. dorsalis included one large tandem repeat, two 239 bp tandem repeat units and a partial third (179 bp) were beginning with the first nucleotide of this region (Fig. 5). J. hyalinus also had two different 159 bp tandem repeat units, consisting of the main portion of the control region. Tandem repeats identified in the control region of other sequenced Membracoidea mitochondrial genomes include four in H. vitripennis, three in D. nuchalis, and two different types of tandem repeats in L. gazella, E. carinata and Tambocerus sp. Characteristics of this region in Membracoidea were taxon-specific, and the different size or copy numbers of repeat units had some influence on the size of the region (Fig. 5).
Phylogenetic relationships. We performed 10 independent phylogenetic analyses to evaluate the influence of different datasets and inference methods on tree topology and nodal support. These analyses yielded two different tree topologies, with incongruence restricted to relationships among members of cicadellid subfamily Deltocephalinae (Figs 6,7).  Monophyly at the superfamily level within Membracoidea was strongly supported in both trees (Figs 6,7). Derivation of Membracidae and Aetalionidae from within a paraphyletic Cicadellidae was well supported by all results, as suggested by previous analyses 9,13 . Although other recent analyses based on mitogenomic data supported treehoppers as a sister group to Cicadellidae [19][20][21] , none of these studies included all full available mitochondrial genomes within Membracoidea and, thus, the incongruence may be due to sample bias. For the three treehoppers, the two species of Membracidae E. carinata and L. gazella formed a paraphyletic grade giving rise to D. hardwickii (Aetalionidae), indicating paraphyly of Membracidae (Figs 6,7). This result differs from previous studies, in which Membracidae was usually placed as the sister clade to Aetalionidae 13,43,44 .
Within Deltocephalinae, the five species (M. dorsalis, J. hyalinus, D. nuchalis, N. cincticeps and Tambocerus sp.) representing five tribes (Deltocephalini, Opsiini, Drabescini, Chiasmini and Athysanini, respectively) formed a monophyletic group with high support. For all datasets in BI analyses and the AA dataset in ML analyses, the topology (Athysanini + ((Opsiini + Deltocephalini) + (Drabescini + Chiasmini))) was recovered, while except for the AA dataset, all other datasets yield the topology (Deltocephalini + (Opsiini + (Athysanini + (Drabescini + Chiasmini)))) (Figs 6, 7). Thus, relationships among the included Deltocephalinae were inconsistently resolved in the different analyses, with support for some branches relatively low (BS < 50, PP < 0.9). Thus, as in a previous phylogenetic study based on combined morphological, 28S and Histone H3 data 14 our analyses were unable to resolve some relationships with confidence. Apparent topological discordances between our results and those of Zahniser and Dietrich 14 may due to the small taxon sample of the current study. Classification of Deltocephalinae has been unstable, with D. nuchalis formerly placed in a separate subfamily, Selenocephalinae 45,46 , but more recently placed into Deltocephalinae based on morphological characters 47,48 . Recent phylogenetic studies indicate that the subfamily as defined by Oman et al. 49 was not monophyletic and that several other leafhopper subfamilies defined by Oman et al. 49 had their closest relatives within the Deltocephaline lineage 13,14,50 .
Previously studies confirmed that inclusion or exclusion of third codon positions had a strong influence on phylogenetic reconstruction 51,52 . Our saturation tests on different codon positions of PCGs and two rRNA genes (Fig. 8) showed that third codon positions are more saturated than first and second codon positions, with slopes of 0.2135, 0.5936 and 0.7599, respectively. Nevertheless, in our phylogenetic results, tree topologies were consistent regardless of whether third codon positions were excluded, but excluding third positions slightly increased support for some nodes in ML analyses (BS values: 75 to 84, 84 to 91, 80 to 86 and 87 to 91) (Fig. 8).

Conclusions
The first comparative analysis of mitochondrial genomes across the superfamily Membracoidea revealed a high level of conservatism in gene order, with all sequenced genes arranged in the putative insect ancestral gene arrangement, but one of the taxa newly sequenced for this study, J. hyalinus, had the putative ancestral tRNA cluster trnW-trnC-trnY rearranged to trnY-trnW-trnC. This is the first reported gene rearrangement in a mitogenome of Membracoidea. Phylogenomic analysis of the concatenated nucleotide sequences of 13 protein-coding genes (PCGs) and two ribosomal RNA genes from all available membracoid mitochondrial genomes supported the monophyly of Membracoidea and paraphyly of Cicadellidae with respect to the treehopper lineage (Aetalionidae + Membracidae). ML and BI analyses yielded topologies that were congruent except for relationships among included representatives of subfamily Deltocephalinae. Exclusion of third codon positions of PCGs improved some node support values in ML analyses. These results suggest that mitochondrial genome sequences are informative of higher level phylogenetic relationships within Membracoidea but may not be sufficient to resolve relationships within some membracoid lineages.