Mitochondrial genomic variation and phylogenetic relationships of three groups in the genus Scaphoideus (Hemiptera: Cicadellidae: Deltocephalinae)

The widespread leafhopper genus Scaphoideus Uhler is the most diverse genus in Scaphoideini and includes some species that are serious pests and vectors of plant pathogens. Here the first Scaphoideus mitogenome sequences are provided for three species, S. maai, S. nigrivalveus and S. varius, representing three main species groups in the Oriental region based on color pattern. The lengths of these three mitogenomes were 15,188, 15,235 and 15,207 bp, respectively. Gene order of three mitogenomes is highly conserved and identical to that of the putative ancestral insect. All three mitogenomes exhibited similar AT nucleotide bias, AT-, GC-skews and codon usage. One large 101 bp intergenic spacer between trnY and cox1 was in S. varius. All 22 tRNA genes had typical cloverleaf secondary structures, except for trnS1 (AGN) which appears to lack the dihydrouridine arm. Genes atp8, nad6 and nad2 were highly variable while cox1 showed the lowest nucleotide diversity. Phylogenetic analyses of three concatenated nucleotide datasets using maximum likelihood and Bayesian methods, comprising all 13 mitogenomes currently available for Membracoidea plus mitogenomes for eight outgroup species representing other cicadomorphan superfamilies, yielded the same topology in which Scaphoideus species formed a monophyletic group within a larger clade comprising three other included Deltocephalinae.

male genitalia 9 . However, these two classifications applied only to North American species and have not been widely adopted. Recently, the Oriental species of the genus were also divided into three groups according to coloration 10 . Until now, phylogenetic studies, utilizing morphological and molecular data, have focused more broadly on Deltocephalinae and included few representatives of Scaphoideini 7,11,12 . The small amounts of DNA sequence data currently available for this tribe in GenBank mainly include DNA barcodes of partial cox1 sequences. No species of Scaphoideini has had its mitochondrial genome sequenced so far. To further elucidate the phylogenetic status and relationships of Scaphoideini, much more data are needed.
Here, we analyze the first three complete mitogenomes for Scaphodeini, based on three Oriental species of the genus Scaphoideus representative of the three main color forms found among the Asian fauna: S. maai, a species representing the common group with a median longitudinal yellowish or whitish stripe; S. nigrivalveus, representing the group with transverse bands on the head, pronotum and scutellum; and S. varius, representing the group with dark brown spots or bands on vertex and longitudinal bands on pronotum and scutellum. General genome features including base composition and codon usage of PCGs were compared to explore the sequence variability among these three groups, and tRNA secondary structures were predicated. To examine the phylogenetic utility of complete mtDNA sequences, we reconstructed the phylogenetic relationships among the three newly sequenced species and other leafhoppers for which mtDNA genome data are available, using the concatenated nucleotide sequences of 13 protein-coding genes (PCGs) and two ribosomal RNA genes.
Protein-coding genes and codon usage. Of the 13 PCGs, nine were located on the majority strand (J-strand) while the other four PCGs were encoded by the minority strand (N-strand). The third codon position had a significantly higher (P = 0.000) A + T content than that of the first and second positions (87.5% versus 71.2% and 67.9%) ( Supplementary Fig. S1). Pairwise comparisons among the three Scaphoideus species (Table 2) indicate that, except for cox2, PCGs had fewer variable sites between S. maai and S. nigrivalveus than species pairs. Except for nad5, which began with TTG, all other PCGs started with the standard ATN codons, as in other leafhopper mitogenomes (Tambocerus sp., Idioscopus nitidulus) ( Table 3). Most of the PCGs terminated with a TAA stop codon, while atp6 in S. maai, cox1 and cob in S. nigrivalveus end with a TAG codon. Two PCGs (cox2 and nad4) terminated with truncated T stop codons in all three species; nad5 in S. maai and nad1 in S. varius also end with the incomplete codon T (Table 3).
After excluding the stop codons, the relative synonymous codon usage (RSCU) was calculated and summarized in Fig. 2. The total numbers of non-stop codons were 3632, 3633 and 3635 in S. maai, S. nigrivalveus and S. varius respectively. Leucine (Leu), Serine (Ser), Isoleucine (Ile) and Methionine (Met) were the most frequently used amino acids. Within each amino acid codon, third positions ending with A/T were more frequent than those terminated with G/C, causing the highest A + T content to occur in third positions. The codons Arg (CGC) and Ser1 (AGG) were missing in S. maai, and Pro (CCG) and Thr (ACG) were missing in S. nigrivalveus, while S. varius had the full 62 available codons (Fig. 2). Transfer RNA and ribosomal RNA genes. For the 22 typical animal tRNA genes in each Scaphoideus mitogenome, 14 tRNAs were encoded by the J-strand and the remaining eight were located on the N-strand, ranging from 61 to 71 bp in length. TrnM, trnK and trnI showed the highest identical sites percentage when aligned using MAFFT (92.5%, 90.1% and 89.6%, respectively), while trnY presented the lowest similarity (62.5%) ( Table 4). All tRNAs could be folded into the canonical cloverleaf secondary structure except for trnS1 (AGN),    The lengths of the two rRNA genes (rrnL and rrnS) in the Scaphoideus mitogenomes were about 1200 and 740 bp, with the mean A + T contents of 80.2% and 77.9% respectively (Table 1), and both rrn genes were encoded on the N-strand. The large rRNA subunit was located at a conserved position between trnL1 (CUN) and trnV,  Table 3. Comparison of length, start and stop codons of 13 protein coding genes (PCGs) among three Scaphoideus mitogenomes. while the small rRNA subunits was between trnV and the control region ( Fig. 1). The percentage of pairwise identity between the three Scaphoideus species based on the MAFFT alignment is summarized in  Tables S1-S3).
The putative control region, or A + T rich region, located between rrnS and trnI, was the most variable region in the whole mitogenome, with the pairwise identity between Scaphoideus species were relatively low (all < 55%) ( Table 5). The full lengths of CR in three Scaphoideus mitogenomes were 847, 902 and 762 bp respectively, comparable to other sequenced leafhoppers (from 399 bp in N. cincticeps to 1581 bp in Tambocerus  Phylogenetic relationships. In this study, no saturation was detected for the three candidate nucleotide sequence datasets (P123, P123DEGEN and P123R) prepared for ML and BI analyses (all Iss < Iss.cSym or Iss. cAsym, P < 0.05) ( Table 6), suggesting that the concatenated data were suitable for phylogenetic analysis. Thus, we analyzed these three different datasets, in addition to the corresponding amino acid sequence data, to evaluate the tree topology and nodal support. The phylogenetic tree topologies yielded by all six analyses of nucleotide sequence data were identical to each other (Fig. 4). A separate analysis of amino acid sequence data yielded a topology that was identical except for two deep internal nodes within Membracoidea that received only low to moderate ML bootstrap support in all analyses (Fig. S5). The lower overall branch support for the tree resulting from analysis of amino acid sequences indicates that the nucleotide sequence data contain phylogenetic signal (e.g., in third codon positions) that is lost when the data are translated to amino acids.
Monophyly of the three cicadomorphan superfamilies Cicadoidea, Cercopoidea and Membracoidea was strongly supported in all analyses (Bootstrap support values (BS) = 100, Bayesian posterior Probability (PP) = 1.00). All nodes received high support (PP > 0.96) in Bayesian analyses of nucleotide sequence data but a few nodes received only moderate or low support ML analyses of some datasets (BS < 75). Within Cicadellidae, the six Deltocephalinae species constituted one clade and formed a sister group to other leafhoppers. Within  Table 4. Identical Sites and its percentage of each tRNA gene alignments.

Discussion
Consistent with previous observations of Membracoidea, mitogenome sequences of Scaphoideus were highly conserved in gene content, gene size, gene order, base composition, codon usage of PCGs and tRNA secondary structures. Variation in the length of mt genomes is mostly due to length variation in the control region, which ranges from 70 bp (Orthoptera: Ruspoliadubia) 16 to 4.6 kb (Diptera: Drosophila melanogaster) in insects 17 . The sizes of the A + T-rich region in S. maai, S. nigrivalveus and S. varius are consistent with those of the A + T-rich regions of other leafhoppers, which range from 399 bp in N. cincticeps to 1581 bp in Tambocerus sp., but are relatively short compared to most other insect mitogenomes. Moreover, the three newly sequenced species share 32.3% sequence identity in this region. Phylogenetic analyses suggest that the S. maai group is more closely related to the S. nigrivalveus group than to the S. varius group, consistent with the pairwise differences in both PCGs and rRNA genes. While Scaphoideini was found to be relatively close to Drabescini in the phylogeny of Deltocephalinae based on combined morphological, 28 S and Histone H3 data 7 , our analysis placed a species of the poorly defined and polyphyletic tribe Athysanini (Tambocerus, not included in Zahniser and Dietrich's dataset) as sister to Scaphoideus. In recent phylogenetic studies of the entire subfamily Deltocephalinae based on partial gene sequences and morphological characters, Scaphoideini was resolved as paraphyletic with respect to Drabescini with low branch support. Further phylogenetic studies are needed to elucidate the status and relationships within this group. Although our taxon sample includes only a small fraction of the diversity of leafhoppers, a group comprising >21,000 known species, our phylogenetic analyses of mitogenome sequences indicate that such data are able to resolve relationships at various levels in the taxonomic hierarchy of this insect group. Thus, addition of taxa to our leafhopper mitogenome dataset may help improve resolution of the still poorly understood relationships among major leafhopper lineages. Interestingly, as in our analysis of mitogenome data, a recent larger-scale phylogenomic study of Membracoidea that included nucleotide sequence data from 388 gene regions was also unable to resolve some deep internal branches of the phylogeny of this group 18 . This suggests that some ancient divergences among leafhoppers may be difficult to resolve, even after the addition of much more data.
Mitochondrial genes have been widely used as genetic markers, especially with cox1 partial sequences gaining widespread popularity as convenient DNA barcodes for species identification 19 . Our sliding window analysis indicated that cox1 is one of the most conserved protein-coding genes of the mitogenome and the two rRNA genes are also highly conserved when compared to other regions. Similar results were found in recent studies of Psylloidea and Psychodidae species 20,21 . On the other hand, mitogenome sequence data are sufficiently variable to also have potential use in single nucleotide polymorphism (SNP) studies.
Partial mitogenome sequence data were recently used in a population genetic study of Scaphoideus. Papura et al.    an attempt to elucidate the colonization scenario of the phytoplasma vector species S. titanus, which is native to northeastern North America but is now well established and spreading in Europe 22 . Consistent with microsatellite data, the mitochondrial data indicated much higher haplotype diversity among the native North American populations than observed in European samples, which displayed low levels of genetic diversity and no isolation by distance 22 . The three most variable coding regions in the mitogenomes of Scaphoideus species sequenced for our study were atp8 (Pi = 0.27), nad6 (Pi = 0.24) and nad2 (Pi = 0.23), while cox2 (Pi = 0.17) was relatively conserved and slightly below the mean Pi value for PCGs (0.20). This suggests that the first three mentioned markers may be more useful than cox2 for future population genetic studies. Although atp8 (153 bp) may be too short to be very informative, nad6 (477 bp) and nad2 (975 bp) should be considered for use in future Scaphoideus population genetic studies.

Materials and Methods
Sample collection and DNA extraction. All species used in this study were collected in China between 2011 and 2015 (Table S4). Fresh specimens were captured and preserved in 100% ethanol, and stored at −20 °C in the laboratory. After morphological identification, voucher specimens with male genitalia prepared were deposited in the Entomological Museum of Northwest A&F University, and total genomic DNA was extracted from muscle tissues of the thorax using the DNeasy DNA Extraction kit (Qiagen).
Next generation sequence assembly. Most  Genome annotation and bioinformatic analyses. All the three Scaphoideus mitochondrial genomes were annotated with GENEIOUS R8 (Biomatters Ltd., Auckland, New Zealand). All 13 protein-coding genes and two rRNA genes were identified by comparison with the homologous sequences of other leafhoppers from

Phylogenetic analyses. Taxa selection.
A dataset consisting of the three newly sequenced taxa and 10 previously available Membracoidea mitogenomes (3 treehoppers and 10 leafhoppers), plus outgroups consisting of three species of Cicadoidea and five species of Cercopoidea was compiled for phylogenetic analysis (Table 7).
Sequence alignment and substitution saturation test. Each of the 13 PCGs and two rRNA genes was aligned separately based on the invertebrate mitochondrial genetic code using the MAFFT algorithm in the TranslatorX online server (http://translatorx.co.uk/) 29 , with poorly aligned sites removed from the protein alignment before back-translating to nucleotides using GBlocks under default settings. Each of the two rRNAs was aligned using the Q-INS-i method through MAFFT version 7 alignment server (http://mafft.cbrc.jp/alignment/server/) 30 . Potential substitution saturation of each dataset was assessed using the index of substitution saturation (Iss) of Xia et al. 31 implemented in the DAMBE 5 32 .
Dataset concatenation, partitioning and substitution model selection. Alignments of individual genes were concatenated using SequenceMatrix 1.7.8 33 . To compensate for nucleotide compositional heterogeneity between taxa, the "Degen" approach was selected to make synonymous changes largely invisible but leaving the inference of non-synonymous change largely intact, as implemented in the online server (http://www.phylotools.com/) 34,35 . To assess the stability of phylogenetic results under strategies for reducing noise in the data, fourdatasets were generated: 1) P123: 13 PCGs with 9912 nucleotides; 2) P123DEGEN: 13 PCGs with all coding positions with 9912 nucleotides; 3) P123R: 13 PCGs and two rRNAs with 12047 nucleotides; 4) AA: amino acid sequences of 13 PCGs with 3304 amino acids. PartitionFinder v1.1.1 36 was employed to infer the optimal nucleotide substitution models and partition strategies. Data blocks for PCGs were defined by codon positions. The Bayesian information criterion (BIC) was chosen as the metric for the partitioning scheme under the "greedy" search algorithm. Details of the best-fit schemes calculated for each dataset are shown in Supplementary Table S5.
Phylogenetic inference. ML analyses were implemented using raxmlGUI 1.5 37 under the GTRGAMMAI model, and support for nodes was assessed by performing 1000 rapid bootstrap replicates (BS). Bayesian analyses were conducted using MrBayes 3.2.6 38 . Following the partition schemes suggested by PartitionFinder, all model parameters were set as unlinked across partitions. Two simultaneous runs with four independent Markov chains were  Table 7. List of mitochondrial genomes used for the phylogenetic analysisin this study.
SCiEntifiC RePORtS | 7: 16908 | DOI:10.1038/s41598-017-17145-z run for seven million generations and trees were sampled every 1000th generation. After the average standard deviation of split frequencies fell below 0.01, the first 25% of samples were discarded as burn-in and the remaining trees were used to generate a consensus tree and calculate the posterior probabilities (PP).